The Great Fail2ban Debugging Adventure

A tale of dashboards, SSH connections, and the irony of security systems

When your monitoring system becomes the monitored - a debugging story that shows how a security measure can end up blocking the very tool built to watch it, creating a beautiful paradox in network administration.

📊 Chapter 1: The Dashboard Mystery

Our story begins with a simple problem: a fail2ban monitoring dashboard showing the dreaded "Error fetching logs" message. What seemed like a straightforward PHP/SSH configuration issue would soon reveal itself to be something far more interesting...

20:45 PDT

Problem Identified: Dashboard on the-lab cannot fetch logs from web-node

ssh -i /home/jedi/.ssh/id_rsa -o StrictHostKeyChecking=no jedi@192.168.1.XX 'sudo tail -50 /var/log/fail2ban.log'
ssh: connect to host 192.168.1.XX port 22: Connection refused

Initial Assessment
The error suggests SSH connectivity issues. Time to investigate the network layer and service status systematically.

🔍 Chapter 2: The Investigation

Time for some network detective work. Let's see what's happening with that SSH connection...

20:49 PDT

Network Analysis: Testing connectivity and port status

nmap -p 22 192.168.1.XX
PORT STATE SERVICE
22/tcp closed ssh

Port 22 is closed! But wait... SSH was working just minutes ago!

20:50 PDT

Contradiction Discovered: Direct access to web-node shows SSH is actually running

sudo systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
Active: active (running) since Sun 2025-08-24 01:19:01 PDT

The plot thickens - SSH is running on web-node, but the-lab can't connect to it.

PLOT TWIST!

Looking at the SSH logs on web-node reveals the shocking truth...

Aug 25 20:44:15 web-node sshd[11464]: Accepted publickey for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:05 web-node sshd[11528]: Failed password for jedi from 192.168.1.XXX

The fail2ban dashboard was banned by fail2ban itself!

⚖️ Chapter 3: The Ironic Justice

Our fail2ban system did exactly what it was designed to do - protect against brute force attacks. The PHP dashboard, trying to connect without proper key authentication, triggered multiple failed attempts and got banned!
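The pattern fail2ban reacted to can be reproduced by hand from the sshd log. The following is a minimal sketch, assuming the log line format shown in the excerpt above; the helper name count_failed_by_ip is made up for illustration and is not part of the original setup.

```shell
# Count "Failed password" lines per source IP from sshd log text on stdin.
# count_failed_by_ip is a hypothetical helper name used for illustration.
count_failed_by_ip() {
  awk '/sshd\[[0-9]+\]: Failed password for / {
         # The source address is the field right after the word "from".
         for (i = 1; i < NF; i++) if ($i == "from") { hits[$(i + 1)]++; break }
       }
       END { for (ip in hits) print hits[ip], ip }'
}

# Sample lines copied from the web-node log excerpt above.
sample_log='Aug 25 20:44:15 web-node sshd[11464]: Accepted publickey for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:05 web-node sshd[11528]: Failed password for jedi from 192.168.1.XXX'

printf '%s\n' "$sample_log" | count_failed_by_ip
```

For the excerpt above this prints "3 192.168.1.XXX". On a live host the same function could be fed the real sshd log instead of the sample text; where that log lives depends on the distribution.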

20:52 PDT

Fail2ban Status Check:

sudo fail2ban-client status sshd
Status for the jail: sshd
|- Currently failed: 1
|- Total failed: 12
|- Currently banned: 1
|- Total banned: 2
`- Banned IP list: 192.168.1.XXX

There it is - 192.168.1.XXX (the-lab) is officially banned by its own monitoring target!
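The same check can be scripted so a dashboard host can tell whether it is the one being blocked. This is a rough sketch that only assumes the "Banned IP list" line format shown in the status output; the helper name is_banned is hypothetical.

```shell
# is_banned IP: reads `fail2ban-client status <jail>` output on stdin and
# succeeds (exit 0) if IP appears on the "Banned IP list" line.
# The helper name is hypothetical; only the line format comes from fail2ban.
is_banned() {
  awk -v ip="$1" '
    /Banned IP list:/ {
      sub(/^.*Banned IP list:[ \t]*/, "")   # keep only the addresses
      n = split($0, addrs, /[ \t]+/)
      for (i = 1; i <= n; i++) if (addrs[i] == ip) found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Sample status text modeled on the output above.
status_output='Status for the jail: sshd
|- Currently failed: 1
|- Total failed: 12
|- Currently banned: 1
|- Total banned: 2
`- Banned IP list: 192.168.1.XXX'

if printf '%s\n' "$status_output" | is_banned 192.168.1.XXX; then
  echo "192.168.1.XXX is banned"
fi
```

On web-node itself, the live equivalent would be piping sudo fail2ban-client status sshd into is_banned with the address in question.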

🔓 Chapter 4: The Resolution

With the mystery solved, the fix was simple - unban the dashboard server!

20:52 PDT

Unbanning the-lab:

sudo fail2ban-client set sshd unbanip 192.168.1.XXX
1

Success! The dashboard server is now free to connect again.

Happy Ending

The dashboard immediately started working again. The final test showed the complete story in the fail2ban logs:

ssh jedi@192.168.1.XX 'sudo tail -10 /var/log/fail2ban.log'

📈 Chapter 5: The Complete Log Timeline

Here's the complete story as told by the fail2ban logs:

2025-08-25 20:31:44,897 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.20
2025-08-25 20:39:36,772 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36
2025-08-25 20:39:36,773 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36
2025-08-25 20:41:44,008 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.20
2025-08-25 20:46:03,968 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:03
2025-08-25 20:46:03,969 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:03
2025-08-25 20:46:05,575 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:05
2025-08-25 20:46:05,576 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:05
2025-08-25 20:46:05,691 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.XXX
2025-08-25 20:52:43,901 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.XXX
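To pull just the ban and unban events out of a longer log, a small filter helps. The sketch below assumes the whitespace-separated field layout of the lines above; the helper name ban_events is invented for this example.

```shell
# ban_events: print "date time action ip" for each Ban/Unban line in
# fail2ban log text read from stdin. Hypothetical helper name.
ban_events() {
  awk '$3 == "fail2ban.actions" && ($7 == "Ban" || $7 == "Unban") {
         sub(/,.*/, "", $2)        # drop the milliseconds
         print $1, $2, $7, $8
       }'
}

# A subset of the log lines shown above.
sample_f2b='2025-08-25 20:31:44,897 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.20
2025-08-25 20:39:36,772 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36
2025-08-25 20:41:44,008 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.20
2025-08-25 20:46:05,691 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.XXX
2025-08-25 20:52:43,901 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.XXX'

printf '%s\n' "$sample_f2b" | ban_events
```

Run against the sample, this leaves four lines: the ban and unban of 192.168.1.20, followed by the ban of 192.168.1.XXX at 20:46:05 and its manual unban at 20:52:43.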

📊 Incident Statistics

Servers Involved: 3 (the-lab, web-node, lab-node)

Total Failed Attempts: 12 (before banning occurred)

IPs Banned: 2 (both lab-node and the-lab)

Debug Time: ~30 min (from problem to resolution)

🧠 Lessons Learned

Fail2ban worked as designed: The system treated the repeated failed logins as a brute force attempt and banned the source IP, even though that source turned out to be a trusted host.

Monitoring can be monitored: Even security dashboards aren't immune to the systems they monitor - a perfect example of recursive security.

Check the logs: The SSH service logs revealed the true story behind the connection failures, highlighting the importance of systematic log analysis.

Network debugging process: Systematic testing (nmap, systemctl status, log analysis) led to the solution rather than assumptions.

Beautiful irony: Sometimes the thing you're monitoring is what's blocking you from monitoring it - a perfect cybersecurity paradox!

Prevention Strategy
To prevent this in the future, ensure your monitoring systems use proper SSH key authentication and consider adding the monitoring server's IP to fail2ban's ignoreip list (its whitelist setting).
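Both measures can be sketched in a few lines of shell. Treat this as an illustration under stated assumptions rather than the configuration actually used here: the function names and the drop-in file path are hypothetical, while ignoreip and the ssh options BatchMode, PreferredAuthentications, and IdentitiesOnly are standard settings.

```shell
# write_ignoreip_dropin DEST IP: write a jail override that exempts IP from
# the sshd jail. DEST is assumed to be something like
# /etc/fail2ban/jail.d/monitoring.local (a hypothetical file name).
write_ignoreip_dropin() {
  dest=$1
  ip=$2
  cat > "$dest" <<EOF
[sshd]
# Keep loopback exempt and add the monitoring host.
ignoreip = 127.0.0.1/8 ::1 $ip
EOF
}

# fetch_fail2ban_log USER@HOST KEY: key-only, non-interactive SSH, so a
# broken key fails immediately instead of falling back to password attempts.
# Hypothetical helper; it is defined here but not called.
fetch_fail2ban_log() {
  ssh -i "$2" \
      -o BatchMode=yes \
      -o PreferredAuthentications=publickey \
      -o IdentitiesOnly=yes \
      "$1" 'sudo tail -50 /var/log/fail2ban.log'
}
```

After writing the drop-in on web-node, fail2ban has to pick it up, for example with sudo fail2ban-client reload. An exempted address is never banned, so this trades a little protection for a dashboard that cannot lock itself out.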