The Great Fail2ban Debugging Adventure

📊 Chapter 1: The Dashboard Mystery

Our story begins with a simple problem: a fail2ban monitoring dashboard showing the dreaded "Error fetching logs" message. What seemed like a straightforward PHP/SSH configuration issue would soon reveal itself to be something far more interesting...

20:45 PDT

Problem Identified: Dashboard on the-lab cannot fetch logs from web-node

ssh -i /home/jedi/.ssh/id_rsa -o StrictHostKeyChecking=no jedi@192.168.1.XX 'sudo tail -50 /var/log/fail2ban.log'

ssh: connect to host 192.168.1.XX port 22: Connection refused

Initial Assessment

The error suggests SSH connectivity issues. Time to investigate the network layer and service status systematically.

🔍 Chapter 2: The Investigation

Time for some network detective work. Let's see what's happening with that SSH connection...

20:49 PDT

Network Analysis: Testing connectivity and port status

nmap -p 22 192.168.1.XX

PORT STATE SERVICE
22/tcp closed ssh

Port 22 is closed! But wait... SSH was working just minutes ago!

20:50 PDT

Contradiction Discovered: Direct access to web-node shows SSH is actually running

sudo systemctl status ssh

● ssh.service - OpenBSD Secure Shell server
Active: active (running) since Sun 2025-08-24 01:19:01 PDT

The plot thickens - SSH is running on web-node, but the-lab can't connect to it.

PLOT TWIST!

A service that is up but refuses connections from one particular host suggests that something on web-node is rejecting the-lab specifically. Looking at the SSH logs reveals the shocking truth...

Aug 25 20:44:15 web-node sshd[11464]: Accepted publickey for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:05 web-node sshd[11528]: Failed password for jedi from 192.168.1.XXX

The fail2ban dashboard was banned by fail2ban itself!
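
For the curious, the same conclusion can be pulled out of the log mechanically. The sketch below is an illustration rather than the command used that night: count_failed_by_ip is a made-up helper name, and the sample input is just the redacted lines shown above. On a real host the lines would come from the auth log or the journal, whose location varies by distribution.

```shell
# Hypothetical helper: count "Failed password" lines per source address.
# Reads sshd log lines on stdin; prints "<count> <address>" per source.
count_failed_by_ip() {
  awk '/Failed password for/ {
         # The source address is the field right after the word "from".
         for (i = 1; i < NF; i++)
           if ($i == "from") { hits[$(i + 1)]++; break }
       }
       END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Sample input: the redacted lines from this incident.
sample_auth_log='Aug 25 20:44:15 web-node sshd[11464]: Accepted publickey for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:03 web-node sshd[11526]: Failed password for jedi from 192.168.1.XXX
Aug 25 20:46:05 web-node sshd[11528]: Failed password for jedi from 192.168.1.XXX'

printf '%s\n' "$sample_auth_log" | count_failed_by_ip
```

On the sample this prints a single line, 3 192.168.1.XXX: one accepted key login is ignored, and the three password failures are attributed to the-lab.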

⚖️ Chapter 3: The Ironic Justice

Our fail2ban system did exactly what it was designed to do - protect against brute force attacks. The PHP dashboard, trying to connect without proper key authentication, triggered multiple failed attempts and got banned!
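
How many failures it takes is a matter of configuration. As a rough sketch of the idea, the snippet below simulates a "max failures inside a time window" rule against the failure timestamps that the Found entries in the Chapter 5 log timeline record for the-lab. The helper name first_ban_time is hypothetical, the rule is a simplification of what fail2ban really does, and the values 5 and 600 seconds are assumptions based on commonly shipped defaults, since the jail's actual maxretry and findtime are not shown in this write-up.

```shell
# Hypothetical simulation of a failure-count ban rule: print the time of
# the failure that brings the count inside the window up to max_retry.
# Input: one HH:MM:SS timestamp per failure, in order, all on the same day.
first_ban_time() {  # usage: first_ban_time <max_retry> <find_time_seconds>
  awk -v max="$1" -v win="$2" '
    {
      split($1, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      n++; seen[n] = now
      # Drop failures that have aged out of the window.
      while (seen[start + 1] < now - win) start++
      if (n - start >= max) { print $1; exit }
    }'
}

# Failure times taken from the "Found" entries for the-lab.
printf '%s\n' 20:39:36 20:39:36 20:46:03 20:46:03 20:46:05 20:46:05 |
  first_ban_time 5 600
```

Under those assumed settings the fifth failure lands at 20:46:05, which lines up with the Ban entry logged at 20:46:05,691.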

20:52 PDT

Fail2ban Status Check:

sudo fail2ban-client status sshd

Status for the jail: sshd
|- Currently failed: 1
|- Total failed: 12
|- Currently banned: 1
|- Total banned: 2
`- Banned IP list: 192.168.1.XXX

There it is - 192.168.1.XXX (the-lab) is officially banned by its own monitoring target!
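
A dashboard could run this check on itself before reporting a vague fetch error. The following is a minimal sketch, assuming the status output keeps a "Banned IP list:" line like the one above; is_banned is a hypothetical helper, not part of fail2ban, and the sample text is the abbreviated status shown in this post.

```shell
# Hypothetical helper: succeed if the given address appears on the
# "Banned IP list" line of `fail2ban-client status <jail>` output (stdin).
is_banned() {  # usage: ... | is_banned <address>
  awk -v ip="$1" '
    /Banned IP list:/ {
      sub(/.*Banned IP list:/, "")      # keep only the addresses
      for (i = 1; i <= NF; i++) if ($i == ip) found = 1
    }
    END { exit(found ? 0 : 1) }'
}

# Sample input: the status output from this incident.
sample_status='Status for the jail: sshd
|- Currently failed: 1
|- Total failed: 12
|- Currently banned: 1
|- Total banned: 2
- Banned IP list: 192.168.1.XXX'

if printf '%s\n' "$sample_status" | is_banned 192.168.1.XXX; then
  echo "192.168.1.XXX is banned"
else
  echo "192.168.1.XXX is not banned"
fi
```

On web-node itself the input would come from the live command instead, for example sudo fail2ban-client status sshd | is_banned 192.168.1.XXX.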

🔓 Chapter 4: The Resolution

With the mystery solved, the fix was simple - unban the dashboard server!

20:52 PDT

Unbanning the-lab:

sudo fail2ban-client set sshd unbanip 192.168.1.XXX

Success! The dashboard server is now free to connect again.

Happy Ending

The dashboard immediately started working again. The final test showed the complete story in the fail2ban logs:

ssh jedi@192.168.1.XX 'sudo tail -10 /var/log/fail2ban.log'

📈 Chapter 5: The Complete Log Timeline

Here's the complete story as told by the fail2ban logs:

2025-08-25 20:31:44,897 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.20

2025-08-25 20:39:36,772 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36

2025-08-25 20:39:36,773 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36

2025-08-25 20:41:44,008 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.20

2025-08-25 20:46:03,968 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:03

2025-08-25 20:46:03,969 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:03

2025-08-25 20:46:05,575 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:05

2025-08-25 20:46:05,576 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:46:05

2025-08-25 20:46:05,691 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.XXX

2025-08-25 20:52:43,901 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.XXX
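
To read ban durations off a log like this without doing clock arithmetic by hand, something like the sketch below can help. It is only an illustration: ban_timeline is a hypothetical helper, it assumes the line layout shown above with all events on the same day, and it ignores the millisecond part of each timestamp.

```shell
# Hypothetical helper: list Ban/Unban events from fail2ban log lines on
# stdin and, for each Unban, the whole seconds since the matching Ban.
ban_timeline() {
  awk 'NF > 1 && ($(NF - 1) == "Ban" || $(NF - 1) == "Unban") {
         split($2, t, /[:,]/)                       # HH:MM:SS,mmm
         now = t[1] * 3600 + t[2] * 60 + t[3]
         ip = $NF
         if ($(NF - 1) == "Ban") {
           banned_at[ip] = now
           print $2, "Ban", ip
         } else if (ip in banned_at) {
           print $2, "Unban", ip, "after", (now - banned_at[ip]) "s"
         } else {
           print $2, "Unban", ip
         }
       }'
}

# Sample input: a subset of the lines from this incident.
sample_f2b_log='2025-08-25 20:31:44,897 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.20
2025-08-25 20:39:36,772 fail2ban.filter [10996]: INFO [sshd] Found 192.168.1.XXX - 2025-08-25 20:39:36
2025-08-25 20:41:44,008 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.20
2025-08-25 20:46:05,691 fail2ban.actions [10996]: NOTICE [sshd] Ban 192.168.1.XXX
2025-08-25 20:52:43,901 fail2ban.actions [10996]: NOTICE [sshd] Unban 192.168.1.XXX'

printf '%s\n' "$sample_f2b_log" | ban_timeline
```

On the sample, the ban on 192.168.1.20 ends after 600s, which would be consistent with an automatic ten-minute expiry (the configured bantime is not shown here, so that is an inference), while the ban on 192.168.1.XXX ends after 398s because it was lifted by hand.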

📊 Incident Statistics

Servers Involved: 3 (the-lab, web-node, lab-node)

Total Failed Attempts: 12 per the sshd jail status (counted before banning occurred)

IPs Banned: 2 (both lab-node and the-lab)

Debug Time: ~30min (from problem to resolution)

🧠 Lessons Learned

Fail2ban worked as designed: The system correctly identified repeated failed logins and banned their source, even though that source turned out to be our own dashboard server.

Monitoring can be monitored: Even security dashboards aren't immune to the systems they monitor - a perfect example of recursive security.

Check the logs: The SSH service logs revealed the true story behind the connection failures, highlighting the importance of systematic log analysis.

Network debugging process: Systematic testing (nmap, systemctl status, log analysis), rather than guesswork, led to the solution.

Beautiful irony: Sometimes the thing you're monitoring is what's blocking you from monitoring it - a perfect cybersecurity paradox!

Prevention Strategy

To prevent this in the future, make sure your monitoring systems authenticate with SSH keys only and never fall back to passwords, and consider whitelisting the monitoring server's IP through fail2ban's ignoreip setting.
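
Both ideas can be sketched in a few lines of shell. This is one possible setup under stated assumptions, not the configuration used on these servers: fetch_logs and ignoreip_fragment are hypothetical helper names, the key path default is a guess, and the fragment assumes the jail is called sshd as in the status output above. BatchMode, IdentitiesOnly and ConnectTimeout are standard ssh options, and ignoreip is a standard fail2ban jail setting.

```shell
# Key-only, non-interactive log fetch. BatchMode=yes makes ssh fail fast
# instead of falling back to password prompts when the key is rejected.
fetch_logs() {  # usage: fetch_logs <user@host> [key_path]
  ssh -i "${2:-$HOME/.ssh/id_rsa}" \
      -o BatchMode=yes -o IdentitiesOnly=yes -o ConnectTimeout=5 \
      "$1" 'sudo tail -50 /var/log/fail2ban.log'
}

# Render a jail.local fragment that exempts the monitoring host from bans.
ignoreip_fragment() {  # usage: ignoreip_fragment <monitoring_host_ip>
  printf '[sshd]\nignoreip = 127.0.0.1/8 ::1 %s\n' "$1"
}

# Print the fragment for review before adding it to the configuration.
ignoreip_fragment 192.168.1.XXX
```

The printed fragment would go into /etc/fail2ban/jail.local on web-node, followed by sudo fail2ban-client reload. Exempting a host means fail2ban will never ban it, so this is a trade-off worth making only for machines you control.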