[BlueOnyx:15483] BFD running amok
William Thackrey
wethackrey at gmail.com
Thu May 29 09:29:36 -05 2014
Gents –
We're running BlueOnyx 5108R servers. The servers have Solarspeed APF/BFD installed, among other packages. Over the past few days, on one of them, we've been seeing multiple bfd processes being spawned and never stopping. Yesterday there were 95 different bfd processes running concurrently. This brings server performance to its knees. The problem first appeared immediately after a yum update, though we have no evidence that the update caused it.
Rebooting the server didn't solve the problem. We tried extending LOCK_TIMEOUT in /usr/local/bfd/conf.bfd to 1000, but that didn't fix it. For the moment, we've killed the bfd cron job. Snippets of ps output and the bfd log are attached below. Note there are some very high %CPU numbers. Anyone have an idea where we might look for the root cause?
Thanks!
Bill Thackrey
USER   PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
root 11915  0.0  0.0 106096   1288 ?   Ss   14:30 0:00 /bin/sh /usr/local/sbin/bfd -q
root 11936  7.4  0.7 166064  61204 ?   S    14:30 0:10 /bin/sh /usr/local/sbin/bfd -s
root 12821  0.0  0.0 106096   1292 ?   Ss   13:30 0:00 /bin/sh /usr/local/sbin/bfd -q
root 12840  0.3  0.7 166060  61208 ?   S    13:30 0:12 /bin/sh /usr/local/sbin/bfd -s
root 15470  0.0  0.7 166064  60452 ?   S    14:32 0:00 /bin/sh /usr/local/sbin/bfd -s
root 15471  109  1.4 225860 120140 ?   R    14:32 0:04 /bin/sh /usr/local/sbin/bfd -s
root 15513  0.0  0.7 166060  60460 ?   S    14:32 0:00 /bin/sh /usr/local/sbin/bfd -s
root 15514  116  1.4 225860 120136 ?   R    14:32 0:03 /bin/sh /usr/local/sbin/bfd -s
root 15556  0.0  0.7 166064  60448 ?   S    14:32 0:00 /bin/sh /usr/local/sbin/bfd -s
root 15557  119  1.4 225860 120136 ?   R    14:32 0:02 /bin/sh /usr/local/sbin/bfd -s
root 22149  0.0  0.0 106096   1296 ?   Ss   14:00 0:00 /bin/sh /usr/local/sbin/bfd -q
root 22169  0.5  0.7 166064  61208 ?   S    14:00 0:10 /bin/sh /usr/local/sbin/bfd -s
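A process count like the 95 mentioned earlier can be taken straight from a ps listing. The following is only a minimal sketch: it assumes the /usr/local/sbin/bfd path and the -q / -s flags exactly as they appear in the listing above, and the function name count_bfd is made up for illustration.

```shell
# count_bfd: read `ps aux`-style lines on stdin and count the bfd
# processes, split by the trailing flag (-s or -q) seen in the listing.
# Assumption: the script lives at /usr/local/sbin/bfd, as shown above.
count_bfd() {
    awk '
        index($0, "/usr/local/sbin/bfd") {
            total++
            if ($NF == "-s") scan++
            else if ($NF == "-q") quiet++
        }
        END { printf "total=%d scan=%d quiet=%d\n", total, scan, quiet }
    '
}

# Usage on a live system:
#   ps aux | count_bfd
```

Fed the twelve process lines above, this prints total=12 scan=9 quiet=3. Running it once a minute would show whether the count keeps climbing between cron runs.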
May 28 08:15:02 savusavu BFD(13465): cleared stale lock file file.
May 28 08:30:02 savusavu BFD(6344): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 899 seconds old), aborting.
May 28 08:44:58 savusavu BFD(12425): {sshd} 116.10.191.209 exceeded login failures; executed ban command '/etc/apf/apf -d 116.10.191.209 {bfd.sshd}'.
May 28 08:45:00 savusavu BFD(12425): {sshd} 212.129.12.79 exceeded login failures; executed ban command '/etc/apf/apf -d 212.129.12.79 {bfd.sshd}'.
May 28 08:45:01 savusavu BFD(9037): cleared stale lock file file.
May 28 08:45:03 savusavu BFD(12425): {sshd} 61.19.247.185 exceeded maximum login failures; host already banned or ignored.
May 28 09:15:01 savusavu BFD(12653): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 09:45:01 savusavu BFD(14714): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 10:00:01 savusavu BFD(18677): cleared stale lock file file.
May 28 10:15:01 savusavu BFD(26810): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 10:30:01 savusavu BFD(8659): cleared stale lock file file.
May 28 10:45:01 savusavu BFD(22898): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 11:00:01 savusavu BFD(5855): cleared stale lock file file.
May 28 11:15:01 savusavu BFD(23333): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 11:30:01 savusavu BFD(13826): cleared stale lock file file.
May 28 11:45:01 savusavu BFD(32726): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 12:00:01 savusavu BFD(21524): cleared stale lock file file.
May 28 12:15:01 savusavu BFD(13210): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 12:30:01 savusavu BFD(9627): cleared stale lock file file.
May 28 12:45:02 savusavu BFD(6820): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 901 seconds old), aborting.
May 28 13:00:01 savusavu BFD(5289): cleared stale lock file file.
May 28 13:37:37 savusavu BFD(19205): cleared stale lock file file.
May 28 13:38:37 savusavu BFD(20002): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 60 seconds old), aborting.
May 28 17:20:03 savusavu BFD(8209): cleared stale lock file file.
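The alternation between "cleared stale lock" and "locked subsystem ... aborting" entries in the log above can be tallied to see how often a run found the lock still held. This is a rough sketch that matches only the message text shown in the excerpt; the function name summarize_bfd_log is hypothetical, and the log file location is not given in this post, so it is left as a placeholder variable.

```shell
# summarize_bfd_log: read BFD log lines on stdin and count runs that
# cleared a stale lock versus runs that aborted because the lock was held.
# Assumption: message wording matches the excerpt above.
summarize_bfd_log() {
    awk '
        /cleared stale lock/                { cleared++ }
        /locked subsystem, already running/ { aborted++ }
        END { printf "cleared=%d aborted=%d\n", cleared, aborted }
    '
}

# Usage (BFD_LOG is a placeholder for wherever the bfd log lives):
#   summarize_bfd_log < "$BFD_LOG"
```

For the excerpt above this reports cleared=11 aborted=10, which fits the pattern of roughly every other 15-minute run finding a lock that is about 900 seconds old.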