[BlueOnyx:15483] BFD running amok

William Thackrey wethackrey at gmail.com
Thu May 29 09:29:36 -05 2014


Gents – 

We're running BlueOnyx 5108R servers.  The servers have Solarspeed APF/BFD installed, among other packages. Over the past few days, on one of them, we've been seeing multiple bfd processes spawned that never exit.  Yesterday there were 95 bfd processes running concurrently.  This brings the server's performance to its knees.  The problem began immediately after a yum update, though we have no evidence that the two are related.

Rebooting the server didn't solve the problem.  We tried raising LOCK_TIMEOUT in /usr/local/bfd/conf.bfd to 1000, but that didn't fix it either.  For the moment, we've disabled the bfd cron job.  Snippets of the ps output and the bfd log are attached below; note the very high %CPU values on some of the processes.  Does anyone have an idea where we might look for the root cause?
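One possible mechanism, offered as a hedged sketch: the log below alternates between "locked subsystem ... aborting" and "cleared stale lock file" on successive 15-minute cron runs. The Python below models a lock-age guard that would produce that pattern. It assumes (not confirmed against the BFD source) that a run aborts when lock.utime is younger than LOCK_TIMEOUT, and otherwise clears the lock and starts anyway. `lock_decision` and `simulate` are hypothetical names, and the simulation further assumes that earlier runs never finish.

```python
def lock_decision(lock_age_seconds, lock_timeout):
    """Hypothetical model of a lock-age guard (an assumption, not BFD's code).

    Returns "run" if no lock exists, "abort" if the lock is younger than
    the timeout, and "clear_and_run" if the lock is treated as stale.
    """
    if lock_age_seconds is None:
        return "run"
    if lock_age_seconds < lock_timeout:
        return "abort"
    return "clear_and_run"


def simulate(cron_interval, lock_timeout, runs):
    """Fire cron every cron_interval seconds, assuming no run ever exits.

    Returns the times (in seconds) at which a new run actually started.
    """
    lock_created = None  # time the current lock file was written
    started = []
    for i in range(runs):
        now = i * cron_interval
        age = None if lock_created is None else now - lock_created
        if lock_decision(age, lock_timeout) != "abort":
            lock_created = now  # the new run writes a fresh lock
            started.append(now)
    return started


if __name__ == "__main__":
    # 15-minute cron, LOCK_TIMEOUT raised to 1000 seconds
    print(simulate(900, 1000, 6))
```

Under these assumptions, `simulate(900, 1000, 6)` returns `[0, 1800, 3600]`: a new run starts every 30 minutes while the old ones keep running, which would also match the `bfd -q` parents at 13:30, 14:00 and 14:30 in the ps output. Raising LOCK_TIMEOUT only slows the pile-up in this model, so the more useful question may be why the `bfd -s` runs never finish.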

Thanks!
Bill Thackrey


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     11915  0.0  0.0 106096  1288 ?        Ss   14:30   0:00 /bin/sh /usr/local/sbin/bfd -q
root     11936  7.4  0.7 166064 61204 ?        S    14:30   0:10 /bin/sh /usr/local/sbin/bfd -s
root     12821  0.0  0.0 106096  1292 ?        Ss   13:30   0:00 /bin/sh /usr/local/sbin/bfd -q
root     12840  0.3  0.7 166060 61208 ?        S    13:30   0:12 /bin/sh /usr/local/sbin/bfd -s
root     15470  0.0  0.7 166064 60452 ?        S    14:32   0:00 /bin/sh /usr/local/sbin/bfd -s
root     15471  109  1.4 225860 120140 ?       R    14:32   0:04 /bin/sh /usr/local/sbin/bfd -s
root     15513  0.0  0.7 166060 60460 ?        S    14:32   0:00 /bin/sh /usr/local/sbin/bfd -s
root     15514  116  1.4 225860 120136 ?       R    14:32   0:03 /bin/sh /usr/local/sbin/bfd -s
root     15556  0.0  0.7 166064 60448 ?        S    14:32   0:00 /bin/sh /usr/local/sbin/bfd -s
root     15557  119  1.4 225860 120136 ?       R    14:32   0:02 /bin/sh /usr/local/sbin/bfd -s
root     22149  0.0  0.0 106096  1296 ?        Ss   14:00   0:00 /bin/sh /usr/local/sbin/bfd -q
root     22169  0.5  0.7 166064 61208 ?        S    14:00   0:10 /bin/sh /usr/local/sbin/bfd -s
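To watch the pile-up without reading ps by eye, a minimal sketch that counts bfd processes by their flag (`-q` or `-s`, as in the output above) and lists PIDs above a CPU threshold. It assumes `ps aux`-style columns; `summarize_bfd` and the 50% threshold are illustrative choices, not part of BFD.

```python
from collections import Counter


def summarize_bfd(ps_lines, cpu_threshold=50.0):
    """Count bfd processes per flag and collect high-CPU PIDs.

    Expects `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
    """
    by_flag = Counter()
    hot = []
    for line in ps_lines:
        fields = line.split(None, 10)  # field 11 keeps the whole command
        if len(fields) < 11 or "/usr/local/sbin/bfd" not in fields[10]:
            continue  # skip the header and unrelated processes
        pid, cpu, command = int(fields[1]), float(fields[2]), fields[10]
        flag = command.split()[-1]  # e.g. "-q" or "-s"
        by_flag[flag] += 1
        if cpu >= cpu_threshold:
            hot.append(pid)
    return by_flag, hot
```

The lines can come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()`; on the snippet above this would flag PIDs 15471, 15514 and 15557.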

May 28 08:15:02 savusavu BFD(13465): cleared stale lock file file.
May 28 08:30:02 savusavu BFD(6344): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 899 seconds old), aborting.
May 28 08:44:58 savusavu BFD(12425): {sshd} 116.10.191.209 exceeded login failures; executed ban command '/etc/apf/apf -d 116.10.191.209 {bfd.sshd}'.
May 28 08:45:00 savusavu BFD(12425): {sshd} 212.129.12.79 exceeded login failures; executed ban command '/etc/apf/apf -d 212.129.12.79 {bfd.sshd}'.
May 28 08:45:01 savusavu BFD(9037): cleared stale lock file file.
May 28 08:45:03 savusavu BFD(12425): {sshd} 61.19.247.185 exceeded maximum login failures; host already banned or ignored.
May 28 09:15:01 savusavu BFD(12653): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 09:45:01 savusavu BFD(14714): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 10:00:01 savusavu BFD(18677): cleared stale lock file file.
May 28 10:15:01 savusavu BFD(26810): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 10:30:01 savusavu BFD(8659): cleared stale lock file file.
May 28 10:45:01 savusavu BFD(22898): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 11:00:01 savusavu BFD(5855): cleared stale lock file file.
May 28 11:15:01 savusavu BFD(23333): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 11:30:01 savusavu BFD(13826): cleared stale lock file file.
May 28 11:45:01 savusavu BFD(32726): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 12:00:01 savusavu BFD(21524): cleared stale lock file file.
May 28 12:15:01 savusavu BFD(13210): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 900 seconds old), aborting.
May 28 12:30:01 savusavu BFD(9627): cleared stale lock file file.
May 28 12:45:02 savusavu BFD(6820): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 901 seconds old), aborting.
May 28 13:00:01 savusavu BFD(5289): cleared stale lock file file.
May 28 13:37:37 savusavu BFD(19205): cleared stale lock file file.
May 28 13:38:37 savusavu BFD(20002): locked subsystem, already running ? (/usr/local/bfd/lock.utime is 60 seconds old), aborting.
May 28 17:20:03 savusavu BFD(8209): cleared stale lock file file.
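To check whether the abort/clear alternation holds across the whole log rather than just this excerpt, a minimal sketch that tallies the two lock messages and collects the lock ages reported by aborted runs. The regular expressions match only the message wording shown above; `tally_lock_events` is a hypothetical helper, and the log file's path is not given in the post.

```python
import re
from collections import Counter

# Patterns based solely on the log lines quoted above
ABORT_RE = re.compile(r"lock\.utime is (\d+) seconds old\), aborting")
CLEAR_RE = re.compile(r"cleared stale lock file")


def tally_lock_events(log_lines):
    """Count lock aborts and stale-lock clears, and record abort lock ages."""
    counts = Counter()
    ages = []
    for line in log_lines:
        match = ABORT_RE.search(line)
        if match:
            counts["aborted"] += 1
            ages.append(int(match.group(1)))
        elif CLEAR_RE.search(line):
            counts["cleared"] += 1
    return counts, ages
```

In the excerpt above, most aborts report a lock about 900 seconds old, i.e. exactly one cron interval, which is what a lock left behind by a still-running (or hung) previous run would look like.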

More information about the Blueonyx mailing list