[BlueOnyx:15019] Re: Vacation blocked by cced down

Eric Peabody admin at bnserve.com
Wed Mar 26 21:24:27 -05 2014


Michael,

You are right that the script will hang if cced is down, but nagios has 
the logic to time out a plugin that hangs.  The default for 
service_check_timeout is 60 seconds, which seems reasonable for this 
situation.  On timeout, a CRITICAL error is flagged.
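
For reference, the timeout mentioned above is set in the main nagios 
config file.  The fragment below is only a sketch; the value shown is 
the default, and the file location varies by install:

```
# nagios.cfg (main configuration file)
# Maximum number of seconds a service check plugin may run before
# nagios kills it and treats the check as timed out.
service_check_timeout=60
```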

Your method seems a good one too.  Trying to recover rather than just 
complain is certainly preferable and restarting cced should do just 
that.  Being lazy, I didn't bother with that function since I'd have to 
do the timeout handler first.  An event handler for nagios would work too.
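
A timeout handler of the kind mentioned above could be sketched with 
Perl's alarm().  This is a minimal sketch, not something from the 
plugin itself: run_with_timeout() and check_status() are hypothetical 
helper names, and it assumes the blocking call can be interrupted by 
SIGALRM.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns (1, @results) on success, (0, $reason) on timeout or error.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        @result = $code->();
        alarm(0);
        1;
    };
    alarm(0);    # make sure no alarm is left pending if $code died
    return (1, @result) if $ok;
    my $err = $@ || "unknown error\n";
    chomp $err;
    return (0, $err);
}

# Hypothetical helper: map the outcome to a Nagios exit code and message
# (0 = OK, 2 = CRITICAL), using the same "OID is 1" test as the plugin.
sub check_status {
    my ($ok, $value) = @_;
    return (2, "CRITICAL: $value") unless $ok;
    return (0, "OK") if defined $value && $value =~ /^\d+$/ && $value == 1;
    return (2, "CRITICAL: unexpected OID " . (defined $value ? $value : "undef"));
}
```

In the plugin, the connectuds(), find("System") and bye() calls would 
go inside the code reference passed to run_with_timeout(), and the two 
values it returns would be handed to check_status() to get the exit 
code and message.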

I wonder if you are re-inventing the wheel, and whether the product would 
be better if nagios were included and configured for common checks.  
There are RPMs for it, so only the config files and a few custom plugins 
would be needed.  The monitoring would be easy, but integrating the UI 
would require some work.  Some examples of frontends are here: 
http://www.nagios.org/download/frontends and here: 
http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29/Web-Interfaces/#/

Eric

On 3/26/14 8:06 PM, Michael Stauber wrote:
> Hi Eric,
>
>> Here's a nagios plugin for checking cced for those that run nagios:
>>
>> #!/usr/bin/perl -w -I/usr/sausalito/perl
>>
>> use strict;
>> use lib qw( /usr/sausalito/perl );
>> use CCE;
>>
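>> # Connect to cced over its Unix domain socket and look up the System object.
>> my $cce = new CCE;
>> $cce->connectuds();
>> my ($oid) = $cce->find("System");
>> $cce->bye('SUCCESS');
>>
>> # Nagios exit codes: 0 = OK, 2 = CRITICAL.
>> if (defined $oid && $oid == 1) {
>>    print "OK\n";
>>    exit(0);
>> } else {
>>    print "OID == ", (defined $oid ? $oid : "undef"), "\n";
>>    exit(2);
>> }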
> Thank you. Was thinking about something like that as well.
>
> Yeah, that'll do fine to detect if CCEd is stopped or crashed.
>
> But if CCEd is hanging and this checker is run via a cronjob, it'll just
> hang on the connectuds() call as well.
>
> I was thinking about a two-stage approach. The first stage checks for
> 'cced' or 'pperld' processes that are stuck in uninterruptible sleep
> (D state). If none are present, the second stage runs a Perl script
> similar to yours to check whether CCEd actually responds to queries as
> expected within a reasonable time.
>
> Another approach I discussed with Greg will be used in the new GUI.
> Instead of showing the red text on white background that CCEd is down
> (when it is down), we initiate a restart of it instead, refresh the page
> and try again. Only if all else fails will it show the usual "Doh!" page.
>
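
The first stage of the two-stage check described in the quoted message 
could be sketched as below.  This is only an illustration under stated 
assumptions: it is Linux-specific (it reads /proc/<pid>/stat), the 
helper names parse_stat_line() and find_stuck_processes() are 
hypothetical, and it assumes the daemons show up in /proc under the 
names 'cced' and 'pperld'.

```perl
use strict;
use warnings;

# Parse one /proc/<pid>/stat line into (command name, state letter).
# The command is enclosed in parentheses and may itself contain spaces
# or parentheses, so match greedily up to the last ')'.
sub parse_stat_line {
    my ($line) = @_;
    return unless defined $line && $line =~ /^\d+\s+\((.*)\)\s+(\S)/s;
    return ($1, $2);
}

# Return the PIDs of processes named in @names that are in state D
# (uninterruptible sleep).  Returns an empty list if /proc is unreadable.
sub find_stuck_processes {
    my (@names) = @_;
    my %wanted = map { $_ => 1 } @names;
    my @stuck;
    opendir(my $dh, '/proc') or return;
    for my $pid (grep { /^\d+$/ } readdir $dh) {
        open(my $fh, '<', "/proc/$pid/stat") or next;   # process may have exited
        my $line = <$fh>;
        close $fh;
        my ($comm, $state) = parse_stat_line($line);
        next unless defined $comm;
        push @stuck, $pid if $wanted{$comm} && $state eq 'D';
    }
    closedir $dh;
    return @stuck;
}
```

A checker would call find_stuck_processes('cced', 'pperld') and only 
move on to the query stage if the returned list is empty.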
