[BlueOnyx:15019] Re: Vacation blocked by cced down
Eric Peabody
admin at bnserve.com
Wed Mar 26 21:24:27 -05 2014
Michael,
You are right that the script will hang if cced is hung, but nagios has
the logic to time out a plugin that hangs. The default for
service_check_timeout is 60 seconds, which seems reasonable for this
situation. On timeout, a CRITICAL error is flagged.
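
For reference, a rough sketch of how this could look on the nagios side.
The timeout directive is the one mentioned above; the command name
check_cced, the host name, the generic-service template and the file
location are only assumptions for illustration and will differ per
install:

```
# nagios.cfg (location varies by package)
# Plugins running longer than this many seconds are killed by nagios.
service_check_timeout=60

# Object definitions; "check_cced" is a hypothetical name for the plugin
# saved in the plugin directory that $USER1$ points to.
define command {
    command_name    check_cced
    command_line    $USER1$/check_cced
}

define service {
    use                     generic-service
    host_name               localhost
    service_description     CCEd
    check_command           check_cced
}
```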
Your method seems a good one too. Trying to recover rather than just
complain is certainly preferable and restarting cced should do just
that. Being lazy, I didn't bother with that function since I'd have to
do the timeout handler first. An event handler for nagios would work too.
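
The timeout handler I skipped could probably be done with the usual
alarm/eval idiom. Below is a minimal sketch, not tested against a hung
cced. run_with_timeout is a hypothetical helper, not part of the CCE
module, and whether alarm actually interrupts a blocked connectuds()
call is an assumption. The demonstration uses a plain sleep in place of
cced so it runs anywhere:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns (1, result) on success or (0, reason) on timeout or error.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    alarm(0);    # make sure no alarm stays pending if $code died
    return (1, $result) if $ok;
    my $err = $@ || "unknown error\n";
    chomp $err;
    return (0, $err);
}

# In the plugin, the CCE calls would go inside the code reference:
#   my ($ok, $oid) = run_with_timeout(10, sub {
#       my $cce = new CCE;
#       $cce->connectuds();
#       my ($oid) = $cce->find("System");
#       $cce->bye('SUCCESS');
#       return $oid;
#   });
#   if (!$ok) { print "CRITICAL: cced check failed ($oid)\n"; exit(2); }

# Stand-alone demonstration: a quick call and one that "hangs".
my ($fast_ok, $fast)   = run_with_timeout(2, sub { return 1 });
my ($slow_ok, $reason) = run_with_timeout(1, sub {
    select(undef, undef, undef, 5);    # sleep 5s without using alarm
    return 1;
});
print "fast: ok=$fast_ok value=$fast\n";
print "slow: ok=$slow_ok reason=$reason\n";
```

With the hang caught inside the plugin, a restart of cced could be
attempted right there instead of waiting for nagios to kill the check.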
I wonder if you are re-inventing the wheel; perhaps the product would
be better if nagios were included and configured for common checks.
There are rpms for it, so only the config files and a few custom
plugins would be needed. The monitoring would be easy, but integrating
the UI would require some work. Some examples of frontends are here:
http://www.nagios.org/download/frontends and here:
http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29/Web-Interfaces/#/
Eric
On 3/26/14 8:06 PM, Michael Stauber wrote:
> Hi Eric,
>
>> Here's a nagios plugin for checking cced for those that run nagios:
>>
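>> #!/usr/bin/perl -w -I/usr/sausalito/perl
>>
>> use strict;
>> use lib qw( /usr/sausalito/perl );
>> use CCE;
>>
>> # Connect to cced over its Unix domain socket and look up the System object.
>> my $cce = new CCE;
>> $cce->connectuds();
>> my ($oid) = $cce->find("System");
>> $cce->bye('SUCCESS');
>>
>> # OID 1 is treated as healthy; anything else is reported as CRITICAL (exit 2).
>> if (defined $oid && $oid == 1) {
>>     print "OK\n";
>>     exit(0);
>> } else {
>>     print "OID == " . (defined $oid ? $oid : 'undef') . "\n";
>>     exit(2);
>> }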
> Thank you. Was thinking about something like that as well.
>
> Yeah, that'll do fine to detect if CCEd is stopped or crashed.
>
> But if CCEd is hanging and this checker is run via a cronjob, it'll just
> hang on the connectuds() call as well.
>
> I was thinking about a two staged approach. One that checks for 'cced'
> or 'pperld' processes that sit around in Zombie state (D state). If none
> are present, it runs a Perl script similar to yours to check if CCEd
> actually responds expectedly to queries within a reasonable time.
>
> Another approach I discussed with Greg will be used in the new GUI.
> Instead of showing the red text on white background that CCEd is down
> (when it is down), we initiate a restart of it instead, refresh the page
> and try again. Only if all else fails it'll show the usual "Doh!" page.
>