<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Michael,<br>
<br>
You are right that the script will hang if cced is hung, but Nagios
already has logic to time out a plugin that hangs. The default
service_check_timeout is 60 seconds, which seems reasonable for
this situation. When a check times out, it is flagged as CRITICAL by default.<br>
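<br>
A plugin can also enforce its own, shorter deadline so that it reports before the Nagios limit is reached. The usual Perl pattern for this is alarm plus a $SIG{ALRM} handler. Below is a minimal sketch, assuming a hypothetical helper run_with_timeout (not part of the CCE library) that returns Nagios-style exit codes:<br>
<pre wrap="">
```perl
use strict;
use warnings;

# Nagios plugin exit codes.
use constant { STATE_OK => 0, STATE_CRITICAL => 2 };

# Hypothetical helper: run $code, giving up after $seconds.
# Returns (STATE_OK, result) on success, or (STATE_CRITICAL, reason)
# if $code timed out or died.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # also cancel the alarm if $code died for another reason
    return (STATE_OK, $result) if $ok;
    my $reason = $@ eq "timeout\n" ? "timed out after ${seconds}s" : "error: $@";
    chomp $reason;
    return (STATE_CRITICAL, $reason);
}

# Demo: one call that finishes in time, one that hangs.
my ($state, $msg) = run_with_timeout(2, sub { 'OK' });
print "$state $msg\n";    # prints "0 OK"
($state, $msg) = run_with_timeout(1, sub { sleep 5; 'never' });
print "$state $msg\n";    # prints "2 timed out after 1s"
```
</pre>
In the cced plugin, the connectuds() and find("System") calls would go inside the sub passed to run_with_timeout. Note that alarm only has one-second granularity.<br>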
<br>
Your method seems a good one too. Trying to recover rather than
just complain is certainly preferable, and restarting cced should
do just that. Being lazy, I didn't bother with that part, since
I'd have to write the timeout handler first. A Nagios event handler
that restarts cced would work too.<br>
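<br>
The recover-rather-than-complain idea (and the restart-and-retry flow Michael describes for the new GUI) could be sketched as a small helper. The name check_with_recovery, its arguments, and the choice to report WARNING after a successful restart are all assumptions for illustration. The check and restart arguments are code refs you would supply, for example a CCE query wrapped in a timeout and whatever command restarts cced on your system (not specified in this thread):<br>
<pre wrap="">
```perl
use strict;
use warnings;

# Hypothetical helper: run the check; on failure, run the restart action
# and check again, up to "attempts" checks in total (default 2).
# Returns a Nagios-style (exit code, message) pair:
# 0 = OK, 1 = WARNING (recovered after a restart), 2 = CRITICAL.
sub check_with_recovery {
    my (%args)   = @_;
    my $check    = $args{check};
    my $restart  = $args{restart};
    my $attempts = $args{attempts} // 2;

    for my $try (1 .. $attempts) {
        if ($check->()) {
            return (0, 'OK') if $try == 1;
            return (1, "OK after $try attempts (cced restarted)");
        }
        # Only restart if another check will follow.
        $restart->() if $try < $attempts;
    }
    return (2, "cced still down after $attempts attempts");
}

# Demo with stand-in code refs: the first check fails,
# and the "restart" brings the service back.
my $up = 0;
my ($state, $msg) = check_with_recovery(
    check   => sub { $up },
    restart => sub { $up = 1 },
);
print "$state $msg\n";    # prints "1 OK after 2 attempts (cced restarted)"
```
</pre>
Reporting WARNING after a successful restart keeps the incident visible in Nagios even though cced is back up; returning OK instead would be an equally reasonable choice.<br>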
<br>
I wonder if you are reinventing the wheel, and whether the product
would be better if Nagios were included and configured for common
checks. There are RPMs for it, so only the config files and a few
custom plugins would be needed. The monitoring would be easy, but
integrating the UI would require some work. Some examples of
frontends are here: <a class="moz-txt-link-freetext" href="http://www.nagios.org/download/frontends">http://www.nagios.org/download/frontends</a> and
here:
<a class="moz-txt-link-freetext" href="http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29/Web-Interfaces/#/">http://exchange.nagios.org/directory/Addons/Frontends-%28GUIs-and-CLIs%29/Web-Interfaces/#/</a><br>
<br>
Eric<br>
<div class="moz-signature">
<br>
</div>
On 3/26/14 8:06 PM, Michael Stauber wrote:<br>
</div>
<blockquote cite="mid:533379A8.6010802@blueonyx.it" type="cite">
<pre wrap="">Hi Eric,
</pre>
<blockquote type="cite">
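<pre wrap="">Here's a Nagios plugin for checking cced, for those who run Nagios:
#!/usr/bin/perl -w -I/usr/sausalito/perl
use strict;
use lib qw( /usr/sausalito/perl );
use CCE;

# Connect to cced and look up the System object; its OID is expected to be 1.
my $cce = CCE->new;
$cce->connectuds();
my ($oid) = $cce->find("System");
$cce->bye('SUCCESS');

# Nagios exit codes: 0 = OK, 2 = CRITICAL.
if (defined $oid and $oid == 1) {
    print "OK\n";
    exit(0);
} else {
    print "CRITICAL: OID == ", (defined $oid ? $oid : 'undef'), "\n";
    exit(2);
}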
</pre>
</blockquote>
<pre wrap="">
Thank you. I was thinking about something like that as well.
Yeah, that'll do fine to detect if CCEd is stopped or has crashed.
But if CCEd is hanging and this checker is run via a cronjob, it'll just
hang on the connectuds() call as well.
I was thinking about a two-stage approach. The first stage checks for
'cced' or 'pperld' processes that sit around in D state (uninterruptible
sleep) or as zombies. If none are present, the second stage runs a Perl
script similar to yours to check whether CCEd actually responds to
queries as expected within a reasonable time.
Another approach I discussed with Greg will be used in the new GUI.
Instead of showing the red-on-white text saying that CCEd is down
(when it is down), we initiate a restart of it, refresh the page
and try again. Only if all of that fails will it show the usual "Doh!" page.
</pre>
</blockquote>
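Michael's first stage, looking for cced or pperld processes stuck in D state (uninterruptible sleep) before touching the CCE socket at all, could be sketched on Linux by reading /proc/PID/stat directly. The helper stuck_processes is hypothetical, and the process names and the D/Z states it looks for are taken as assumptions from the description above:<br>
<pre wrap="">
```perl
use strict;
use warnings;

# Hypothetical stage-one check (Linux): list processes named in @names
# whose state in /proc/PID/stat is D (uninterruptible sleep) or Z (zombie).
# $proc_dir is normally '/proc'; it is a parameter only to allow testing.
# Returns strings of the form "pid:name:state".
sub stuck_processes {
    my ($proc_dir, @names) = @_;
    my %wanted = map { $_ => 1 } @names;
    my @stuck;
    opendir(my $dh, $proc_dir) or return;
    for my $pid (sort { $a <=> $b } grep { /^\d+$/ } readdir $dh) {
        # The process may have exited since readdir; just skip it.
        open(my $fh, '<', "$proc_dir/$pid/stat") or next;
        my $line = <$fh>;
        close $fh;
        next unless defined $line;
        # Format is "PID (comm) STATE ..."; comm may itself contain
        # spaces or ')', so match greedily up to the last ') '.
        my ($comm, $state) = $line =~ /^\d+ \((.*)\) (\S)/ or next;
        push @stuck, "$pid:$comm:$state"
            if $wanted{$comm} and $state =~ /^[DZ]$/;
    }
    closedir $dh;
    return @stuck;
}

my @stuck = stuck_processes('/proc', 'cced', 'pperld');
print @stuck ? "stuck: @stuck\n" : "no stuck cced/pperld processes\n";
```
</pre>
A non-empty result could be reported as CRITICAL straight away; only when it is empty would the plugin go on to query cced, ideally still under a timeout.<br>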
<br>
</body>
</html>