[Nagiosplug-devel] Working on testcases
Ton Voon
ton.voon at altinity.com
Thu Nov 10 16:51:55 CET 2005
On 9 Nov 2005, at 19:11, Ethan Galstad wrote:
> I'm a bit late into this thread, but here are some of my thoughts...
>
> At least one person should be getting notifications for UNKNOWN
> states, as they can be important. The UNKNOWN state doesn't really
> have a clear definition, but here's what I think it should be used to
> signify...
>
> 1. Invalid command line args passed to the plugin (e.g. the plugin
> doesn't know what to do).
>
> 2. Internal failures in the plugin itself which prevent it from
> performing a check (i.e. malloc() failures, unexpected system call
> failures, or anything else that needs to be done - but fails - before
> a check can be performed). As an example, the check_dhcp plugin
> returns an UNKNOWN state if it can't determine the local hardware
> address or bind to port 68.
>
> 3. Nagios will also assign an UNKNOWN state to any
> plugin/script/whatever that either doesn't exist on the filesystem or
> returns a code that is out-of-bounds in accordance with the plugin
> specs.
So the guidelines should be updated with:
"UNKNOWN is for invalid command args or any other failure before the
requested check can be performed - with the only exception being
hostname lookups which should return CRITICAL."
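To make the proposed wording concrete, here is a minimal sketch of
the rule as a lookup. The numeric values are the standard plugin
return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the stage
names and the function itself are hypothetical, made up for
illustration only.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Hypothetical names for things that can go wrong before the requested
# check is actually performed.
PRE_CHECK_FAILURES = {"bad_args", "internal_error", "setup_failed"}


def state_for_failure(stage: str) -> State:
    """Map a failure stage to a state under the proposed guideline."""
    if stage == "hostname_lookup":
        # The single exception named in the proposed wording.
        return State.CRITICAL
    if stage in PRE_CHECK_FAILURES:
        return State.UNKNOWN
    raise ValueError(f"unrecognised failure stage: {stage!r}")
```

For instance, state_for_failure("bad_args") gives State.UNKNOWN, while
state_for_failure("hostname_lookup") gives State.CRITICAL.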
Some example changes based on the advice above:
(1) check_http -H webserver
This returns OK if it can connect to the webserver and the webserver
returns data.
(2) check_http -H webserver -w 2
This returns OK if it can connect to the webserver and data comes back
within 2 seconds. If it cannot connect, this should return UNKNOWN
(currently CRITICAL), because connectivity is not the metric that the
plugin has been asked to check.
(3) check_http -H webserver -r 'string_to_find'
This returns OK if it can reach the server and the returned data
contains the string. If it cannot connect to the server (currently
CRITICAL), or gets a 302 redirection (currently OK?), this should
return UNKNOWN.
(4) check_http -H webserver --pagesize=1000
Returns OK if it can reach the server and the web page size is >= 1000
bytes. If it cannot connect to the server (currently CRITICAL) or gets
a 302 redirection (currently OK), this should return UNKNOWN.
(5) check_http -H webserver --pagesize=1000 -w 2
Returns OK if it can reach the server, the web page size is >= 1000
bytes and the time taken is <= 2 seconds. If it cannot connect to the
server (currently CRITICAL) or gets a 302 redirection (currently OK),
this should return UNKNOWN.
Is this right? I'm starting to think so. It is clear to me now what
state should be returned, given what the plugin is actually being
asked to check.
In fact, a side effect is that it clearly defines what perf data
should be returned: (2) should return time taken, (4) should return
page size, (5) should return both, whereas (1) and (3) shouldn't
return anything.
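The five cases above can be sketched as one decision function. This is
not check_http's code, only an illustration of the proposed behaviour.
Several details are my assumptions, not settled in this thread: that
(1) stays CRITICAL on a failed connection, that exceeding -w or falling
below the page size gives WARNING, that a missing string gives
CRITICAL, and the perf data labels "time" and "size".

```python
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(connected: bool, redirected: bool, elapsed: float, size: int,
             found_string: bool = False, *,
             warn_time: Optional[float] = None,
             min_size: Optional[int] = None,
             want_string: bool = False) -> Tuple[int, str]:
    """Return (state, perfdata) for one hypothetical check_http run."""
    content_requested = min_size is not None or want_string
    metric_requested = warn_time is not None or content_requested

    if not connected:
        # Proposed: a failed connection is UNKNOWN whenever some other
        # metric was requested. Assumption: case (1) stays CRITICAL.
        return (UNKNOWN if metric_requested else CRITICAL), ""
    if redirected and content_requested:
        # A 302 means the content that was asked about was never seen.
        return UNKNOWN, ""

    state = OK
    perf = []
    if warn_time is not None:
        perf.append(f"time={elapsed:.3f}s")
        if elapsed > warn_time:
            state = WARNING      # assumption: -w maps to WARNING
    if min_size is not None:
        perf.append(f"size={size}B")
        if size < min_size:
            state = WARNING      # assumption: too-small page is WARNING
    if want_string and not found_string:
        state = CRITICAL         # assumption: missing string is CRITICAL
    return state, " ".join(perf)
```

With this sketch, case (5) succeeding in 1.2 seconds with a 1500 byte
page yields state OK and perf data "time=1.200s size=1500B", while
cases (1) and (3) yield an empty perf data string.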
(There's an issue re: inconsistent arguments. I think probably it
should be something like:
check_http -H webserver --metric pagesize -w 1000 --metric time -w 2
But that's another story.)
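For what it's worth, a rough sketch of how such a command line could
be parsed. The --metric option does not exist in check_http; it and
the parse_metrics helper are purely hypothetical, and the pairing of
each --metric with a -w is done simply by order of appearance.

```python
import argparse
from typing import Dict, List, Tuple


def parse_metrics(argv: List[str]) -> Tuple[str, Dict[str, float]]:
    """Parse the hypothetical '--metric NAME -w VALUE' style arguments."""
    parser = argparse.ArgumentParser(prog="check_http")
    parser.add_argument("-H", dest="host", required=True)
    # Both options may repeat; argparse collects them in order.
    parser.add_argument("--metric", action="append")
    parser.add_argument("-w", dest="warn", action="append", type=float)
    args = parser.parse_args(argv)

    metrics = args.metric or []
    warns = args.warn or []
    if len(metrics) != len(warns):
        raise ValueError("each --metric needs exactly one -w threshold")
    # Pair the n-th --metric with the n-th -w.
    return args.host, dict(zip(metrics, warns))
```

Called with the arguments from the example line, this returns
("webserver", {"pagesize": 1000.0, "time": 2.0}). Note that argparse
itself exits with status 2 on a usage error, which Nagios would read
as CRITICAL, so a real plugin would have to override that to exit
with 3 (UNKNOWN) for invalid arguments.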
Back to UNKNOWN - should we do it?
(However, it still doesn't make sense to treat hostname lookups
differently, but if that's the consensus, I'll go with it.)
Ton
(So much work, so little time ....)
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon