[Nagiosplug-devel] Working on testcases
Andreas Ericsson
ae at op5.se
Sun Nov 13 12:34:29 CET 2005
sean finney wrote:
> hi,
>
> just to throw another $0.02 into the bucket...
>
> On Fri, Nov 11, 2005 at 12:51:48AM +0000, Ton Voon wrote:
>
>>"UNKNOWN is for invalid command args or any other failure before the
>>requested check can be performed - with the only exception being
>>hostname lookups which should return CRITICAL."
>
>
> given the example you listed below, i don't think this is a good idea.
> rather, i think something like:
>
> "UNKNOWN is for invalid command args or other failures preventing
> the plugin from performing the specified operation."
>
> about dns: i think there are two specific and very different kinds
> of failure. there is "general resolution failure", and there is a
> "host does not exist failure". i would say that the former ought
> to remain as an UNKNOWN, as it parallels similar failures in other
> system calls such as malloc. however, if the plugin gets a "no such
> host" response, then it definitely should be CRITICAL
I'd say UNKNOWN for both. The most common cases of HOST_NOT_FOUND are
probably configuration file typos (shouldn't escalate to critical) and
DNS misconfiguration (should return critical for the check_{dig,dns}
plugins but no others).
>--as you could
> implicitly divine that the hostname is supposed to resolve. similarly,
> i feel that remote service check connection failures should remain
> CRITICAL.
>
Addendum to that last sentence:
... unless the remote service check connection is only established to
query an agent returning a specific metric.
That is, check_nt, check_nwstat, check_nrpe and check_snmp should return
UNKNOWN if they can't get the data they're requesting (as it's not the status
of the agent that's being requested). All others should return CRITICAL,
which is the logical thing to do when a plugin is asked to connect
to an ftp server and see how long it takes, as such:
	/* a failed connection is treated as an infinitely slow response */
	if (connection_failed)
		response_time = FOREVER;
	if (response_time > threshold)
		return CRITICAL;
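The rule in the addendum can be sketched as runnable C. This is a minimal sketch, assuming the standard plugin return codes (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN); FOREVER, service_check_state() and agent_query_state() are hypothetical names chosen for illustration, not code from the plugins tree.

```c
#include <stdbool.h>
#include <float.h>

/* Standard plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. */
enum state { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Hypothetical: a failed connection counts as an infinitely slow response. */
#define FOREVER DBL_MAX

/*
 * Service checks (ftp, http, ...): failing to connect means the service
 * itself is unavailable, so the response time exceeds any threshold.
 */
enum state service_check_state(bool connection_failed, double response_time,
                               double threshold)
{
	if (connection_failed)
		response_time = FOREVER;
	if (response_time > threshold)
		return STATE_CRITICAL;
	return STATE_OK;
}

/*
 * Agent queries (check_nt, check_nwstat, check_nrpe, check_snmp): the agent
 * only transports the metric, so failing to reach it means the requested
 * status could not be determined.
 */
enum state agent_query_state(bool connection_failed, enum state metric_state)
{
	if (connection_failed)
		return STATE_UNKNOWN;
	return metric_state;
}
```

Treating a refused connection as an infinitely slow response lets the ordinary threshold comparison produce CRITICAL, so service checks need no separate code path for connection failures.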
>
>
>
>>(2) check_http -H webserver -w 2
>>
>>This returns OK if it can connect to webserver and returns data within 2
>>seconds. If it cannot connect, then this returns UNKNOWN because it
>>is not the metric that is being requested to check against (currently
>>returns CRITICAL).
>
>
> i'd say it should still return CRITICAL.
>
It has to. It's being asked to check the functionality of a webserver
that doesn't respond. It's a bit like asking someone dead if they're
feeling alright and only chalking it up as CRITICAL if they say "no".
>
>>(3) check_http -H webserver -r 'string_to_find'
>>
>>This returns OK if it can find the server and return data with the
>>string. If it cannot connect to the server (currently CRITICAL), or
>>gets a 302 redirection (currently OK (?) ), this should be an UNKNOWN.
>
>
> again, i think things such as "connection refused" should still result
> in states indicative of a problem. the big difference in my
> view is that some problems prevent the plugin from doing its job,
> while other problems signify that there really is a problem.
>
> wrt the 302 redirections, i haven't even looked at what we're currently
> doing but feel we ought to follow the redirection (or provide
> a cmdline toggle) if we want to be good user-agents :)
>
302 redirections must be followed or dropped, according to the RFC. Note
that 302 is not an error code, though, so if we don't follow it we should
return OK.
> for example, malloc or name resolution failing means that the plugin
> could not tell you the service status regardless of what it was,
> whereas a "host does not exist" or "connection refused" mean that
> something is in fact wrong (and that other people would probably
> be having the same problem).
>
This logic holds with the addendum above (users don't query agents, only
network admins and network monitoring tools do that).
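The two kinds of DNS failure discussed in this thread can be told apart with the POSIX getaddrinfo() error codes: EAI_NONAME is the "host does not exist" answer, while codes such as EAI_AGAIN or EAI_FAIL are general resolution failures. A minimal sketch, assuming getaddrinfo() is the resolver in use; resolve_state(), lookup_state() and the host_missing_is_critical flag are hypothetical. The flag covers both positions: sean's proposal would set it for every plugin, while the earlier reply in this message would set it only in check_dig and check_dns.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Standard plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. */
enum state { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/*
 * Map a getaddrinfo() result to a plugin state. 0 means the name resolved
 * and the caller carries on with the actual check. EAI_NONAME means the
 * host does not exist. Anything else (EAI_AGAIN, EAI_FAIL, EAI_MEMORY, ...)
 * is a general resolution failure which, like a failed malloc(), only
 * means the plugin could not do its job.
 */
enum state resolve_state(int gai_err, bool host_missing_is_critical)
{
	if (gai_err == 0)
		return STATE_OK;
	if (gai_err == EAI_NONAME && host_missing_is_critical)
		return STATE_CRITICAL;
	return STATE_UNKNOWN;
}

/* Resolve host and classify the outcome; the address list is discarded. */
enum state lookup_state(const char *host, bool host_missing_is_critical)
{
	struct addrinfo hints;
	struct addrinfo *res = NULL;
	int err;

	memset(&hints, 0, sizeof hints);
	hints.ai_family = AF_UNSPEC;
	err = getaddrinfo(host, NULL, &hints, &res);
	if (err == 0)
		freeaddrinfo(res);
	return resolve_state(err, host_missing_is_critical);
}
```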
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231