[Nagiosplug-devel] Working on testcases

Ton Voon ton.voon at altinity.com
Mon Nov 14 02:33:19 CET 2005


On 13 Nov 2005, at 19:57, sean finney wrote:

> just to throw another $0.02 into the bucket...

Please! I value all your opinions!


> On Fri, Nov 11, 2005 at 12:51:48AM +0000, Ton Voon wrote:
>> "UNKNOWN is for invalid command args or any other failure before the
>> requested check can be performed - with the only exception being
>> hostname lookups which should return CRITICAL."
>
> given the example you listed below, i don't think this is a good idea.
> rather, i think something like:
>
> "UNKNOWN is for invalid command args or other failures preventing
> the plugin from performing the specified operation."

Yes, I think this is better. I keep thinking about "things in the  
middle" that affect results. I think my facetious examples with  
check_http muddied the waters a bit.

Just to make it clearer, how about amending to:

"UNKNOWN is for invalid command args, or failures in other systems  
preventing the plugin from performing the specified operation"

So, other systems that could prevent the check include: internal  
errors (Unix level: malloc, fork, etc.), networks, DNS and agents.
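
As a rough illustration of that wording (a sketch in Python, not the
plugins' actual C code): the category names and the function below are
made up for this example. Only the exit codes 0 to 3 are the standard
plugin return values.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Hypothetical category names, invented for this sketch. They mirror the
# list above: invalid args, internal (Unix-level) errors, networks, DNS
# and agents.
UNKNOWN_CATEGORIES = {"bad_args", "internal", "network", "dns", "agent"}


def state_for_failure(category: str) -> State:
    """Map a failure category to a state under the proposed wording."""
    if category in UNKNOWN_CATEGORIES:
        # Something other than the checked service stopped the check.
        return State.UNKNOWN
    # Otherwise the failure belongs to the thing being checked.
    return State.CRITICAL
```

So state_for_failure("agent") gives UNKNOWN, while a failure of the
checked service itself falls through to CRITICAL.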


> about dns: i think there are two specific and very different kinds
> of failure.  there is "general resolution failure", and there is a
> "host does not exist failure".  i would say that the former ought
> to remain as an UNKNOWN, as it parallels similar failures in other
> system calls such as malloc.  however, if the plugin gets a "no such
> host" response, then it definitely should be CRITICAL--as you could
> implicitly divine that the hostname is supposed to resolve.
> similarly, i feel that remote service check connection failures
> should remain CRITICAL.

Given the above definition, both failures should be UNKNOWN. I'm with  
Andreas on this. But there's Sean and Ethan on CRITICAL. So the  
voting currently stands at 2-2.

If we go with CRITICAL, then this needs to be marked as an exception  
in the guidelines.
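
Whichever way the vote goes, the two DNS cases can be told apart in
code. A minimal sketch in Python, assuming the resolver reports "no such
host" as EAI_NONAME and treats everything else (for example EAI_AGAIN)
as a general resolution failure. The function names and the
no_such_host_state parameter are hypothetical; the parameter stands in
for the undecided vote and defaults to UNKNOWN, as the definition above
implies.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def state_for_dns_error(code: int, no_such_host_state: int = UNKNOWN) -> int:
    """Classify a getaddrinfo error code.

    "No such host" is the only case under debate, so its state is a
    parameter. Any other resolver failure is treated as UNKNOWN.
    """
    if code == socket.EAI_NONAME:
        return no_such_host_state
    return UNKNOWN


def check_resolves(hostname: str, no_such_host_state: int = UNKNOWN) -> int:
    """Return OK if hostname resolves, else the classified failure state."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as err:
        return state_for_dns_error(err.errno, no_such_host_state)
    return OK
```

If CRITICAL wins, the exception in the guidelines amounts to passing
no_such_host_state=CRITICAL.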

Andreas also says:

> check_nt, check_nwstat, check_nrpe and check_snmp should return  
> UNKNOWN if it can't get the data it's requesting (as it's not the  
> status of the agent that's being requested)

which also fits with this definition.


>> (2) check_http -H webserver -w 2
>>
>> This returns OK if it can connect to webserver and returns data within 2
>> seconds. If it cannot connect, then this returns UNKNOWN because it
>> is not the metric that is being requested to check against (currently
>> returns CRITICAL).
>
> i'd say it should still return CRITICAL.
>

Yes, I'm clearly wrong. By the definition, it is not a failure "in  
[an]other system", so UNKNOWN is the wrong state, so it must be  
CRITICAL.


>> (3) check_http -H webserver -r 'string_to_find'
>>
>> This returns OK if it can find the server and return data with the
>> string. If it cannot connect to the server (currently CRITICAL), or
>> gets a 302 redirection (currently OK (?)), this should be an UNKNOWN.
>
> again, i think things such as "connection refused" should still result
> in states indicative of a problem.

Connection failure is critical. For the 302 case: by the requested  
arguments, the string_to_find is not found, so OK is wrong. But it is  
not a failure in another system, so UNKNOWN is wrong too. So it must  
be CRITICAL (or, I guess, configurable to WARNING).
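
Spelled out as a sketch (Python for brevity, not the real check_http
code): the function name and the miss_state parameter are made up for
this example, with miss_state standing in for the "configurable to
WARNING" idea.

```python
import re
from typing import Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def http_regex_state(body: Optional[str], pattern: str,
                     miss_state: int = CRITICAL) -> int:
    """State for a string-match check against a web server.

    body is None when the connection failed. A 302 that is not followed
    yields a body without the expected string, so it is an ordinary miss.
    """
    if body is None:
        # The checked server itself failed, not some other system.
        return CRITICAL
    if re.search(pattern, body):
        return OK
    return miss_state
```

Neither branch returns UNKNOWN, because nothing here is a failure in
another system.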


>   the big difference in my
> view is that some problems prevent the plugin from doing its job,
> while other problems signify that there really is a problem.

I think that difference is a lot clearer in your mind than it is in  
mine :) I think the "other systems" wording helps me, which Sean's  
intuitively picked up on.


> wrt the 302 redirections, i haven't even looked at what we're
> currently doing but feel we ought to follow the redirection (or
> provide a cmdline toggle) if we want to be good user-agents :)

The default is not to follow. "-f follow" in the case above will work.


Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon

