[Nagiosplug-devel] Working on testcases
Ton Voon
ton.voon at altinity.com
Mon Nov 7 01:53:38 CET 2005
Hi!
This is an interesting and important thread and I seem to have got
some strong opinions, so we should continue with this until we get a
result.
Just going to summarise where we are:
PROBLEM
While working on testcases, I noticed that "name resolution
failure" now returns UNKNOWN instead of CRITICAL. What exactly should
UNKNOWN mean?
VIEWS
John Rouillard suggested a command line option for the user to choose
the return code, but Ton Voon thinks this would overcomplicate things.
John retracted the suggestion.
Garrett Honeycutt suggested a configure-time option for the return code,
but Andreas Ericsson thinks this is bad because compiled binaries
should behave identically across platforms. I think the "configurable
return code" suggestion can be dropped.
John suggests separating "host not found" and "cannot resolve"
exceptions, so the former is a CRITICAL and the latter is an UNKNOWN,
which is an interesting idea, but I'm not sure what the philosophy
behind it is.
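To make John's split concrete, here is a rough sketch in Python rather than the plugins' own C. It is only an illustration under assumptions: the function names are hypothetical, and treating the resolver error EAI_NONAME as "host not found" while lumping every other resolver error under "cannot resolve" is my reading of the idea, not something taken from an existing plugin. The exit codes are the standard Nagios plugin ones.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def state_for_resolution_error(err):
    """Map a socket.gaierror to a plugin state (hypothetical mapping)."""
    # Assumption: EAI_NONAME stands in for "host not found".
    if err.errno == socket.EAI_NONAME:
        return CRITICAL
    # Anything else (e.g. EAI_AGAIN, a temporary resolver failure) is
    # treated here as "cannot resolve".
    return UNKNOWN


def check_resolution(hostname):
    """Return (state, message) for a name lookup of hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as err:
        return state_for_resolution_error(err), "name resolution failure: %s" % err
    return OK, "%s resolves" % hostname
```

Which error the resolver actually reports for a given failure can vary between platforms and resolver setups, so the same broken lookup may not always land in the same bucket.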
Andreas suggests a new status code in Nagios: "Transport/network
error", and then UNKNOWN will mean "user error". With no network
error state supported, Andreas suggests using UNKNOWN.
John's analysis is that there are two functions of a plugin:
1) communication with device/service
2) analysis of device/service and assigning appropriate status
[and perf data]
MY TAKE
Trying to tie these views together, I think "transport/network"
errors go into (1). John's suggestion about "host not found" and
"cannot resolve" goes into (1) as well, but then this suggests there is
no difference in state.
My feeling is that (2) depends on (1), so if (1) is not possible -
for ANY reason - then I think that should be a CRITICAL (with
appropriate message text). I think Nagios helps with "transport/network"
errors through things like "flapping" and "soft states" (I think
Nagios works well because it doesn't try to come up with lots of
different plugin states and just keeps it simple).
I think Garrett summed it up best for me: "I would rather get false
positives than miss something because the status was UNKNOWN as
opposed to CRITICAL"
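
The "(2) depends on (1)" rule could be sketched roughly as below, again in Python rather than the plugins' C. This is only a sketch under assumptions: run_check, communicate and analyse are hypothetical names for the two plugin functions John describes, and catching every exception from step (1) is my reading of "for ANY reason".

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_check(communicate, analyse):
    """Run step (1), then step (2); return (exit_code, status_line).

    communicate() talks to the device/service and returns raw data.
    analyse(data) returns (state, text) for that data.
    Both are hypothetical callables supplied by the plugin author.
    """
    try:
        data = communicate()
    except Exception as err:
        # Step (1) failed for whatever reason: report CRITICAL and put
        # the cause in the message text.
        return CRITICAL, "CRITICAL: could not reach service: %s" % err
    state, text = analyse(data)
    return state, "%s: %s" % (STATE_NAMES[state], text)
```

A real plugin would print the status line and exit with the code. With this shape, a name resolution failure and a refused connection both come out as CRITICAL, differing only in the message text.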
NEXT STEPS
I think we need to bat this around a bit more to get consensus. If it
gets to the stage where we need a vote, I'm happy to cast one out to
the community.
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon