[Nagiosplug-devel] Suggested alterations to the Performance Protocol
Ton Voon
tonvoon at mac.com
Tue Sep 7 13:24:04 CEST 2004
On 7 Sep 2004, at 16:14, Ben Clewett wrote:
> Dear Group,
>
> I would like to discuss upgrading the performance protocol as defined
> in:
>
> http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN185
>
> I have noticed a few people making suggestions and it might be time to
> get a new version published. Certainly to discuss whether it is time
> to do this.
>
> Just to start things off:
>
> 1.
>
> Patch a loop-hole in the document. The Warn and Crit values should be
> of the same UOM as the value. Second, all numbers, value, max, min,
> warn and crit, should be directly comparable.
>
> Although this is common sense, I have seen some plugins (eg,
> check_disk) which certainly used to use a different number type for
> the warn & crit as to the value. In this case, 'Disk Free' and 'Disk
> Used', which are not comparable.
Good point and a no-brainer. I'll update tonight (the sf.net web page
will not update immediately - it takes a few days for the proxy cache
to expire).
I think check_disk used to be broken in its reporting of perf data, but
I believe the current CVS version is correct.
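To illustrate the rule in point 1, here is a minimal Python sketch. It is not taken from any plugin: the function name `format_perfdata` and the numbers are made up. It builds one perf data item in the guidelines' 'label'=value[UOM];[warn];[crit];[min];[max] layout, assuming the caller has already expressed warn, crit, min and max in the same unit and sense as the value (for example, all as "disk used"):

```python
def format_perfdata(label, value, uom="", warn=None, crit=None,
                    minimum=None, maximum=None):
    """Build one perf data item: 'label'=value[UOM];[warn];[crit];[min];[max].

    Assumption: every number is already in the same unit as `value`,
    so only the value itself carries the UOM suffix.
    """
    def fmt(number):
        # Unset fields are emitted as empty strings.
        return "" if number is None else str(number)

    fields = [
        f"'{label}'={fmt(value)}{uom}",
        fmt(warn),
        fmt(crit),
        fmt(minimum),
        fmt(maximum),
    ]
    # Trailing empty fields (and their semicolons) are dropped.
    return ";".join(fields).rstrip(";")


if __name__ == "__main__":
    # Hypothetical "disk used" figures, all in MB.
    print(format_perfdata("used", 4500, "MB", 8000, 9000, 0, 10000))
```

Because the thresholds and the value share one unit and one meaning, a graphing tool can compare them directly without knowing anything about the plugin.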
>
> 2.
>
> Suggested by Yves Mettier: The addition of a special reserved
> variable, 'check_time' which records the time at which the plugin
> completed the check.
>
> I can't remember if units were suggested, but in line with Nagios, the
> time as seconds from 01-01-1970 00:00:00 UTC, or standard UNIX time,
> may make sense. If Yves is reading this, he may be able to comment
> further.
Firstly, why is this performance data?
Secondly, I think something like this should be done by Nagios, not the
plugins. Seems a bit of a waste to code in start/stop times in each
plugin when the core execution engine would hold all this information.
There needs to be a change to Nagios to pass this data through somehow,
but then this would work for every plugin.
(I think "time", which is really "elapsed time", is slightly different
as this will remove timings from things that are outside of the core
check, so for example check_dns gives the time for the dns lookup, but
removes plugin startup, variable parsing, host resolution checks, etc)
However, I like the idea of "special reserved variables" - I think it
is worthwhile to add a table with a list of common labels, such as
"time". Any comments?
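To make the "elapsed time" distinction concrete, here is a rough Python sketch. The helper `run_core_check` and the dummy check are hypothetical and only for illustration. The idea is that only the core check sits inside the timed region, while startup and argument parsing stay outside it:

```python
import time


def run_core_check(check):
    """Run `check()` and return (result, elapsed_seconds).

    Only the call itself is timed; anything the plugin does before or
    after (startup, option parsing, output formatting) is excluded.
    """
    start = time.perf_counter()
    result = check()
    elapsed = time.perf_counter() - start
    return result, elapsed


def main():
    # Plugin startup and argument parsing would happen here, untimed.
    # The lambda stands in for the real work, e.g. a DNS lookup.
    result, elapsed = run_core_check(lambda: sum(range(1000)))
    print(f"OK - check finished|time={elapsed:.6f}s")


if __name__ == "__main__":
    main()
```

A "check_time" stamp, by contrast, would be a single wall-clock reading, which the core execution engine could record for every plugin without any plugin changes.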
> 3.
>
> The addition of macros to define special numbers. Some mentioned are
> NULL to indicate no value or an invalid value. INF and -INF to
> indicate an infinite value. Possibly NAN to represent Not a Number,
> as with division by zero. Not often used, but do have a place.
This is already covered in
http://nagiosplug.sourceforge.net/developer-guidelines.html#THRESHOLDFORMAT
but is not specifically mentioned for the perf data output. This should
be clearer.
I like the idea of macros. I had proposed using some arcane characters
(such as ~ for negative infinity), but I think your macro idea is far
clearer. Any comments?
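As a sketch of how a consumer of perf data might handle such macros, the Python below maps the names floated in this thread (NULL, INF, -INF, NAN) onto native values. None of these macros is part of the published guidelines yet, and the function name `parse_perf_number` is invented for the example:

```python
import math

# Proposed macro names from this discussion; not part of the
# published guidelines.
SPECIAL_VALUES = {
    "INF": math.inf,
    "-INF": -math.inf,
    "NAN": math.nan,
    "NULL": None,  # no value, or an invalid value
}


def parse_perf_number(text):
    """Convert one perf data number, or one of the proposed macros,
    into a float (or None for NULL)."""
    token = text.strip()
    if token in SPECIAL_VALUES:
        return SPECIAL_VALUES[token]
    return float(token)


if __name__ == "__main__":
    for token in ("12.5", "INF", "-INF", "NAN", "NULL"):
        print(token, "->", parse_perf_number(token))
```

Named macros keep the parser to a simple table lookup, whereas single-character codes such as ~ would need documenting separately for every reader of the output.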
> 4.
>
> To allow any UOM unit. For instance, 'degc' for temperature, 'users'
> for a user count etc.
I think degc makes sense (is there a formal SI unit for degrees
centigrade?), but users doesn't - users is already covered in point 10a
at http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN185.
For example, "active_users=10" would be sufficient without a UOM, but
"cabinet_temperature=20" could be in degrees centigrade or degrees
Fahrenheit.
The idea was that the label was free text to describe the thing being
measured, while the UOM gives the graphing program enough data on how
to graph (eg, RRD has a concept of graphing the difference between two
values for counter-type data). Thus an exhaustive list of UOMs would
mean extra coding. But there does seem to be confusion, as things like
B (bytes) and s (seconds) are UOMs even though they wouldn't matter to
the graphing program. Maybe we should be more like SI units?
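For what it is worth, accepting a free-form UOM need not cost the consumer much. The Python sketch below splits a perf data value into its number and whatever unit suffix follows. The name `split_value_uom` is made up, and the assumption that a UOM consists only of letters or % is mine, not something the guidelines state:

```python
import re

# Assumption: a value is an optionally signed decimal number followed
# by an optional UOM made only of letters or '%'.
_PERF_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def split_value_uom(text):
    """Split e.g. '20degc' into (20.0, 'degc'); the UOM may be empty."""
    match = _PERF_VALUE.match(text.strip())
    if match is None:
        raise ValueError(f"not a value with optional UOM: {text!r}")
    number, uom = match.groups()
    return float(number), uom


if __name__ == "__main__":
    for item in ("10", "20degc", "85%", "4500MB"):
        print(item, "->", split_value_uom(item))
```

A graphing program could then treat the few UOMs it understands specially and pass any other suffix through as an axis label.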
>
> 5.
>
> There is no way of representing a date. There may be some plugins,
> eg, recording user information, which do want to record a date.
>
> I have suggested UNIX time above. However another suggestion is to
> use the popular SQL syntax: '%Y-%m-%d %H:%M:%S.ms', eg, '2004-09-07
> 16:10:15.123'. Or a component of 'date', 'date time', 'time.ms'. It
> works for SQL :)
I would prefer to use Unix time, only because of brevity. As long as it
gets translated later (and there are lots of common functions for it),
then the graphing would be okay.
Would Unix time with a .ms make sense for more granularity? This would
presumably need a UOM defined too.
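The translation between the two forms is short in most languages. As an example, a Python sketch follows. The function names are invented, and it assumes the SQL-style string is in UTC, which the proposal does not actually specify:

```python
from datetime import datetime, timezone


def sql_to_unix(text):
    """Convert 'YYYY-MM-DD HH:MM:SS.mmm' (assumed to be UTC) into
    Unix time as float seconds, keeping the fractional part."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def unix_to_sql(seconds):
    """Convert Unix time (float seconds) back into the SQL-style
    string, truncated to millisecond precision."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"


if __name__ == "__main__":
    stamp = sql_to_unix("2004-09-07 16:10:15.123")
    print(stamp)
    print(unix_to_sql(stamp))
```

Since the conversion is this cheap, emitting the shorter Unix form in perf data and translating at graphing time loses nothing, provided the time zone is agreed on.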
>
> I hope this meets with a friendly reception, and I look forward to
> seeing where we can take this.
My personal schedules dictate the amount of time I can afford on this,
but I hope it is always a friendly reception... :)
Does anyone have any strong feelings on these suggestions?
Ton