[Nagiosplug-devel] RFC: Performance data guidelines

kjell.sundtjonn at elkem.no kjell.sundtjonn at elkem.no
Wed Jul 16 10:55:21 CEST 2003


Ton

A few comments

>1) I don't like the min and max values. I think that information is held
>with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
>assume any value.
My reason for including the max and min values is to carry as much
information as possible in the performance data string. Max and min are
relevant for getting correct scaling on graphs drawn by tools such as
RRD. RRD accepts them as optional parameters in the data source definition,
and I think they should be included when available (that is, when the
plugin logic can deduce them in a sensible way).
If the UOM is % you can assume min=0, max=100, but if you monitor your
free disk space in GB, the total disk space available is valuable
information that the plugin can provide.
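
As a rough illustration of the point above, a parsing tool could fall back to
a range implied by the UOM and let plugin-supplied min/max values override it.
This is only a sketch: the function name value_range and the table
UOM_DEFAULT_RANGE are hypothetical, and the defaults listed are the ones
mentioned in this thread (% is 0-100, seconds is 0-infinity).

```python
# Assumed defaults implied by the unit of measurement (hypothetical table).
# None means "unknown / unbounded".
UOM_DEFAULT_RANGE = {
    "%": (0.0, 100.0),
    "s": (0.0, None),
}


def value_range(uom, min_value=None, max_value=None):
    """Return (min, max), preferring plugin-supplied values over UOM defaults."""
    default_min, default_max = UOM_DEFAULT_RANGE.get(uom, (None, None))
    low = min_value if min_value is not None else default_min
    high = max_value if max_value is not None else default_max
    return low, high


if __name__ == "__main__":
    print(value_range("%"))          # range implied by the unit alone
    print(value_range("GB", 0, 250)) # range only the plugin can know
```

With free disk space reported in GB, the unit alone gives no usable range, so
the explicit min and max are what allow the graphing tool to scale correctly.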

>2) what about check_disk -w 5% -w 10000? If there is no min/max, then it
>could be: 'label=value[UOM][:critical:warning[:critical:warning]]'

What about changing the general layout to
'label=value1[UOM];value2[UOM];...[:[critical1;critical2;...]
[:[warning1;warning2;...][:[max1;max2;...][:min1;min2;...]]]]'

This handles the situation you describe: all the information is carried over
to whichever tool you use to parse the performance data, in a structured
format that is easy to parse.
We should allow any character (except '=') in the label. Converting it to
a valid RRD data source name (or the equivalent for another tool) should be
left to the program that parses the data.
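
To make the proposed layout concrete, here is a minimal parsing sketch. It
assumes the field order values:critical:warning:max:min, that only the values
carry a UOM, and that an empty field means "undefined". The names
parse_perfdata_item and rrd_ds_name are hypothetical, and quoted fields are
not handled. The 19-character limit and the [a-zA-Z0-9_] class are the ones
from the earlier proposals quoted further down.

```python
import re

FIELD_NAMES = ("critical", "warning", "max", "min")
# A number in class [-0-9.] followed by an optional unit of measurement.
VALUE_RE = re.compile(r"^(-?[0-9.]+)(.*)$")


def parse_perfdata_item(item):
    """Parse one 'label=v1[UOM];v2[UOM]:crit;...:warn;...:max;...:min;...' item."""
    # The label may contain anything except '=', so split on the first '='.
    label, _, rest = item.partition("=")
    groups = rest.split(":")

    values = []
    for token in groups[0].split(";"):
        match = VALUE_RE.match(token)
        if match is None:
            raise ValueError(f"not a number: {token!r}")
        values.append((float(match.group(1)), match.group(2)))

    result = {"label": label, "values": values}
    for name in FIELD_NAMES:
        result[name] = []
    # Empty entries become None, meaning "undefined".
    for name, group in zip(FIELD_NAMES, groups[1:]):
        result[name] = [float(t) if t else None for t in group.split(";")]
    return result


def rrd_ds_name(label):
    """Map a free-form label to a 1-19 character name in [a-zA-Z0-9_]."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", label)[:19]


if __name__ == "__main__":
    parsed = parse_perfdata_item("/dev/hda3=84%;22118452kB:10;5000:25;10000:100:0")
    print(parsed)
    print(rrd_ds_name(parsed["label"]))
```

Keeping rrd_ds_name on the parsing side means the plugin can emit the
filesystem path unchanged as the label.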

>3) what about "critical at 10%, but no warning levels"? Can just use a
>null, I guess.
>4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
>processes. Is this doable at all? If so, would we need to change the
>separators?
Let us allow parameters to be embedded in quotes:
No_processes=5:'5:5'::10:0
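
A sketch of how a parser might split such a string, assuming single quotes
protect any ':' inside them and are dropped from the result. The function name
split_fields is hypothetical, and escaping a quote inside a quoted field is
not considered here.

```python
def split_fields(text, sep=":", quote="'"):
    """Split text on sep, ignoring separators inside single-quoted sections."""
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # Quotes only toggle the state; they are not kept in the field.
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quote")
    fields.append("".join(current))
    return fields


if __name__ == "__main__":
    label, _, rest = "No_processes=5:'5:5'::10:0".partition("=")
    print(label, split_fields(rest))
```

For the example above this yields the value 5, the critical range 5:5, an
empty (undefined) warning, max 10 and min 0.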

I must say that I am sceptical of the proposal by Peter Hoogendijk to use
scanf and printf format specifiers as parameters to the plugins. Let us try
to develop a common recommendation on performance data that lets all
relevant information be forwarded from the plugin to whichever tool you
select to parse the data. It is important that this is a common definition,
so that the program needed to parse the data stays simple.

Kjell Sundtjønn



From: "Voon, Ton" <Ton.Voon at egg.com>
Sent by: nagiosplug-devel-admin at lists.sourceforge.net
Date: 15.07.2003 15:32
To: "'kjell.sundtjonn at elkem.no'" <kjell.sundtjonn at elkem.no>, NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
cc:
Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines




Kjell,

Firstly, I just want to say thank you for your contribution. This is a
fascinating thread. I would much rather have this discussion now than have
it raised as design problems afterwards!

Yeah, I thought afterwards that check_disk has to be different as a
summation does not really tell you anything useful. My preference is that
the output reflects the filesystem, not the device, but we can use a
switch for that.

I think the : separated fields instead of crit,warn,critp,warnp are better
too - the new check_disk allows different thresholds per disk, so this fits
in well. However, some questions pop up:

1) I don't like the min and max values. I think that information is held
with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
assume any value.
2) what about check_disk -w 5% -w 10000? If there is no min/max, then it
could be: 'label=value[UOM][:critical:warning[:critical:warning]]'
3) what about "critical at 10%, but no warning levels"? Can just use a
null, I guess.
4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
processes. Is this doable at all? If so, would we need to change the
separators?
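
To pin down what question 4 asks of a parser, here is a small sketch of the
range test. It assumes a 'low:high' threshold alerts whenever the value falls
outside the range, and that a missing side is unbounded. That is only a
reading of the -c 5:5 example, not a description of how check_procs
implements it, and outside_range is a hypothetical name.

```python
def outside_range(value, spec):
    """Return True if value lies outside the 'low:high' range given in spec."""
    low_text, sep, high_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'low:high', got {spec!r}")
    # Assumption: an empty side means no bound in that direction.
    low = float(low_text) if low_text else float("-inf")
    high = float(high_text) if high_text else float("inf")
    return value < low or value > high


if __name__ == "__main__":
    for count in (4, 5, 6):
        print(count, outside_range(count, "5:5"))
```

Because the range itself contains a ':', it cannot share that character with
the field separator unless it is quoted or a different separator is chosen.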

Ton

> -----Original Message-----
> From: kjell.sundtjonn at elkem.no [mailto:kjell.sundtjonn at elkem.no]
> Sent: Saturday, July 12, 2003 5:41 PM
> To: NagiosPlug Devel
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
>
>
> I really like the idea of including the critical and warning levels
> together with max and min values in the performance data, but let me
> propose an alternative layout based on colon (:) separated fields:
>
> - output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]',
>   comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (spaces allowed, but
>   not recommended)
> - values, critical, warning, max, min in class [-0-9.]. No spaces.
> - critical and warning are the thresholds for this measurement
> - max and min are the maximum/minimum values for the measurement
>
> I think this is easier to parse than Ton's proposal based on
> 'magical' words.
>
> Example
>
> Disk space
> DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on
> /dev/hda2] [252600 kB (100%) free on
> /dev/shm]|_dev_hda3=84%:10:25:100:0,
> _dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0
>
> For disk space and other plugins where the UOM is defined when the plugin
> is called, use the active UOM as the value for the performance data. Notice
> how the / is replaced with _ to ensure a valid RRD datasource name. It is
> necessary to show the performance data for each disk in a disk set, not
> only for the total as Ton proposes.
>
> PING
>
> PING OK - Packet loss = 0%, RTA =
> 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0
>
> The empty max value for RTA is understood as undefined.
>
>
>
> Kjell Sundtjønn
>
>
>
> From: "Voon, Ton" <Ton.Voon at egg.com>
> Sent by: nagiosplug-devel-admin at lists.sourceforge.net
> Date: 11.07.2003 16:10
> To: NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
> cc:
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
>
>
>
> I'm starting to side with Kjell's and Karl's idea of labels being separate
> from the units. I think that was the flaw in my original proposal - if we
> can standardise on the units, then RRD generation should be fairly easy and
> then you can keep labels descriptive and whatever you think is suitable for
> a particular plugin.
>
> So my amended proposal is:
>
> - output of format 'label=value[UOM]' comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be
>   allowed?)
> - special labels of warn, warnp, crit and critp (or just warn and crit with
>   different units?). These pass the threshold levels specified on the
>   command line. My idea on this is that you can then use RRD to draw
>   yellow/red lines to show where the warning levels are.
> - values in class [-0-9.]. No spaces. Karl has a worry about returned
>   values from SNMP OIDs, but I think values should always be a number, so
>   it can be parsed to remove extraneous characters
> - units one of:
>
> no unit specified - assume a number (int or float) of things (users,
> processes, load averages)
> s - seconds (also us, ms)
> % - percentage
> b - bytes (also kb, Mb, Tb)
> c - a continuous counter (such as bytes transmitted on an interface)
> (Does this interfere with a standard unit?)
>
> So some examples:
>
> check_ping:
> PING OK - Packet loss = 0%, RTA = 1.00
> ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20%
>
> check_disk:
> DISK OK [1150211 kB (57%) free on
> /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10%
> I still think that you do not need the total, used and used_percent
> because these can be calculated from free and free_percent. I would also
> use free rather than used because the lowest limit is 0 and the output
> shows free. I think if you specify a set of disks, then data is returned
> for the total of the disks.
>
> check_swap:
> CRITICAL - Swap used: 18% (778368 out of
> 4194272)|free_percent=82%,free=778Mb,warnp=5%
>
> check_load:
> OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2
> I think we should only return performance data for 1 set of timings,
> otherwise it gets very complicated (on a side issue, is it possible to
> have a plugin return % values instead of load levels?)
>
> check_procs:
> OK - 5 processes running with command name
> /usr/local/apache/bin/httpd|processes=5,warn=10
> Hmmm, this goes against my check_disk example of using 0 as a lower bound,
> as check_procs can only be reported "upwards"
>
> check_users:
> USERS OK - 2 users currently logged in|users=2,warn=10,crit=20
>
> Are we getting closer?
>
> Ton
>
>
> -------------------------------------------------------
> This SF.Net email sponsored by: Parasoft
> Error proof Web apps, automate testing & more.
> Download & eval WebKing and get a free book.
> www.parasoft.com/bulletproofapps1
> _______________________________________________
> Nagiosplug-devel mailing list
> Nagiosplug-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> ::: Please include plugins version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

