[Nagiosplug-devel] RFC: Performance data guidelines
Voon, Ton
Ton.Voon at egg.com
Tue Jul 15 06:31:03 CEST 2003
Kjell,
Firstly, just want to say thank you for your contribution. This is a
fascinating thread. I would much rather have this discussion now than have
it raised as design problems afterwards!
Yeah, I thought afterwards that check_disk has to be different, as a
summation does not really tell you anything useful. My preference is that
the output reflects the filesystem, not the device, but we can use a
switch for that.
I think the : separated fields instead of crit,warn,critp,warnp are better
too - the new check_disk allows different thresholds per disk, so this fits
in well. However, some questions pop up:
1) I don't like the min and max values. I think that information is held
with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
assume any value.
2) what about check_disk -w 5% -w 10000? If there is no min/max, then it
could be: 'label=value[UOM][:critical:warning[:critical:warning]]'
3) what about "critical at 10%, but no warning levels"? Can just use a null,
I guess.
4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
processes. Is this doable at all? If so, would we need to change the
separators?
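
To make questions 2 to 4 concrete, here is a minimal Python sketch of a
parser for the 'label=value[UOM][:critical:warning[:critical:warning]]'
layout. It is only an illustration under stated assumptions: the name
parse_item and the regular expression are hypothetical, thresholds are kept
as raw strings so that mixed units such as 5% and 10000 survive, and an
empty field is read as "no threshold".

```python
import re

# Hypothetical pattern: a number in class [-0-9.] followed by an optional unit.
_VALUE_RE = re.compile(r'^(-?[0-9.]+)([A-Za-z%]*)$')


def parse_item(item):
    """Parse one 'label=value[UOM][:critical:warning[:critical:warning]]' item.

    Returns (label, value, uom, pairs), where pairs is a list of
    (critical, warning) tuples and None stands for an empty field.
    """
    label, _, rest = item.partition('=')
    fields = rest.split(':')
    match = _VALUE_RE.match(fields[0])
    if not label or match is None:
        raise ValueError('malformed item: %r' % item)

    thresholds = fields[1:]
    if len(thresholds) % 2 != 0:
        raise ValueError('thresholds must come in critical:warning pairs')

    # Question 3: an empty field means that level was not given.
    levels = [t if t else None for t in thresholds]
    pairs = [(levels[i], levels[i + 1]) for i in range(0, len(levels), 2)]
    return label, float(match.group(1)), match.group(2), pairs
```

With this reading, 'free=57%:10:' gives one pair with critical 10 and no
warning. It also shows the problem in question 4: a range such as 5:5 would
be split on the same colon and read as a critical:warning pair, so ranges
would need a different separator.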
Ton
> -----Original Message-----
> From: kjell.sundtjonn at elkem.no [mailto:kjell.sundtjonn at elkem.no]
> Sent: Saturday, July 12, 2003 5:41 PM
> To: NagiosPlug Devel
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
>
>
> I really like the idea of including the critical and warning
> level together
> with max and min values in the performance data, but let me propose an
> alternative layout based on colon (:) separated fields :
>
> - output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]'
> comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (spaces
> allowed, but
> not recommended)
> - values, critical, warning, max, min in class [-0-9.]. No spaces.
> - critical and warning are the thresholds for this measurement
> - max and min are the maximum/minimum values for the measurement
>
> I think this is easier to parse than the proposal from Ton based on
> 'magical' words.
>
> Example
>
> Disk space
> DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on
> /dev/hda2] [252600 kB (100%) free on
> /dev/shm]|_dev_hda3=84%:10:25:100:0,
> _dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0
>
> For disk space and other plugins where the UOM is defined when the plugin
> is called, use the active UOM as the value for the performance data.
> Notice how the / is replaced with _ to ensure a valid RRD datasource
> name. It is necessary to show the performance data for each disk in a
> disk set, not only for the total as Ton proposes.
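
The label rule above (1-19 characters in class [a-zA-Z0-9_], with / replaced
by _) could be sketched in Python as follows. The function name rrd_label
and the choice to truncate over-long names are assumptions made for
illustration, not part of the proposal.

```python
import re


def rrd_label(name, max_len=19):
    """Turn a device or mount name into a label in class [a-zA-Z0-9_].

    Every character outside the class is replaced with '_'. Truncating to
    max_len is an assumption; the proposal only states the length limit.
    """
    label = re.sub(r'[^a-zA-Z0-9_]', '_', name)
    if not label:
        raise ValueError('label must be at least 1 character long')
    return label[:max_len]
```

For example, rrd_label('/dev/hda3') gives '_dev_hda3', matching the disk
example. Truncation can make two long names collide, so a real plugin would
need some way to keep labels unique.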
>
> PING
>
> PING OK - Packet loss = 0%, RTA =
> 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0
>
> The empty max value for RTA is understood as undefined.
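
As a rough check of how easy the colon-separated layout is to parse, here is
a minimal Python sketch for 'label=value[UOM]:[critical]:[warning]:[max]:[min]'.
The names PerfItem and parse_perfdata are hypothetical, and treating an
empty or missing field as None follows the "understood as undefined" reading
of the RTA example.

```python
import re
from typing import NamedTuple, Optional


class PerfItem(NamedTuple):
    label: str
    value: float
    uom: str
    critical: Optional[float]
    warning: Optional[float]
    maximum: Optional[float]
    minimum: Optional[float]


# A number in class [-0-9.] followed by an optional unit of measurement.
_VALUE_RE = re.compile(r'^(-?[0-9]*\.?[0-9]+)([A-Za-z%]*)$')


def _num(field):
    """Empty field -> None (undefined), otherwise a float."""
    return float(field) if field else None


def parse_perfdata(plugin_output):
    """Return the PerfItem list found after the '|' in a plugin output line."""
    _, sep, perf = plugin_output.partition('|')
    if not sep:
        return []
    items = []
    for raw in perf.split(','):
        raw = raw.strip()
        if not raw:
            continue
        label, _, rest = raw.partition('=')
        fields = rest.split(':')
        fields += [''] * (5 - len(fields))  # pad missing trailing fields
        match = _VALUE_RE.match(fields[0])
        if not label or match is None:
            raise ValueError('malformed item: %r' % raw)
        items.append(PerfItem(label, float(match.group(1)), match.group(2),
                              *[_num(f) for f in fields[1:5]]))
    return items
```

Fed the PING line above, this yields two items, and the RTA item has
maximum None while its minimum is 0.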
>
>
>
> Kjell Sundtjønn
>
>
>
> "Voon, Ton" <Ton.Voon at egg.com>
> Sent by: nagiosplug-devel-admin at lists.sourceforge.net
> 11.07.2003 16:10
>
> To: NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
> cc:
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
>
>
>
> I'm starting to side with Kjell's and Karl's idea of labels being separate
> from the units. I think that was the flaw in my original proposal - if we
> can standardise on the units, then RRD generation should be fairly easy and
> then you can keep labels descriptive and whatever you think is suitable for
> a particular plugin.
>
> So my amended proposal is:
>
> - output of format 'label=value[UOM]' comma separated
> - labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be
> allowed?)
> - special labels of warn, warnp, crit and critp (or just warn
> and crit with
> different units?). These pass the threshold levels specified
> on the command
> line. My idea on this is that you can then use RRD to draw
> yellow/red lines
> to show where the warning levels are.
> - values in class [-0-9.]. No spaces. Karl has a worry about returned
> values
> from SNMP OIDs, but I think values should always be a number,
> so it can be
> parsed to remove extraneous characters
> - units one of:
>
> no unit specified - assume a number (int or float) of things (users,
> processes, load averages)
> s - seconds (also, us, ms)
> % - percentage
> b - bytes (also kb, Mb, Tb)
> c - a continuous counter (such as bytes transmitted on an
> interface) (Does
> this interfere with a standard unit?)
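
The unit list above could be handled along these lines. This is a sketch
only: split_value and to_seconds are hypothetical names, the set of accepted
units is copied from the list as written, and only the time units are
converted because their scale is unambiguous (whether kb means 1000 or 1024
bytes is left open here).

```python
import re

# Units exactly as listed in the proposal; '' means "a number of things".
KNOWN_UNITS = {'', 's', 'ms', 'us', '%', 'b', 'kb', 'Mb', 'Tb', 'c'}

# Scale factors from each time unit to seconds.
TIME_SCALE = {'s': 1.0, 'ms': 1e-3, 'us': 1e-6}

_VALUE_RE = re.compile(r'^(-?[0-9]*\.?[0-9]+)(.*)$')


def split_value(text):
    """Split 'value[UOM]' into (number, unit), rejecting unknown units."""
    match = _VALUE_RE.match(text)
    if match is None or match.group(2) not in KNOWN_UNITS:
        raise ValueError('bad value or unit: %r' % text)
    return float(match.group(1)), match.group(2)


def to_seconds(text):
    """Convert a time value such as '1ms' to seconds."""
    number, unit = split_value(text)
    if unit not in TIME_SCALE:
        raise ValueError('not a time unit: %r' % unit)
    return number * TIME_SCALE[unit]
```

Normalising to one base unit per kind before storing would keep the RRD
side simple, since the graphing code would never need to look at the unit
suffix again.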
>
> So some examples:
>
> check_ping:
> PING OK - Packet loss = 0%, RTA = 1.00
> ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20%
>
> check_disk:
> DISK OK [1150211 kB (57%) free on
> /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10%
> I still think that you do not need the total, used and used_percent
> because these can be calculated from free and free_percent. I would also
> use free rather than used because the lowest limit is 0 and the output
> shows free. I think if you specify a set of disks, then data is returned
> for the total of the disks.
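
The arithmetic behind "calculable from free and free_percent" can be written
out as a short Python sketch. The function name derive_disk_totals is
hypothetical, and the result is only as precise as the reported percentage.

```python
def derive_disk_totals(free, free_percent):
    """Recover (total, used, used_percent) from free and free_percent.

    free may be in any unit; total and used come back in the same unit.
    free_percent must be greater than 0, otherwise total is undefined.
    """
    if not 0 < free_percent <= 100:
        raise ValueError('free_percent must be in (0, 100]')
    total = free * 100.0 / free_percent
    used = total - free
    used_percent = 100.0 - free_percent
    return total, used, used_percent
```

Two caveats follow from the formula: a percentage rounded to a whole number
only gives an approximate total, and a completely full disk (0% free) gives
no way to recover the total at all.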
>
> check_swap:
> CRITICAL - Swap used: 18% (778368 out of
> 4194272)|free_percent=82%,free=778Mb,warnp=5%
>
> check_load:
> OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2
> I think we should only return performance data for 1 set of timings,
> otherwise it gets very complicated (on a side issue, is it possible to
> have a plugin return % values instead of load levels?)
>
> check_procs:
> OK - 5 processes running with command name
> /usr/local/apache/bin/httpd|processes=5,warn=10
> Hmmm, this goes against my check_disk example of using 0 as a lower bound
> as check_procs can only be reported "upwards"
>
> check_users:
> USERS OK - 2 users currently logged in|users=2,warn=10,crit=20
>
> Are we getting closer?
>
> Ton
>
>
>
>
>
> -------------------------------------------------------
> This SF.Net email sponsored by: Parasoft
> Error proof Web apps, automate testing & more.
> Download & eval WebKing and get a free book.
> www.parasoft.com/bulletproofapps1
> _______________________________________________
> Nagiosplug-devel mailing list
> Nagiosplug-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> ::: Please include plugins version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null