[Nagiosplug-devel] RFC: Performance data guidelines
Karl DeBisschop
karl at debisschop.net
Tue Aug 5 16:50:04 CEST 2003
Sorry to top-post.
Can we tie this thread up? I am nearly finished with markup for
translation and would like to start doing perf data before a mid-August
alpha of 1.4.
I'm finding myself thinking in terms of
label1=value1[UOM];[critical1];[warning1];[max1];[min1] \
label2=value2[UOM];[critical2];[warning2];[max2];[min2]
I'm thinking no spaces except between individual attributes - we'd need to
transform any to '-'. And ';' and '=' would be reserved as well.
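To make the layout above concrete, here is a minimal Python sketch of how a
plugin might emit it. The function names (sanitize_label, format_metric,
format_perfdata) are made up for illustration and are not part of any
existing plugin code. Collapsing a run of whitespace into a single '-' and
rejecting ';' and '=' outright (rather than rewriting them) are assumptions
of this sketch, not something agreed in the thread.

```python
RESERVED = (";", "=")  # reserved separators in the proposed layout


def sanitize_label(label):
    """Replace whitespace in a label with '-' and reject reserved characters.

    Assumption: a run of whitespace collapses to one '-'.
    """
    cleaned = "-".join(label.split())
    for ch in RESERVED:
        if ch in cleaned:
            raise ValueError("reserved character %r in label %r" % (ch, label))
    return cleaned


def format_metric(label, value, uom="", critical=None, warning=None,
                  maximum=None, minimum=None):
    """Render one label=value[UOM];[critical];[warning];[max];[min] item.

    Fields that are not known are left empty, so the four ';' separators
    are always present.
    """
    fields = (critical, warning, maximum, minimum)
    rendered = ["" if f is None else str(f) for f in fields]
    return "%s=%s%s;%s" % (sanitize_label(label), value, uom, ";".join(rendered))


def format_perfdata(metrics):
    """Join several items with a single space, as in the proposed layout."""
    return " ".join(format_metric(**m) for m in metrics)


if __name__ == "__main__":
    print(format_perfdata([
        {"label": "packet loss", "value": 0, "uom": "%",
         "critical": 20, "warning": 10, "maximum": 100, "minimum": 0},
        {"label": "rta", "value": 1, "uom": "ms",
         "critical": 30, "warning": 20, "minimum": 0},
    ]))
    # prints: packet-loss=0%;20;10;100;0 rta=1ms;30;20;;0
```

Always writing all four separators keeps the field positions fixed, so an
unknown max shows up as an empty field rather than shifting min into its
place.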
I agree with Ton that max and min will often be unneeded, but I don't
see that the possibility hurts.
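As a rough illustration of why optional max and min cost little on the
parsing side, here is a Python sketch of a reader for the layout above. The
names Metric, parse_metric and parse_perfdata are hypothetical. Two things
are assumptions of this sketch: trailing fields may be dropped entirely (not
just left empty), and thresholds are kept as strings because their exact
syntax is still under discussion.

```python
import re
from typing import NamedTuple, Optional


class Metric(NamedTuple):
    label: str
    value: float
    uom: str
    critical: Optional[str]
    warning: Optional[str]
    maximum: Optional[str]
    minimum: Optional[str]


# A number (optional sign, optional fraction) followed by a free-form UOM.
_VALUE_RE = re.compile(r"^(-?[0-9]*\.?[0-9]+)(.*)$")


def parse_metric(token):
    """Parse one label=value[UOM];[critical];[warning];[max];[min] item."""
    label, sep, rest = token.partition("=")
    if not sep or not label:
        raise ValueError("missing label or '=' in %r" % token)
    parts = rest.split(";")
    if len(parts) > 5:
        raise ValueError("too many ';' fields in %r" % token)
    match = _VALUE_RE.match(parts[0])
    if match is None:
        raise ValueError("value is not a number in %r" % token)
    # Empty or omitted fields become None; pad to exactly four entries.
    extra = [p if p else None for p in parts[1:]]
    extra += [None] * (4 - len(extra))
    return Metric(label, float(match.group(1)), match.group(2), *extra)


def parse_perfdata(text):
    """Split on whitespace and parse each item in turn."""
    return [parse_metric(tok) for tok in text.split()]


if __name__ == "__main__":
    for metric in parse_perfdata("packet-loss=0%;20;10;100;0 rta=1ms;30;20;;0"):
        print(metric)
```

A consumer that does not care about max and min can simply ignore the last
two fields, and one that does (for graph scaling, say) gets None when the
plugin had nothing sensible to report.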
I think the standard units are OK, except 'B' for bytes and 'b' for bits
would be more standard.
But I'm really unhappy with any of the proposals. I'd just like to
settle on something.
--
Karl
On Wed, 2003-07-16 at 13:52, kjell.sundtjonn at elkem.no wrote:
> Ton
>
> A few comments
>
> >1) I don't like the min and max values. I think that information is held
> >with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
> >assume any value.
> My reason to include the max and min values is to bring along as much
> information in the performance data string as possible. Max and min are
> relevant information for getting correct scaling on graphs drawn by tools
> such as RRD. RRD accepts this as an optional parameter in the data source
> definition, and I think it should be included when available (that is, when
> the plugin logic can deduce it in a sensible way).
> If the UOM is % you can assume min=0, max=100, but if you monitor your free
> disk space in GB, the total disk space available is valuable information
> that the plugin can provide.
>
> >2) what about check_disk -w 5% -w 10000? If there is no min/max, then it
> >could be: 'label=value[UOM][:critical:warning[:critical:warning]]'
>
> What about changing the general layout to
> 'label=value1[UOM];value2[UOM];...[:[critical1;critical2;...]
> [:[warning1;warning2;...][:[max1;max2;...][:min1;min2;...]]]]'
>
> This can handle the situation you describe, all information is carried over
> to the tool you choose to use to parse the performance data in a structured
> format easily parseable.
> We should allow any character (except '=') in the label. Changing this to
> correct syntax for RRD datasource (or other tool of your choice) name
> should be left to the program you select to parse the data.
>
> >3) what about "critical at 10%, but no warning levels"? Can just use a
> >null, I guess.
> >4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
> >processes. Is this doable at all? If so, would we need to change the
> >separators?
> Let us allow for embedding parameters in quotes
> No_processes=5:'5:5'::10:0
>
> I must say that I am sceptical of the proposal by Peter Hoogendijk to use
> scanf and print format specifiers as parameters to the plugins. Let us try
> to develop a common recommendation on performance data that enables all
> relevant information to be forwarded from the plugin to the tool you select
> to parse the data. It is important that this is a common definition that
> simplifies the program needed for parsing the data.
>
> Kjell Sundtjønn
>
>
>
> From: "Voon, Ton" <Ton.Voon at egg.com>
> Sent by: nagiosplug-devel-admin at lists.sourceforge.net
> Sent: 15.07.2003 15:32
> To: "'kjell.sundtjonn at elkem.no'" <kjell.sundtjonn at elkem.no>, NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
> Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
>
>
>
>
> Kjell,
>
> Firstly, just want to say thank you for your contribution. This is a
> fascinating thread. I would much rather have this discussion now than have
> it raised as design problems afterwards!
>
> Yeah, I thought afterwards that check_disk has to be different as a
> summation does not really tell you anything useful. My preference is that
> the output reflects the filesystem, not the device, but we can use a
> switch for that.
>
> I think the : separated fields instead of crit,warn,critp,warnp are better
> too - the new check_disk allows different thresholds per disk, so this fits
> in well. However, some questions pop up:
>
> 1) I don't like the min and max values. I think that information is held
> with the UOM (% is 0-100, seconds is 0-infinity). If there is no UOM, then
> assume any value.
> 2) what about check_disk -w 5% -w 10000? If there is no min/max, then it
> could be: 'label=value[UOM][:critical:warning[:critical:warning]]'
> 3) what about "critical at 10%, but no warning levels"? Can just use a
> null, I guess.
> 4) check_procs allows you to say -c 5:5 to mean alert if not exactly 5
> processes. Is this doable at all? If so, would we need to change the
> separators?
>
> Ton
>
> > -----Original Message-----
> > From: kjell.sundtjonn at elkem.no [mailto:kjell.sundtjonn at elkem.no]
> > Sent: Saturday, July 12, 2003 5:41 PM
> > To: NagiosPlug Devel
> > Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
> >
> >
> >
> > I really like the idea of including the critical and warning levels
> > together with max and min values in the performance data, but let me
> > propose an alternative layout based on colon (:) separated fields:
> >
> > - output of format 'label=value[UOM]:[critical]:[warning]:[max]:[min]',
> >   comma separated
> > - labels 1-19 characters long in class [a-zA-Z0-9_] (spaces allowed, but
> >   not recommended)
> > - values, critical, warning, max, min in class [-0-9.]. No spaces.
> > - critical and warning are the thresholds for this measurement
> > - max and min are the maximum/minimum values for the measurement
> >
> > I think this is easier to parse than the proposal from Ton based on
> > 'magical' words.
> >
> > Example
> >
> > Disk space
> > DISK OK [22118452 kB (84%) free on /dev/hda3] [81574 kB (85%) free on
> > /dev/hda2] [252600 kB (100%) free on
> > /dev/shm]|_dev_hda3=84%:10:25:100:0,
> > _dev_hda2=85%:10:25:100:0,_dev_shm=100%:10:25:100:0
> >
> > For disk space and other plugins where the UOM is defined when the plugin
> > is called, use the active UOM as the value for the performance data.
> > Notice how the / is replaced with _ to ensure a valid RRD datasource
> > name. It is necessary to show the performance data for each disk in a
> > disk set, not only for the total as Ton proposes.
> >
> > PING
> >
> > PING OK - Packet loss = 0%, RTA =
> > 1.00ms|packet_loss=0%:20:10:100:0,RTA=1ms:20:30::0
> >
> > The empty max value for RTA is understood as undefined.
> >
> >
> >
> > Kjell Sundtjønn
> >
> >
> >
> > From: "Voon, Ton" <Ton.Voon at egg.com>
> > Sent by: nagiosplug-devel-admin at lists.sourceforge.net
> > Sent: 11.07.2003 16:10
> > To: NagiosPlug Devel <nagiosplug-devel at lists.sourceforge.net>
> > Subject: RE: [Nagiosplug-devel] RFC: Performance data guidelines
> >
> >
> >
> >
> > I'm starting to side with Kjell's and Karl's idea of labels being separate
> > from the units. I think that was the flaw in my original proposal - if we
> > can standardise on the units, then RRD generation should be fairly easy
> > and then you can keep labels descriptive and whatever you think is
> > suitable for a particular plugin.
> >
> > So my amended proposal is:
> >
> > - output of format 'label=value[UOM]' comma separated
> > - labels 1-19 characters long in class [a-zA-Z0-9_] (should spaces be
> >   allowed?)
> > - special labels of warn, warnp, crit and critp (or just warn and crit
> >   with different units?). These pass the threshold levels specified on the
> >   command line. My idea on this is that you can then use RRD to draw
> >   yellow/red lines to show where the warning levels are.
> > - values in class [-0-9.]. No spaces. Karl has a worry about returned
> >   values from SNMP OIDs, but I think values should always be a number, so
> >   it can be parsed to remove extraneous characters
> > - units one of:
> >
> > no unit specified - assume a number (int or float) of things (users,
> > processes, load averages)
> > s - seconds (also, us, ms)
> > % - percentage
> > b - bytes (also kb, Mb, Tb)
> > c - a continuous counter (such as bytes transmitted on an interface)
> > (Does this interfere with a standard unit?)
> >
> > So some examples:
> >
> > check_ping:
> > PING OK - Packet loss = 0%, RTA = 1.00
> > ms|packet_loss=0%,rta=1ms,warnp=10%,critp=20%
> >
> > check_disk:
> > DISK OK [1150211 kB (57%) free on
> > /dev/dsk/c0t0d0s0]|free_percent=57%,free=1150Mb,warn=100Mb,warnp=10%
> > I still think that you do not need the total, used and used_percent
> > because these are calculable from free and free_percent. I would also use
> > free rather than used because the lowest limit is 0 and the output shows
> > free. I think if you specify a set of disks, then data is returned for
> > the total of the disks.
> >
> > check_swap:
> > CRITICAL - Swap used: 18% (778368 out of
> > 4194272)|free_percent=82%,free=778Mb,warnp=5%
> >
> > check_load:
> > OK - load average: 0.03, 0.04, 0.05|load1=0.03,warn=1,crit=2
> > I think we should only return performance data for 1 set of timings,
> > otherwise it gets very complicated (on a side issue, is it possible to
> > have a plugin return % values instead of load levels?)
> >
> > check_procs:
> > OK - 5 processes running with command name
> > /usr/local/apache/bin/httpd|processes=5,warn=10
> > Hmmm, this goes against my check_disk example of using 0 as a lower bound
> > as check_procs can only be reported "upwards"
> >
> > check_users:
> > USERS OK - 2 users currently logged in|users=2,warn=10,crit=20
> >
> > Are we getting closer?
> >
> > Ton
> >
> >
> > -------------------------------------------------------
> > This SF.Net email sponsored by: Parasoft
> > Error proof Web apps, automate testing & more.
> > Download & eval WebKing and get a free book.
> > www.parasoft.com/bulletproofapps1
> > _______________________________________________
> > Nagiosplug-devel mailing list
> > Nagiosplug-devel at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> > ::: Please include plugins version (-v) and OS when reporting
> > any issue.
> > ::: Messages without supporting info will risk being sent to /dev/null