[Nagiosplug-devel] RFC: New threshold syntax
Matthias Eble
matthias.eble at mailing.kaufland-informationssysteme.com
Fri Mar 28 17:50:53 CET 2008
> Ton and Thomas agree that perfdata should be in a fixed UOM and
> not the one specified in the thresholds (at least for now).
> - changing the threshold UOM will destroy old graphs
> - defining a base unit should be up to the respective plugin, and the
> unit should be as small as possible (sec, bytes, ...)
> - thus a UOM is optional, even when no thresholds are defined (like
> --load1 just to graph load1)
If the perfdata UOM (or its prefix) changes, RRD will end up showing
something like "1k MB". Bytes are the most precise unit we can pass to
the graphers; they can then decide how to handle and display the values.
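A minimal sketch of what "always emit base units" could mean for a plugin: a threshold value given with a prefix is converted to the base unit before anything is written to perfdata. The function name and prefix table are hypothetical, and treating k/K/M/G as decimal and Ki/Mi/Gi as binary is an assumption, since the thread has not settled that.

```python
import re

# Hypothetical prefix table. Whether "K" means 1000 or 1024 is not settled
# in this thread; decimal k/K/M/G and binary Ki/Mi/Gi are assumptions here.
PREFIXES = {"": 1, "k": 1000, "K": 1000, "M": 1000**2, "G": 1000**3,
            "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def to_base_unit(text):
    """Convert a threshold value such as '300KB' to (300000.0, 'B')."""
    m = re.fullmatch(r"([0-9.]+)(Ki|Mi|Gi|[kKMG]?)([A-Za-z%]*)", text)
    if m is None:
        raise ValueError(f"cannot parse {text!r}")
    number, prefix, uom = m.groups()
    # Scale by the prefix so perfdata only ever carries the base unit.
    return float(number) * PREFIXES[prefix], uom

print(to_base_unit("300KB"))
```

The graph then always stores bytes, whatever prefix the admin happened to type on the command line.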
> Ton could imagine some helper functions (cmdline, web pages, google
> calculator) to verify complex thresholds
That could also be part of the library, so every plugin could have a
dry-run option that prints which values would cause which result. Based
on the defined thresholds (for example x:y), it could test and print the
return code that the values x, y, x+1, x-1, y+1 and y-1 would produce.
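Such a dry run could look roughly like this, assuming the classic plugin semantics where a range start:end alerts when the value lies outside it. `dry_run` and `outside` are hypothetical names, not existing library functions.

```python
OK, WARNING = 0, 1

def outside(value, start, end):
    """Classic plugin semantics: a range start:end alerts when value is outside it."""
    return not (start <= value <= end)

def dry_run(start, end, alert_rc=WARNING):
    """Print and return the rc for the boundary probes x-1, x, x+1, y-1, y, y+1."""
    results = {}
    for v in sorted({start - 1, start, start + 1, end - 1, end, end + 1}):
        results[v] = alert_rc if outside(v, start, end) else OK
        print(f"value {v}: rc={results[v]}")
    return results

dry_run(10, 20)   # e.g. -w 10:20
```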
> and Andreas likes to see a
> possibility to shorten --freespace warn=inf:300KB
> to --freespace w=inf:300KB
Me too.
> Andreas also thinks that taking the simplicity off the plugins/specs
> will take off one important advantage of nagios
yes
> and that
> Ton should be shot :D
noooo!
> Thomas suggested using getsubopt-style arguments like --metric
> min=2,uom_prefix=Ki,uom=b,... which make it easier
> to keep backward compatibility when introducing new values.
I think getsubopt style is a real improvement in this discussion.
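To illustrate why this helps with backward compatibility, here is a rough Python imitation of what the C library's getsubopt(3) does: each comma-separated item is either key=value or a bare flag, and unknown keys are handed back to the caller instead of breaking the parse. `parse_subopts` is a hypothetical helper, not part of the plugin library.

```python
def parse_subopts(arg, known):
    """Split 'name=cpu,type=warn,min=0,max=80,inside' into a dict.

    Like getsubopt(3), each comma-separated item is 'key=value' or a bare
    flag; unknown keys are collected so the caller can decide what to do.
    """
    opts, unknown = {}, []
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        if key in known:
            opts[key] = value if sep else True   # bare flag such as 'inside'
        else:
            unknown.append(key)
    return opts, unknown

opts, unknown = parse_subopts("name=cpu,type=warn,min=0,max=80,inside",
                              {"name", "type", "min", "max", "inside"})
print(opts, unknown)
```

Note that the comma is already used as the item separator, so something like warn=10:20,50:60 could not be expressed this way without another separator.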
> Thus the options for defining a threshold are (ignoring uom for the
> moment):
> 1) --threshold-time=crit_range/warn_range
> 2) --threshold name=time,warn=range,crit=range
> 3) --threshold=time -w range -c range
> Thomas thinks about something like
> 4) --threshold name=cpu,type=warn,min=0,max=80,inside
I'd prefer to see some kind of range since it's shorter than min=,max=
> Nathan pointed out that it is more intuitive to specify only ok and
> warning ranges.
> Everything outside them is critical, which Ton thinks is "brilliant".
> ...
> Nathan added that ':' could be replaced by '..', with '/' as a
> separator:
> --time=ok/0..3/seconds
> --freespace=ok/300..inf/KB,warn/100..300/KB
> --load=ok/0..2,0..1.5,0..1.2/
--freespace ok/300..inf/KB,warn/100..300/KB
or
--freespace ok=300..inf/KB,warn=100..300/KB
looks good to me, but should we separate prefix and UOM?
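A minimal parser for Nathan's slash form, as a sketch only (the names are hypothetical). It relies on Python's float() accepting "inf", and it assumes exactly one range per level.

```python
def parse_range(text):
    """Parse 'start..end'; float() already understands 'inf'."""
    start, sep, end = text.partition("..")
    if not sep:
        raise ValueError(f"expected start..end, got {text!r}")
    return float(start), float(end)

def parse_slash_spec(spec):
    """Parse 'ok/300..inf/KB,warn/100..300/KB' into {level: (start, end, uom)}."""
    result = {}
    for part in spec.split(","):
        level, rng, uom = part.split("/")   # exactly level/range/uom
        result[level] = (*parse_range(rng), uom)
    return result

print(parse_slash_spec("ok/300..inf/KB,warn/100..300/KB"))
```

The --load=ok/0..2,0..1.5,0..1.2/ example does not fit this scheme: splitting on ',' first tears the three load ranges apart and the parser raises ValueError, so commas inside a level would need their own rule.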
>
> --End of summary
>
> So to me there are multiple open questions
>
> Key questions:
> - Must the threshold specification argument be valid without quoting?
To me: yes (for numeric values/ranges). Requiring quotes would open up
a broader range of syntax, though.
> - Is it necessary to allow multiple ranges per thresh warn=10:20,50:60?
The Performance data definition doesn't permit this up to now but I
could imagine some people would like to see this.
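If multiple ranges per threshold were allowed, the check itself stays simple: a value matches the threshold if it falls into any of the listed ranges. This is a hypothetical helper; what "matching" then means (OK or alert) depends on which semantics we settle on.

```python
def in_any_range(value, ranges):
    """True if value lies inside at least one (start, end) pair."""
    return any(start <= value <= end for start, end in ranges)

warn = [(10, 20), (50, 60)]   # warn=10:20,50:60
for v in (15, 30, 55):
    print(v, in_any_range(v, warn))
```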
> - Should thresholds be defined ok/warn rather than warn/crit?
I like the approach, but it means more changes than just the syntax.
People will need to rethink their thresholds when converting.
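The ok/warn evaluation Nathan described could be sketched like this: inside ok gives OK, otherwise inside warn gives WARNING, and everything else is CRITICAL. Checking ok before warn (so a shared boundary such as 300 counts as OK) is an assumption of this sketch.

```python
OK, WARNING, CRITICAL = 0, 1, 2

def status_ok_warn(value, ok, warn):
    """Inside ok -> OK, else inside warn -> WARNING, anything else -> CRITICAL."""
    if ok[0] <= value <= ok[1]:
        return OK
    if warn[0] <= value <= warn[1]:
        return WARNING
    return CRITICAL

INF = float("inf")
# --freespace ok=300..inf/KB,warn=100..300/KB, values in KB
for free in (500, 300, 200, 50):
    print(free, status_ok_warn(free, (300, INF), (100, 300)))
```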
> - Should plugins only print perfdata for explicitly selected metrics
> or should there be a base set?
I'd vote for a base set, to get some values (besides the alerting ones)
for free. Having to look up what every plugin offers is exhausting, so
I'd say printing as much perfdata as possible would be best.
Also, most RRD-based perfdata tools will run into severe problems with
new or changing metric labels after the RRD has been created.
> - Should there be an explicit range limit (10:inf over 10:)
10:inf or 10..inf looks cleaner to me.
> - Is it favorable to have multiple range styles like
> 1<x<10 *and* 1:10 *and* ... in parallel?
Not if you ask me.
> Further questions:
> - should perfdata inherit threshold's uom/prefix?
No. See above
> - replace range separator ':' with '..'?
I guess that depends on some other factors, but I like Nathan's suggestions.
> - Which component is responsible for sanity checking of thresholds?
> - Should base8 UOM-prefixes be allowed?
No opinion for now.
> I'll post my thoughts later on.
I've some hints, too:
Since it looks like the default alerting mechanism will be "inside",
the default range behaviour for plain numbers (X means 0:X) should be
reversed too, so X would result in X:inf instead of 0:X.
Or should we drop those plain thresholds completely?
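A sketch of how a parser could expand plain numbers under either default, using the colon syntax from the examples above. The function and the new_default switch are hypothetical; an empty start is treated as 0 here, and the '~' and '@' markers of the current syntax are left out.

```python
INF = float("inf")

def parse_threshold(text, new_default=True):
    """Turn a colon-style threshold into a (start, end) range.

    'A:B' is taken literally and 'A:' means A..inf. A bare 'X' is read as
    0:X today; with new_default=True it becomes X:inf instead.
    An empty start is treated as 0 (assumption); '~' and '@' are not handled.
    """
    if ":" in text:
        start, end = text.split(":", 1)
        return (float(start) if start else 0.0, float(end) if end else INF)
    x = float(text)
    return (x, INF) if new_default else (0.0, x)

print(parse_threshold("300"), parse_threshold("300", new_default=False))
```

Writing the upper limit out explicitly (10:inf) gives the same result as 10:, which is one argument for requiring it.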
What about mixing UOM prefixes within one range? Might this be needed
in the future?
One more thing that has been on my mind for a couple of weeks now is
that we need to strengthen percent support in our library. This could be
done by adding an optional(?) base value to get_status so that it can
calculate percentages.
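Roughly what a percent-aware check with a base value could do, shown in Python rather than against the real C get_status. `get_status_pct` is a hypothetical name, and the alert-outside-the-range semantics follow the classic thresholds.

```python
OK, WARNING, CRITICAL = 0, 1, 2

def get_status_pct(value, base, warn, crit):
    """Hypothetical percent-aware status check (not the library's get_status).

    warn and crit are (start, end) ranges in percent of base; as with classic
    thresholds, a level alerts when the percentage falls outside its range.
    """
    if base <= 0:
        raise ValueError("base must be positive to compute a percentage")
    pct = 100.0 * value / base
    if not crit[0] <= pct <= crit[1]:
        return CRITICAL
    if not warn[0] <= pct <= warn[1]:
        return WARNING
    return OK

# 150 GB free on a 1000 GB volume is 15 %: below warn (20..100), above crit (10..100)
print(get_status_pct(150, 1000, warn=(20, 100), crit=(10, 100)))
```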
At the moment, my favourite threshold/range definition is the following:
--throughput ok=1..5/M,warn=1..300/M/B
Here ok takes the default UOM (bit) and warn uses its own UOM
(byte). But this is also invalid under our perfdata specs.
with ranges like
[^]start..end
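A rough parser for this form, as a sketch only: each level is range[/prefix[/uom]], a missing UOM falls back to the plugin default (bit here), and a leading '^' is read as "not inside start..end". That reading of '^' is an assumption, as is the function name.

```python
def parse_level_spec(spec, default_uom="b"):
    """Parse 'ok=1..5/M,warn=1..300/M/B' into {level: (start, end, invert, prefix, uom)}."""
    out = {}
    for part in spec.split(","):
        level, _, rest = part.partition("=")
        fields = rest.split("/")
        rng = fields[0]
        prefix = fields[1] if len(fields) > 1 else ""
        uom = fields[2] if len(fields) > 2 else default_uom   # plugin default
        invert = rng.startswith("^")                          # assumed: negated range
        start, sep, end = (rng[1:] if invert else rng).partition("..")
        if not sep:
            raise ValueError(f"expected [^]start..end, got {rng!r}")
        out[level] = (float(start), float(end), invert, prefix, uom)
    return out

print(parse_level_spec("ok=1..5/M,warn=1..300/M/B"))
```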
Matthias