[Nagiosplug-devel] check_disk enhancements
Ton Voon
ton.voon at altinity.com
Fri Jul 14 01:53:12 CEST 2006
Hi!
I've spent a lot of time on check_disk and I think it is much better
now. However, there are a few things I wanted to get some opinions on.
Firstly, I've fixed a lot of major bugs in check_disk. Turns out that
absolute values were incorrectly measured. The docs say that
./check_disk -w 100 -p / should check for 100MB free on /, but in fact
it was checking 100 blocks (whatever the filesystem blocksize was),
which is not the same thing at all.
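To give a feel for how far apart the two readings are, here is a rough sketch in Python (not the plugin's C code). The function names are made up for illustration, and 1 MB is assumed to be 1024 * 1024 bytes, which may not be what check_disk uses.

```python
def blocks_to_megabytes(blocks: int, block_size: int) -> float:
    """Convert a block count to megabytes (assuming 1 MB = 1024 * 1024 bytes)."""
    return blocks * block_size / (1024 * 1024)


def megabytes_to_blocks(megabytes: int, block_size: int) -> int:
    """Convert megabytes to whole blocks of the given size."""
    return megabytes * 1024 * 1024 // block_size


# "-w 100" read as blocks on a filesystem with 4096-byte blocks:
print(blocks_to_megabytes(100, 4096))   # 0.390625 (MB), nowhere near 100 MB
# "-w 100" read as megabytes on the same filesystem:
print(megabytes_to_blocks(100, 4096))   # 25600 (blocks)
```

So on a 4096-byte blocksize the old behaviour would only warn once less than about 0.4MB was free.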
Secondly, the values for space used were incorrectly calculated,
because values were converted between types and accuracy was lost by
using floats instead of doubles. I've copied the same techniques used
in coreutils' df command, so the results should be exactly the same as
df would output.
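As an illustration of the kind of accuracy loss involved (a sketch only, not the actual check_disk or df code), a 32-bit float cannot hold every integer above 2**24, so large block counts get silently rounded. The helper name to_float32 is made up for this example.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a number through a 32-bit IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", value))[0]


blocks = 16_777_217  # 2**24 + 1, the first integer a 32-bit float cannot represent

print(to_float32(blocks))  # 16777216.0 (one block has gone missing)
print(float(blocks))       # 16777217.0 (a 64-bit double keeps it exactly)
```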
Thirdly, the handling of "best match" filesystems and of excluded
filesystems was incorrect. These functions have been moved off into a
library, where they are now tested using libtap (Haven't heard
of it? I've only been harping on about how great it is for a year
now! http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap and
http://www.onlamp.com/pub/a/onlamp/2006/01/19/libtap.html). I've also added
in an "exact-match" option, due to public demand.
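For anyone unsure what "best match" means here, this is a minimal sketch of one plausible reading: pick the mount point that is the longest path prefix of the requested path. That reading, and the names best_match and exact_match, are assumptions for illustration and not the plugin's library functions.

```python
from typing import List, Optional


def best_match(path: str, mount_points: List[str]) -> Optional[str]:
    """Return the longest mount point that is a path prefix of `path`."""
    wanted = path.rstrip("/") + "/"
    best = None
    for mount in mount_points:
        # Compare with a trailing slash so /var does not match /variable.
        prefix = mount.rstrip("/") + "/"
        if wanted.startswith(prefix):
            if best is None or len(mount) > len(best):
                best = mount
    return best


def exact_match(path: str, mount_points: List[str]) -> Optional[str]:
    """Return `path` only if it is itself a mount point."""
    return path if path in mount_points else None


mounts = ["/", "/var", "/home"]
print(best_match("/var/log", mounts))   # /var
print(exact_match("/var/log", mounts))  # None
```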
Lastly, we can now compare against multiple threshold values. Not
just the current freespace_units, freespace_percent and
usedinodes_percent, but also usedspace_units and usedspace_percent.
Others can be easily added. However, there are problems with how to
specify these thresholds (see below).
The t/check_disk.t tests have been updated as well, so some
long-standing bugs have been fixed. The only test failures at the moment
are for range checking. Is this something that should be done
generally? For instance, should we raise errors re: ranges where
warning will never occur? Eg, warn if inside 0:10, critical if inside
0:15? Or eg, percent must be between 0 and 100? I tend to think that
it should be left to the user.
One regression that I have left in is the trimming of perf data. The
warn/crit/max/min values were not being generated correctly, and
there are no library routines for it yet (though there are in the
Nagios::Plugin module). I plan on putting that back in at some stage.
Anyone desperate for it?
The biggest problem that I've discovered is that the range
specification for -w and -c is inverted from the norm. This was
noticed when using the library range checking routines. check_disk -w
10% means alert if freespace is below 10%, but we normally mean to
alert if it is outside the range. So, for instance, check_procs -w
1:1 means alert if there is anything other than exactly 1 process.
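For reference, the usual range semantics can be sketched roughly as follows. This is a Python illustration based on the range format in the plugin development guidelines ("10", "10:", "~:10", "10:20", and a leading "@" to alert inside the range), not the C library routines; parse_range and should_alert are made-up names.

```python
import math


def parse_range(spec: str):
    """Return (start, end, alert_inside) for a range such as '10', '10:', '~:10', '@10:20'."""
    alert_inside = spec.startswith("@")
    if alert_inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec  # a bare "10" means 0 to 10
    if start_text == "~":
        start = -math.inf                 # "~" means negative infinity
    elif start_text == "":
        start = 0.0                       # assumption: an empty start is treated as 0
    else:
        start = float(start_text)
    end = math.inf if end_text == "" else float(end_text)
    return start, end, alert_inside


def should_alert(value: float, spec: str) -> bool:
    """Alert if the value is outside the range, or inside it when '@' is given."""
    start, end, alert_inside = parse_range(spec)
    inside = start <= value <= end
    return inside if alert_inside else not inside


print(should_alert(1, "1:1"))    # False
print(should_alert(2, "1:1"))    # True
print(should_alert(5, "@0:10"))  # True
```

The last call shows the effect of the leading @: a value inside 0 to 10 raises the alert.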
I've got a hack for check_disk (forcing a @ at the beginning of the
range, which means to alert inside), but I was wondering if we should
introduce a new way of defining thresholds. I'm thinking something like:
  --freespace="0:5;0:2" (warn if outside 0 to 5, crit if outside 0 to 2)
  --usedspace_percent=";90:100" (no warn, crit if outside 90 to 100)
  --usedinode="100:;200:" (warn if outside 100 to infinity, crit if outside 200 to infinity)
This also matches with perfdata output.
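Splitting such a value into its two parts would be simple. A minimal sketch in Python, purely hypothetical since none of this syntax exists yet (split_thresholds is a made-up name):

```python
def split_thresholds(value: str):
    """Split 'warn;crit' into (warn, crit); an empty or missing part becomes None."""
    warn, _, crit = value.partition(";")
    return (warn or None, crit or None)


print(split_thresholds("0:5;0:2"))    # ('0:5', '0:2')
print(split_thresholds(";90:100"))    # (None, '90:100')
print(split_thresholds("100:;200:"))  # ('100:', '200:')
```

Each part would then be handed to the normal range checking routines. The warn;crit order is the same order in which perfdata lists its warn and crit fields.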
Any opinions?
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon