[Nagiosplug-devel] Patch for persistent data on plugins

Alain Williams addw at phcomp.co.uk
Thu Sep 17 13:04:49 CEST 2009


On Thu, Sep 17, 2009 at 02:05:52AM -0400, Thomas Guyot-Sionnest wrote:

> My take on this...:
> 
> 1. Tests should use libtap,

I didn't see that, it is not referenced in an appropriate README.
I'll look at it.

> and lets remove those goto's before someone
> else see them ;)

There are no gotos in state_save.c
There are a few in the test program and in check_procs.c -- they are used
how a goto should be used -- in an error state and to increase clarity; one target
in a function does not confuse and avoids excessive nested {}.
Summary: without goto the code becomes more complicated/difficult-to-read.

> 2. Doc should go in the docbook, not a plain text file (if there are too
> technical portions those could be be code comments

I cannot see anything called docbook in the plugins tar file ?

> 3. A plugin shouldn't share state file for multiple different
> invocations. Plugins may run in parallel so this is asking for trouble
> (one invocation overwriting another plugin's data). That also call for a
> much simpler API and plugin code.

That is why the functions have the 'instance' argument, from my documentation:

  name      This should be the name of the program
  instance  If the program may be run several times (eg to monitor different
            disks) this should contain a way of distinguishing the different
            instances. If this is not needed it should be NULL.
  Neither name not instance should contain spaces or slashes ('/').

An instance may be 'sda', 'sdb', ... for something that is monitoring disks.

I have added a bit more to the documentation of np_state_save_start().

> 4. Should we write a temp file first and move it over instead of
> truncating it? IMHO the former is much better is the file is not opened
> and saver at once...

That could be done. I don't think that it worth the extra complexity.
There will only be an error if something really nasty happens (eg out of
disk space). If state is lost then a warning may be lost, but if a disk
has gone full - worse things are afoot.
If a state file is corrupt then the plugin will probably rewrite it with
fresh/clean state information.

I won't do anything without further discussion.

It might make sense if the nagios plugin state files were removed on
nagios start (but not reload) anyway: discuss.


> 5. time.h is already included in common.h

Removed from check_procs.c

> 6. Binary strings should go with string length - they may contain nulls.

The C standard is NUL terminated strings. I wanted to make it easy for C programmers
to use. If they have other (non usual) structures/... then they can use
np_state_save_binary_record(). Did you look at the test program ?
This shows the different ways of doing things - there are different ways
depending on how complicated the data is that needs to be saved;
ie if it is simple use: np_state_save_serial() - all done in one
function call.

> 7. Why do np_state_add_vcomment() needs first + comment_array? Why a
> standalone array won't work??

Look at the documentation:
        /* Set the plugin command line arguments as a comment
         * in the state save file.
         */
        if(np_state_add_vcomment("Args:", argv))
                die(STATE_UNKNOWN, _("Can't add state store comments as: %s"), np_state_perror());

ie you need the first one so that you can describe what is on the rest of the line.

I have added a check to allow the first argument of np_state_add_vcomment()
to be NULL - in which case it is ignored and just the array added.

> 8. I don't think the descripting/comments should go in there. They could
> be added manually or trough a serialization library, but in most cases
> the pluginj's verbose mode should be enough to debug.

That is if you understand what is generating the file. Remember that people who
are new to this have problems understanding how it all hangs together.
I found the comments useful when debugging check_procs.
Typically the comments will be just: plugin name & command line args; date in human
understandable format; a DO NOT EDIT line. The over head is 100-200 bytes.
(See later discussion)

> 9. On some unix unlink() may damage file system if the path is a
> directory and the plugin is run as root. Any sysadmin ending up in that
> situation should be shot in the head, but it's an easy check...

I use unlink() in np_state_destroy_save() to remove the state file. This is
not a directory. I have modified the internal function generate_filename()
to check & return an error if: name is empty or: name or instance contain
a space or '/'. That will prevent trying to unlink a directory.

Changes in my private version that I will repost when appropriate.

> 10. DIR_DEFAULT should come from configure with an option to change it.
> The default should likely be PREFIX/var/rw or something like that
> (nagios temp path... you get the idea)

I'll do that.

> 12. I'd suggest having all files prefixed with "np_" instead of using
> some sort of heuristic to determine whenever we should add
> "nagios-plugin_" (bonus point: making prefix a configure parameter).

That is only if DIR_DEFAULT is used, ie it is in /tmp.
Making it a configure option (your point 10) fixes this (ie it is
the same thing).

One point that you did NOT make that I thought you might is my use of
the environment variable NAGIOS_PLUGIN_STATE_DIRECTORY. I can't see
a simple way of avoiding this without adding complexity to the plugins
(which is something that I wanted to avoid). It would be nice if there
was some nagios support for this -- ie a nagios config parameter that
makes it set this env var, rather than having to set this in the
environment before calling nagios.

> 13. I like the ascii header, but I'd use a more compact version. This
> ensure state files are as small as possible, preventing waste (very
> useful when a tmpfs is used for storage and helping usage of tail files
> on fs that support it).

Hmmm, it is a trade off between compactness and ease of understanding by the
nagios newbie system admin. A quick look at those generated by the test
program shows that the header length is less than 180 chars.
For those that have not seen it we are talking about a header like:

	I Nagios Plugin State File
	# This is generated by the plugin - DO NOT HAND EDIT
	# Plugin id: test instance: array 
	V 1
	# File written: Fri Sep 11 18:33:50 2009
	T 4aaa89fe
	B 1
	N 11

Removing the comments takes that down to 52 bytes (from 179).
I could do something like testing for env var
NAGIOS_PLUGIN_STATE_NO_COMMENT and suppress the comments,
but is that worth it ?

''Nagios Plugin State File'' is 24 chars & could be shortened,
I made is explicit so that if a sysadmin found the file and asked
the question ''what is this file ?'' that it would be an easy answer.

> 14. There's no additional tests in check_procs.t

Hmmmm: what is the purpose of the extra checks ?
I could check parsing of command line args. Checking of saved state
is really difficult (what is happening on the system that the test is
being run on ?).
The main thing to test is the save/restore of test data - this is what
the plugin_save_test.c does.

-- 
Alain Williams
Linux/GNU Consultant - Mail systems, Web sites, Networking, Programmer, IT Lecturer.
+44 (0) 787 668 0256  http://www.phcomp.co.uk/
Parliament Hill Computers Ltd. Registration Information: http://www.phcomp.co.uk/contact.php
Past chairman of UKUUG: http://www.ukuug.org/
#include <std_disclaimer.h>




More information about the Devel mailing list