State Retention Routines
Ton Voon, June 16, 2010
The aim is to create a set of library routines that can be used for saving state information between invocations of a plugin. This way, it is possible to calculate the rate of change and provide threshold calculations on this, rather than just the current state.
This is based on a patch submitted by Alain Williams, Nagios::Monitoring::Plugin::Differences by Jose Luis Martinez and comments on the mailing list (see references).
Lots of discussion between Holger and me ended up with this.
Terms
- Location
  Use ./configure --sharedstatedir to define; the default is $PREFIX/var. Override with the NAGIOS_PLUGIN_STATE_DIRECTORY envvar at runtime if set. The plugin name is added to the end.
- Key
  Used as the filename of the store. Defaults to state.dat. We recommend setting this to the string returned by np_state_generate_key(), so that it is unique per plugin call. A key can only consist of alphanumerics and underscores.
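The two terms above can be sketched in C as follows. This is only an illustration under stated assumptions: the example_* names are hypothetical and not part of the proposed library, and EXAMPLE_SHAREDSTATEDIR stands in for whatever directory --sharedstatedir resolved to at build time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stands in for the directory chosen with ./configure --sharedstatedir. */
#define EXAMPLE_SHAREDSTATEDIR "/usr/local/nagios/var"

/* Returns 1 if key is non-empty and made only of alphanumerics and underscores. */
int example_key_is_valid(const char *key)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_";
    return key != NULL && key[0] != '\0' && key[strspn(key, allowed)] == '\0';
}

/* Writes "<base>/<plugin_name>" into buf; returns 0 on success, -1 if it does not fit. */
int example_join_state_dir(const char *base, const char *plugin_name,
                           char *buf, size_t buflen)
{
    assert(base != NULL && plugin_name != NULL && buf != NULL);
    int n = snprintf(buf, buflen, "%s/%s", base, plugin_name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Picks the runtime override if it is set, otherwise the build-time default. */
int example_state_dir(const char *plugin_name, char *buf, size_t buflen)
{
    const char *base = getenv("NAGIOS_PLUGIN_STATE_DIRECTORY");
    if (base == NULL || base[0] == '\0')
        base = EXAMPLE_SHAREDSTATEDIR;
    return example_join_state_dir(base, plugin_name, buf, buflen);
}
```

With the override set to /tmp/np, a plugin called check_foo would keep its state under /tmp/np/check_foo.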
Format
Example format:
# NP state file
1 [file format version]
{data version}
{time}
{data}
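As a rough sketch of the layout above for string data, assuming the bracketed and braced annotations are explanatory rather than literal file content. The function name and the EXAMPLE_FILE_FORMAT_VERSION macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Assumption: the file format version shown in the example above. */
#define EXAMPLE_FILE_FORMAT_VERSION 1

/* Renders one state record into buf, one field per line.
 * Returns the number of characters written, or -1 if buf is too small. */
int example_format_state(char *buf, size_t buflen, int data_version,
                         time_t when, const char *data)
{
    assert(buf != NULL && data != NULL);
    int n = snprintf(buf, buflen, "# NP state file\n%d\n%d\n%lld\n%s\n",
                     EXAMPLE_FILE_FORMAT_VERSION, data_version,
                     (long long)when, data);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Keeping one field per line means a reader can skip the leading comment and then consume the format version, data version, time and data in order.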
Structs
np_state_key
  char *name
  char *plugin_name
  int data_version
  char *_filename
np_state_data
  time_t time
  void *data
  int length (of binary data)
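Written out as C declarations, the two structs might look like this. The field names and types come from the list above; the comments describing each field are an interpretation, not something the design states.

```c
#include <time.h>

/* Identifies one state store. */
typedef struct {
    char *name;         /* key name, used as the filename of the store */
    char *plugin_name;  /* name of the calling plugin */
    int data_version;   /* version of the plugin's own data layout */
    char *_filename;    /* generated path of the state file (internal) */
} np_state_key;

/* One saved state record. */
typedef struct {
    time_t time;        /* time the state was saved */
    void *data;         /* saved payload, string or binary */
    int length;         /* length of binary data */
} np_state_data;
```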
Calls
np_state_generate_key(argv)
Returns a string to use as a key_name, based on an MD5 hash of argv, thus hopefully a unique key per service/plugin invocation. Use the Extra-Opts parse of argv, so that uniqueness in parameters is reflected there.
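The idea of deriving a key from argv can be sketched as below. Note the swap: MD5 is not available in the C standard library, so this sketch uses FNV-1a (64-bit) purely to stay self-contained; it is not the hash the design calls for. The example_* names are hypothetical, and argv is assumed to be the vector after Extra-Opts expansion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a 64-bit over every argv element. Each element's NUL terminator is
 * hashed too, so {"ab", "c"} and {"a", "bc"} are fed in as different byte
 * streams. This is a stand-in for MD5, not a replacement for it. */
uint64_t example_hash_argv(int argc, char **argv)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (int i = 0; i < argc; i++) {
        const unsigned char *p = (const unsigned char *)argv[i];
        do {
            h ^= *p;
            h *= 1099511628211ULL;           /* FNV prime */
        } while (*p++ != '\0');
    }
    return h;
}

/* Renders the hash as 16 lowercase hex digits, so the key stays alphanumeric. */
void example_generate_key(int argc, char **argv, char out[17])
{
    assert(argv != NULL && out != NULL);
    snprintf(out, 17, "%016llx",
             (unsigned long long)example_hash_argv(argc, argv));
}
```

Two invocations that differ in any argument, such as a different warning threshold, end up with different keys and therefore different state files.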
np_state_init(plugin_name, key_name, data_version)
Sets variables. Generates filename. Returns np_state_key. Dies with UNKNOWN on an exception.
np_state_read(np_state_key)
Returns np_state_data. Will return NULL if no data is available (first run). If the key currently exists, read the data. If the state file format version is not the expected one, return as if there were no data. Get the state data version number and compare it to the expected one. If it is numerically lower, return as if there were no previous state. Dies with UNKNOWN on an exceptional error.
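The version checks described for np_state_read() could be sketched like this for string data. It assumes a state file with one field per line after a leading comment (format version, data version, time, data). The function name, its signature and EXAMPLE_FILE_FORMAT_VERSION are hypothetical, and the sketch reports a truncated file as "no previous state" rather than dying with UNKNOWN.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumption: the file format version this reader understands. */
#define EXAMPLE_FILE_FORMAT_VERSION 1

/* Returns 1 and fills *when and data if usable state was found,
 * 0 if there is no usable previous state. */
int example_read_state(const char *path, int expected_data_version,
                       time_t *when, char *data, size_t datalen)
{
    assert(path != NULL && when != NULL && data != NULL && datalen > 0);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;                       /* first run: no state file yet */

    char line[1024];
    int field = 0, usable = 1;
    while (usable && field < 4 && fgets(line, sizeof line, fp) != NULL) {
        if (field == 0 && line[0] == '#')
            continue;                   /* leading comment line */
        line[strcspn(line, "\n")] = '\0';
        switch (field++) {
        case 0: usable = atoi(line) == EXAMPLE_FILE_FORMAT_VERSION; break;
        case 1: usable = atoi(line) >= expected_data_version; break;
        case 2: *when = (time_t)strtoll(line, NULL, 10); break;
        case 3: snprintf(data, datalen, "%s", line); break;
        }
    }
    fclose(fp);
    return usable && field == 4;
}
```

A plugin that bumps its data version after changing its saved layout will see older files as "no previous state" and start afresh.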
np_state_write_string(np_state_key, time, string)
If time==NULL, use the current time. Create the state file, with the state file format version and default text. Write the version, time, and data. To avoid locking problems, write to a temporary file and then use mv to swap it into place. State data may be lost if two processes write to the same key at the same time.
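The write-then-swap step can be sketched with rename(), which is what mv does for files on the same filesystem. This assumes POSIX semantics, where rename() replaces an existing target atomically. The function name and the fixed ".tmp" suffix are hypothetical; with a fixed suffix, two simultaneous writers share one temporary file, which matches the caveat above about concurrent writes to the same key.

```c
#include <assert.h>
#include <stdio.h>

/* Writes contents to "<path>.tmp" and then renames it over path.
 * Returns 0 on success, -1 on failure. */
int example_atomic_write(const char *path, const char *contents)
{
    assert(path != NULL && contents != NULL);
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    int ok = fputs(contents, fp) != EOF;
    if (fclose(fp) != 0)                /* fclose flushes; failure means data loss */
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);                    /* do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

A reader that opens the state file during a write sees either the complete old file or the complete new one, never a partially written one.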
np_state_write_binary(np_state_key, time, start, length)
Same as np_state_write_string(), but writes binary data.
np_state_data_cleanup(np_state_data)
Cleanup.
np_state_key_cleanup(np_state_key)
Cleanup.
Notes
- All opens and closes happen within these functions, retaining atomicity.
- Libtap tests for library.
- Update Development Guidelines with library usage.
- This has problems if a remote host is checked from different Nagios instances.
- Binary data may not restore on a program compiled with different options from the program that saved it; e.g., 32 or 64 bit.
- Binary data may include a structure containing a pointer. Pointer values may not be used in the reading program; i.e., you need to overwrite the value with something malloc()ed in the current run of the program.
- State files could be left lying around. We recommend you run a regular job to remove unmodified state files older than 1 week.
References
- http://www.nagios-plugins.org/archive/devel/2009-September/007767.html for the initial patch.
- http://www.nagios-plugins.org/archive/devel/2009-September/thread.html#7773 for a conversation about the patch.
- http://www.nagios-plugins.org/archive/devel/2009-September/007749.html for Nagios::Monitoring::Plugin::Differences.