[Nagiosplug-devel] Please can I have GIT access
Jose Luis Martinez
jlmartinez-lists-nagplug-devel at capside.com
Tue Sep 8 12:26:06 CEST 2009
> Also, one more thing to consider with local storage is when you have
> multiple nagios instances - either part of distributed monitoring,
> migrations or just for testing. In many cases data accuracy relies on the
> fact that the stored data is relative to the last check. If you have two
> nagios instances doing the same NRPE check, the scheduling may cause one
> check to get a very small interval of data. For example, with cpu usage
> check, that means instead of getting the last 5 minutes, you may get only
> the last few seconds. You can easily miss a CPU hog because at the
> moment the check is executing the CPU was idle for the last few seconds,
> even if it was full the rest of the time (because the test/backup/old
> Nagios instance got the rest of the interval data).
This is a problem that I've found with plugins that use their own
storage for the checks. I've never had problems with multiple Nagios
instances on one machine (I don't do that), but I have had it with the
same check defined multiple times. Going along with the CPU example:
check_cpu --cpu 1
check_cpu --cpu 2
If the developer hasn't foreseen that the plugin will be executed with
different parameters, the readings for cpu 1 and 2 can get mixed.
Another case:
check_cpu --cpu 1 --display system,iowait
check_cpu --cpu 1 --display idle,irq
Maybe check_cpu is a bad example, but think about a plugin that can
output LOTS of performance data (hundreds of data channels), and you
want a couple of subsets output in separate checks.
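
To make the mix-up concrete, here is a minimal sketch in Python (not the
actual check_cpu code; the function name, the JSON state format and the
counter values are all made up for illustration). Two checks that share one
state file end up computing a rate against each other's last reading:

```python
import json
import os
import tempfile


def rate_since_last_run(state_path, counter_now, time_now):
    """Return the counter increase per second since the previous run.

    Returns None on the first run, when no previous reading is stored.
    The state file layout here is hypothetical.
    """
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as fh:
            previous = json.load(fh)
    # Always overwrite the state with the current reading.
    with open(state_path, "w") as fh:
        json.dump({"counter": counter_now, "time": time_now}, fh)
    if previous is None or time_now <= previous["time"]:
        return None
    return (counter_now - previous["counter"]) / (time_now - previous["time"])


state_dir = tempfile.mkdtemp()

# Wrong: the checks for cpu 1 and cpu 2 share a single state file.
shared = os.path.join(state_dir, "check_cpu.tmp")
rate_since_last_run(shared, counter_now=1000, time_now=0)   # cpu 1
rate_since_last_run(shared, counter_now=50, time_now=1)     # cpu 2
mixed = rate_since_last_run(shared, counter_now=4000, time_now=300)  # cpu 1

# Right: cpu 1 has a state file of its own.
own = os.path.join(state_dir, "check_cpu_cpu1.tmp")
rate_since_last_run(own, counter_now=1000, time_now=0)
separate = rate_since_last_run(own, counter_now=4000, time_now=300)
```

With its own file, cpu 1 is compared against its own previous reading over
the full interval; with the shared file it is compared against the cpu 2
reading taken one second later.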
The solution Nagios::Plugin::Differences applies is to let the developer
choose an alternative temp file, but a couple of problems arise:
- he has to be aware of the problem
- even knowing about the problem, he can leave out a condition to
select an alternative temp file.
This has made me change the Nagios::Plugin::Differences API to add a
user-specified "id" to the temp file name generation. This adds a string
to the temp file name so you can choose which temp file to read from and
write to.
/tmp/_nagios_plugin_${script_basename}_${id}.tmp
One method I'm using is to MD5 all the params that the plugin receives.
That creates a "unique" string for the id part of the temp file (I'm
aware that collisions can occur... but I have no elegant solution for
now).
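
As a rough illustration of the idea, sketched in Python rather than in
Perl (so this is not the Nagios::Plugin::Differences code; the function name
state_file_for and the NUL separator are my own assumptions), the id can be
derived from the argument list like this:

```python
import hashlib
import os


def state_file_for(script_path, argv, tmp_dir="/tmp"):
    """Build a per-invocation temp file name from the plugin's arguments.

    Hypothetical helper: joins the arguments with NUL so that
    ["a b"] and ["a", "b"] do not hash to the same id.
    """
    joined = "\0".join(argv).encode("utf-8")
    ident = hashlib.md5(joined).hexdigest()
    basename = os.path.basename(script_path)
    return os.path.join(tmp_dir, "_nagios_plugin_%s_%s.tmp" % (basename, ident))


cpu1 = state_file_for("/usr/lib/nagios/plugins/check_cpu", ["--cpu", "1"])
cpu2 = state_file_for("/usr/lib/nagios/plugins/check_cpu", ["--cpu", "2"])
```

Note that two service checks defined with exactly the same parameters still
end up sharing one file with this approach.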
> The fact that this method is nearly transparent makes it even easier to
> fall into this pitfall and pretty hard to figure out the problem without
> knowing how the plugin actually works.
You're right. Leaving things to the developer can lead to these
hard-to-diagnose problems.
> By comparison, when using performance data strings the stored data is
> bound to a single Nagios service on a single Nagios instance. The same
> check can run many times yet the plugin will *always* get its last
> performance data string.
The problem, in my opinion, is that a plugin has no idea about what
service check definition it is bound to, so it can't determine reliably
the state of its last execution.
> I don't care how it's implemented in the end, but I favor any method
> that can allow this kind of granularity without having to specifically
> think about it.
If Nagios / NRPE could just pass a unique ID for each service check
definition, the plugins could use that by default to generate the
tempfile name for their local storage. The unique ID could be a GUID, so
that different Nagios instances would not generate the same IDs, thus
solving the "multiple Nagios instances" problem too...
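
A sketch of how a plugin could pick its id by default, in Python. Nagios and
NRPE do not pass such an ID today, so the environment variable name
NAGIOS_CHECK_GUID is purely hypothetical, as are the function name and the
fallback to hashing the arguments:

```python
import hashlib
import os
import re


def default_state_id(argv, environ=None):
    """Pick the id used in the temp file name.

    Prefers a per-service-check unique ID if the caller provides one
    (NAGIOS_CHECK_GUID is a made-up variable name), otherwise falls
    back to an MD5 of the plugin's arguments.
    """
    environ = os.environ if environ is None else environ
    guid = environ.get("NAGIOS_CHECK_GUID")
    if guid:
        # Keep only characters that are safe in a file name.
        return re.sub(r"[^A-Za-z0-9_-]", "_", guid)
    return hashlib.md5("\0".join(argv).encode("utf-8")).hexdigest()


with_guid = default_state_id(["--cpu", "1"], {"NAGIOS_CHECK_GUID": "3f2b-01"})
without_guid = default_state_id(["--cpu", "1"], {})
```

The plugin developer would not have to think about any of this; the default
would already be unique per service check definition.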
Just my 2 cents,
Jose Luis Martinez
jlmartinez at capside.com