[Nagiosplug-devel] RFC: Nagios 3 and Embedded Perl Plugins
Andreas Ericsson
ae at op5.se
Tue Jan 9 14:38:55 CET 2007
Thomas Guyot-Sionnest wrote:
> On 08/01/07 07:02 AM, Andreas Ericsson wrote:
>> Stéphane Urbanovski wrote:
>>> Andreas Ericsson wrote:
>>>
>>>> But you just said to load this newfangled dream-version of NRPE as a
>>>> module? That sort of Microsoft-ish thinking leads to "integrated" and
>>>> very unstable code, I'm afraid.
>>> (OK, my English is really poor...)
>>>
>>> Not the "newnrpe", which is a separate process, but only the part that communicates with newnrpe.
>>>
>> Ah, I see what you mean now. I'm afraid that fairly drastically reduces
>> the scalability of Nagios. Assume for a second that you have 1500 hosts
>> to monitor, all of which use NRPE for checking local stuff. Keeping open
>> connections to those 1500 hosts requires 1500 open file descriptors
>> at all times. Most systems can have a lot more files than that open per
>> process at any given time, but there is still a hard limit lurking
>> somewhere which means Nagios can no longer check an arbitrary number of
>> hosts and services. The worst part is that that hard limit will be set
>> differently on different systems.
>>
>> I'm afraid you'll find that this just isn't useful enough to warrant the
>> massive developer effort it would take to write it and seeing as you're
>> the only one arguing your case, you'd have to write it yourself to get
>> it implemented. Either way, further discussion is fairly pointless until
>> you have some code available.
>
> Actually I think now it's getting interesting. If done properly, this
> could be a nice way of doing distributed active checking.
>
> Using the same system Stéphane described, Nagios could keep open
> connections to remote execution hosts that run the checks and read back
> the results. Service properties would determine whether a service can
> be run directly on the host (if Nagios has an open connection to it)
> or has to be run remotely. Check execution load could be offloaded to
> dedicated servers, or even spread out across the monitored hosts.
>
Yes, but a distributed static mesh redundancy thing is pretty different
from an NRPE daemon with an option to keep connections alive. A nice
example of where "think big" doesn't work, but "think bigger" does.
I'm working on a module that does just that, but it requires a full-blown
Nagios installation on each of the poller nodes, and which host is
monitored by which system is determined by hostgroups instead of
through some automagic solution that could possibly (and would
probably) get things wrong from time to time.
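To make the hostgroup-based split concrete, here is a minimal Python sketch. It assumes each poller is configured with the set of hostgroups it owns and that a host is checked by every poller owning at least one of its hostgroups. The poller and host names, the function names and the data layout are all hypothetical; this is not the module's actual configuration format. The point it illustrates is that the assignment is a plain, deterministic function of configuration, with no automatic placement logic that could get things wrong.

```python
from typing import Dict, List, Set


def assign_hosts(
    poller_groups: Dict[str, Set[str]],
    host_groups: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each poller to the sorted list of hosts it should check.

    Hypothetical rule: a host goes to every poller that owns at least
    one of the host's hostgroups.
    """
    assignment: Dict[str, List[str]] = {poller: [] for poller in poller_groups}
    for host, groups in host_groups.items():
        for poller, owned in poller_groups.items():
            if groups & owned:  # the host shares a hostgroup with this poller
                assignment[poller].append(host)
    for hosts in assignment.values():
        hosts.sort()
    return assignment


def unassigned_hosts(
    poller_groups: Dict[str, Set[str]],
    host_groups: Dict[str, Set[str]],
) -> List[str]:
    """Hosts whose hostgroups are not owned by any poller."""
    owned_by_any: Set[str] = set().union(*poller_groups.values())
    return sorted(h for h, groups in host_groups.items() if not groups & owned_by_any)


if __name__ == "__main__":
    pollers = {"poller-sto": {"stockholm"}, "poller-gbg": {"gothenburg"}}
    hosts = {
        "web1": {"stockholm"},
        "db1": {"gothenburg"},
        "fw1": {"stockholm", "gothenburg"},
        "lab1": {"lab"},
    }
    print(assign_hosts(pollers, hosts))
    print(unassigned_hosts(pollers, hosts))
```

A host in two hostgroups (fw1 above) ends up on both pollers; whether that is desirable or should be an error is a policy decision this sketch does not make.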
> On big setups this has the clear advantage of scalability, but on
> smaller setups it can also be interesting, as one could use very cheap
> servers for running the Nagios daemon in HA, and provide redundancy by
> spreading the checks across the monitored servers themselves.
>
Yup. That's the plan. Especially the scalability bit. The idea is to get
an infinite number of layers of pollers/masters, as each poller can in
turn have pollers connected to it. Now I'm just hoping we'll release it
publicly soon so I can get the Nagios community to test and patch it for
me while I lounge and drink beer ;-)
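The "layers of pollers/masters" idea can be pictured as a tree in which every node may have pollers of its own. The Python sketch below is a hypothetical model only: the `Node` class and its methods are made up for illustration, and it assumes results travel from a poller up through each parent to the top-level master, which is one plausible reading of the design rather than its actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare by identity; avoids recursing through parent/children
class Node:
    """A Nagios instance: the top-level master or a poller (hypothetical model)."""

    name: str
    parent: Optional["Node"] = field(default=None, repr=False)  # repr=False avoids cycles
    children: List["Node"] = field(default_factory=list)

    def add_poller(self, name: str) -> "Node":
        """Attach a new poller below this node and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path_to_top(self) -> List[str]:
        """Names from this node up to the top-level master (assumed result route)."""
        path: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

    def depth(self) -> int:
        """Number of layers between this node and the top-level master."""
        return len(self.path_to_top()) - 1

    def all_nodes(self) -> List[str]:
        """This node and every poller beneath it, depth-first."""
        names = [self.name]
        for child in self.children:
            names.extend(child.all_nodes())
        return names


if __name__ == "__main__":
    master = Node("master")
    sto = master.add_poller("poller-sto")
    lab = sto.add_poller("poller-sto-lab")
    master.add_poller("poller-gbg")
    print(lab.path_to_top(), lab.depth())
    print(master.all_nodes())
```

Nothing in the structure limits the depth, which is the property the "any number of layers" goal relies on; in practice each extra layer adds latency to results reaching the top.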
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231