[Nagiosplug-devel] Re: SNMP + Nag Was: Kickoff for 1.5
Stanley Hopcroft
Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Mar 10 03:30:01 CET 2005
Dear Folks,
Not much here but,
On Wed, Mar 09, 2005 at 09:21:23PM -0500, Subhendu Ghosh wrote:
> On Wed, 9 Mar 2005, Harper Mann wrote:
>
> >Hi Everyone,
> >
> >There are several items in an SNMP plugin discussion we're interested in
> >and
> >are working on. What I can remember off the top of my head is:
> >
> >1) How to manage and alarm on counter data like interface traffic, etc. We
> >use check_rrd, which was mentioned earlier in this thread, and perhaps
> >that's sufficient since we customarily store and graph, but standardizing
> >this would be good. We're not sure RRDTool will scale to sufficient size
> >installations.
> >
If the devices support RMON (and most do), then the alarm group
transforms the problem into one of trap harvesting (ie define alarm
thresholds on trunks in the switch/router and have it send traps when
the threshold is exceeded). The only con is static, non-adaptive
thresholds.
See below if you want to allow for diurnal/seasonal variation.
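For anyone who has not met the alarm group, the rising/falling behaviour can be sketched roughly in Python. This is a simplified model of the idea and not the RMON MIB itself; the class name, the thresholds and the sample values are all made up for illustration.

```python
class ThresholdAlarm:
    """Simplified rising/falling threshold alarm with hysteresis.

    Hypothetical model: after a rising event, no further rising event
    fires until the sample has dropped to the falling threshold.
    """

    def __init__(self, rising, falling):
        if falling > rising:
            raise ValueError("falling threshold must not exceed rising")
        self.rising = rising
        self.falling = falling
        self.next_event = "rising"  # which crossing we are waiting for

    def check(self, sample):
        """Return 'rising', 'falling' or None for one sample."""
        if self.next_event == "rising" and sample >= self.rising:
            self.next_event = "falling"
            return "rising"
        if self.next_event == "falling" and sample <= self.falling:
            self.next_event = "rising"
            return "falling"
        return None


def events(samples, rising, falling):
    """Run a list of samples through one alarm and list the events."""
    alarm = ThresholdAlarm(rising, falling)
    return [alarm.check(s) for s in samples]
```

The two thresholds are fixed numbers, which is exactly the static limitation mentioned above: a busy Monday morning and a quiet Sunday night are judged against the same values.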
> >2) We've had a request to collect 3-4 SNMP values (in, out, errors) from
> >more than 10,000 interfaces every 15 minutes so we're looking into how to
> >scale to such a large installation. Aside from how to get plugins to keep
> >up with collecting, what's the best way to store so much performance data?
> >
> >3) Fix the performance data so it conforms to the project standards and
> >manages OIDs and Symbolic names well for multiple requests.
> >
>
> Separate out the functionality - Nagios is primarily a fault management
> tool. For 10k interface performance choose a performance
> management(monitor) tool.
>
Absolutely. I think the nomenclature is
1 a poller/collector - interrogates the thingys and saves the data
2 an analyser/presenter - summarises the saved data and reports by various
means.
These are best implemented as separate processes so they can perform
without tradeoffs.
Non-blocking IO with Net::SNMP outperforms forking a process for each Net::SNMP get.
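The shape of a non-blocking poller, sketched in Python rather than Perl. This is only an analogy: `fake_snmp_get` is a hypothetical stand-in that sleeps instead of talking SNMP, and the host names and the value 42 are invented. The point is that one process keeps many requests in flight rather than forking once per request.

```python
import asyncio


async def fake_snmp_get(host, delay=0.05):
    """Hypothetical stand-in for one SNMP request; it only sleeps."""
    await asyncio.sleep(delay)  # the event loop serves other hosts meanwhile
    return host, 42             # made-up counter value


async def poll_all(hosts):
    """Start every request at once and gather the replies in one process."""
    replies = await asyncio.gather(*(fake_snmp_get(h) for h in hosts))
    return dict(replies)


def poll(hosts):
    """Synchronous entry point for a poller cycle."""
    return asyncio.run(poll_all(hosts))
```

With this structure the wall-clock time of a cycle is governed by the slowest reply rather than by the sum of all replies.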
Storing data in RRDs has the advantages that
1 Lots of third party applications know and love RRDs (orca, cricket)
2 The Holt-Winters time series prediction algorithm can let the analyser
distinguish a daily surge from an anomaly/problem
NB Tobi the RRD man has got funding from a client to bring the dev
branch RRD - with the HW stuff - into supported production form.
3 the RRDs are self maintaining. Except in exceptional cases there is no
need to unload and resize databases when the db fills up (it never does)
4 the storage of an RRD never exceeds what is allocated when the RRD is
created.
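Points 3 and 4 follow from the round robin layout. A minimal Python sketch of the idea, assuming nothing about the real RRD file format (which also preallocates its space and consolidates samples); the class and method names here are hypothetical:

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive: once full, each new row overwrites the oldest."""

    def __init__(self, slots):
        # maxlen caps the storage at creation time, like an RRD's row count
        self.rows = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Store one sample; the oldest sample is dropped when full."""
        self.rows.append((timestamp, value))

    def oldest(self):
        """Return the oldest sample still held."""
        return self.rows[0]

    def __len__(self):
        return len(self.rows)
```

However many updates arrive, the archive never holds more than the number of slots it was created with, so there is nothing to unload or resize.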
> I've been partial to Cricket for snmp data collection - the snmp engine is
> pretty well designed so that each device is only contacted once and all
> the different oids are requested together. (cricket.sf.net)
> I've seen it scale quite well so long as you can stagger the host
> groups (ie. not everything runs at the same 15 min interval) and you can
> use snmp v2 and get-bulk
>
> For alarms - either check_rrd or snmptraps from Cricket (and possibly
> Cacti in the near future).
Sounds good to me if you can't get RMON (or don't/can't configure your
devices - although if that were the case, you probably couldn't poll them).
>
> By forcing Nagios to do traffic measurements from snmp - the scalability
> is not present based on the plugin architecture. You need something else
> to do the active monitor and check the results.
Hear, hear. Let Nag present part of the conclusions - it's neat to have
the plugin output return a hyperlink to an RRDtool or other CGI that
allows the Nag viewer to display the RRDtool graphs.
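Something like the following Python sketch is what I have in mind for the plugin side. The CGI path `/cgi-bin/graph.cgi`, its `rrd` parameter and the function name are hypothetical; only the status words and return codes (0 to 3) are the usual plugin conventions. Whether the web interface renders the tag or shows it as text depends on the Nag version and configuration.

```python
from html import escape
from urllib.parse import quote

# Conventional plugin return codes and their status words
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def plugin_output(code, text, rrd_name, graph_cgi="/cgi-bin/graph.cgi"):
    """Build one line of plugin output ending in a link to a graph.

    graph_cgi and the 'rrd' query parameter are made up for this sketch.
    """
    url = f"{graph_cgi}?rrd={quote(rrd_name)}"
    return f'{STATES[code]}: {escape(text)} <a href="{escape(url)}">graph</a>'
```

The plugin itself stays cheap: it reports state and points at the graph, while the collecting and drawing happen elsewhere.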
> For small installs that
> don't want multiple tools, it would work, but large installs like yours
> should definitely use separate tools.
Amen brother.
>
> I used to monitor about the same number of interfaces with mrtg around
> '98-'00. disk i/o was the biggest issue. (ram disk to the rescue).
>
> RRDtool scales as well as the underlying hardware (disk i/o) and file
> layout.
The bottleneck is more likely to be in the poller than RRDtool in my
view (that's why there are fpings and so on).
>
> --
> -sg
>
Does this adequately sum up what's been presented that's relevant to Nag
SNMP plugins ?
1 the plugins should probably confine themselves to checking state
rather than collecting/storing performance data (leaving this to a
standalone poller that may or may not interact with Nag directly)
2 traffic thresholds are best dealt with by
2.1 standalone poller + analyser submitting passive service check
results to Nag (possibly via traps to a trap collector), or
2.2 device specific means (RMON)
3 The problems of dealing with large numbers of communities remain, although
it seems to me that the -C option should go a long way to help (maybe in
conjunction with a heap of included files defining different arguments
for commands).
4 Plugins that save/store state probably don't scale and should therefore
be excluded from developer focus
5 It may be worth recognising that SNMP pollers/managers are a good
supplement to Nag; the poller is getting close to peak development and
therefore effort is only needed in exploiting synergy rather than
seeking to do it again with plugins.
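For point 2.1, the hand-off from a standalone analyser to Nag can be as small as one line written to the external command file. A Python sketch, assuming the standard PROCESS_SERVICE_CHECK_RESULT external command; the function name, host and service names are hypothetical, and writing the line to the command file is left out because its path is site specific.

```python
import time


def passive_result(host, service, code, output, now=None):
    """Format one passive service check result as an external command line.

    code is the usual return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
    now lets the caller supply the timestamp; it defaults to the current time.
    """
    ts = int(time.time() if now is None else now)
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{code};{output}\n"
    )
```

The analyser decides what counts as a problem (static threshold, Holt-Winters deviation, whatever) and Nag only has to record and notify.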
Yours sincerely.
--
Stanley Hopcroft
IP Australia
Ph: (02) 6283 3189 Fax: (02) 6281 1353
PO Box 200 Woden ACT 2606
http://www.ipaustralia.gov.au