[Nagiosplug-devel] check_icmp 1.0 needs a recall -- fails completely after an hour

Joe Rhett jrhett at meer.net
Wed Feb 2 19:49:08 CET 2005


Okay, here's the deal.  check_icmp appears to work fine for about an hour,
and then suddenly it stops working entirely.  Once it starts to fail, it
keeps failing, even when run from the command line.  Everything comes back
as nan.  Here's a run with three levels of verbosity (-v -v -v).  You'll
note that it does send the packets and it does receive the replies.  It
just doesn't know it ;-(

ardbeg3:[1014] ./check_icmp -v -v -v -H auth.mv
setsockopt failed
Setting alarm timeout to 10 seconds
packets: 5, targets: 1
target_interval: 0.000, pkt_interval 80.000
crit.rta: 500.000
max_completion_time: 3400.000
icmp_pkt_size = 64
icmp_pkt_size = 64
crit = {500000, 80%}, warn = {200000, 40%}
pkt_interval: 80000  target_interval: 0  retry_interval: 0
icmp_pkt_size: 64  timeout: 10
received 16384 bytes from 209.157.152.19
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 80000 usecs wait
received 16384 bytes from 209.157.152.19
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 40000 usecs wait
recvfrom_wto() timed out during a 40000 usecs wait
received 16384 bytes from 209.157.152.19
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 26666 usecs wait
recvfrom_wto() timed out during a 26666 usecs wait
recvfrom_wto() timed out during a 26666 usecs wait
received 16384 bytes from 209.157.152.19
handle_random_icmp(0x804dd54, 0xbfbff9d0)
received 16384 bytes from 205.217.152.11
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 20000 usecs wait
recvfrom_wto() timed out during a 20000 usecs wait
recvfrom_wto() timed out during a 20000 usecs wait
received 16384 bytes from 209.157.152.19
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 16000 usecs wait
recvfrom_wto() timed out during a 16000 usecs wait
recvfrom_wto() timed out during a 16000 usecs wait
time_passed: 509813  final_wait: 2890187  max_completion_time: 3400000
Waiting for 2890187 micro-seconds (2890.187 msecs)
received 2049 bytes from 209.157.140.206
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 578037 usecs wait
received 2049 bytes from 209.157.140.206
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 578037 usecs wait
received 2049 bytes from 209.157.140.206
handle_random_icmp(0x804dd54, 0xbfbff9d0)
recvfrom_wto() timed out during a 578037 usecs wait
finish(0) called
icmp_sent: 5  icmp_recv: 0  icmp_lost: 0
targets: 1  targets_alive: 1
CRITICAL -
auth.mv: rta nan, lost 100%|
rta=0.000ms;200.000;500.000;0; pl=100%;40;80;;
targets: 1, targets_alive: 0
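
The summary lines above hang together in one way: with icmp_recv at 0 there
are no round-trip samples to average, so the loss works out to 100% and the
average has no defined value, which is where a nan would come from in C
floating-point math (0.0 / 0).  A minimal Python sketch of that arithmetic
follows.  summarize() and its arguments are hypothetical names made up for
illustration; none of this is taken from the check_icmp source.

```python
import math


def summarize(sent, recv, total_rtt_usec):
    """Return (packet_loss_percent, rta_ms) from raw counters.

    Hypothetical helper: it only mirrors the arithmetic implied by the
    summary lines, not the plugin's actual code.
    """
    # Loss is the share of sent packets that were never matched to a reply.
    loss = 100.0 if sent == 0 else (sent - recv) * 100.0 / sent
    # In C, 0.0 / 0 yields NaN under IEEE 754. Python raises
    # ZeroDivisionError instead, so the NaN is returned explicitly here.
    rta_ms = math.nan if recv == 0 else total_rtt_usec / recv / 1000.0
    return loss, rta_ms


print(summarize(5, 0, 0))  # (100.0, nan)
```

With the counters from this run (5 sent, 0 counted as received) the result is
100% loss and a nan average, even though replies were arriving on the wire.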

ardbeg3:[1015] ping auth.mv
PING auth.mv.meer.net (209.157.152.19): 56 data bytes
64 bytes from 209.157.152.19: icmp_seq=0 ttl=60 time=1.255 ms
^C
--- auth.mv.meer.net ping statistics ---
2 packets transmitted, 1 packets received, 50% packet loss
round-trip min/avg/max/stddev = 1.255/1.255/1.255/0.000 ms
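
One detail stands out when the two outputs are put side by side: ping reports
64 bytes from 209.157.152.19, while check_icmp reports 16384 bytes from the
same host with icmp_pkt_size = 64.  16384 is 0x4000, which is 64 (0x0040)
with its two bytes swapped.  The sketch below only checks that arithmetic.
swap16() is a hypothetical helper, not a function from check_icmp, and
whether a byte-order mix-up has anything to do with the failure is a guess
that nothing in this thread confirms; it may be coincidence.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit unsigned integer."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


# Sizes reported by check_icmp in the verbose output above.
for reported in (16384, 2049):
    print(reported, hex(reported), "->", swap16(reported))
```

If the length really is being read in the wrong byte order, that would be a
reasonable place to start looking, but treat it as a lead and nothing more.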

On Wed, Feb 02, 2005 at 12:39:06AM -0800, Joe Rhett wrote:
> Ugh, sorry I spoke way too soon.  Soon after I implemented it, everything
> started coming back with NaN results :-(
> 
> On Tue, Feb 01, 2005 at 09:08:02PM -0800, Joe Rhett wrote:
> > Good news.  This one compiles and runs on FreeBSD ;-)
> > 
> > On Sun, Jan 30, 2005 at 12:28:24AM +0100, Andreas Ericsson wrote:
> > > Finished. :) check_icmp-1.0 is now available, and check_rta is 
> > > officially dropped.
> > > 
> > > I've started using the new check_icmp in production in our own system to 
> > > get a proper evaluation of it. So far so good.
> > > 
> > > It compiles silently and cleanly with all possible warning-flags enabled 
> > > on Linux (Owl 1.1 and current, Debian 3.0, Fedora Core 3, RedHat 7.2, 
> > > SuSE 9.0), FreeBSD 5.2.1, NetBSD 1.6, Tru64 Unix 5.1b and HP-UX (11i v2 
> > > and 11i 11.11).
> > > 
> > > All the above had a decently fresh version of gcc installed, except 
> > > Linux Owl, which was gcc 2.95.3.
> > > 
> > > Under Tru64 Unix 4.0g it compiles with warnings for %llu printf format 
> > > strings (for certain debug strings).
> > > 
> > > > According to reports it also compiles nicely under Solaris 2.9.
> > > 
> > > I only have root access to linux systems right now, so I haven't been 
> > > able to run very many tests, but timings are completely consistent with 
> > > > system ping and all the features and logic work as expected.
> > > 
> > > > It's available at:
> > > http://oss.op5.se/nagios/check_icmp-1.0.tar.gz
> > > http://devel.op5.se/oss/check_icmp-1.0.tar.gz
> > > 
> > > I'll also upload the source file to the SF tracker page.
> > > 
> > > Cheers
> > > 
> > > 
> > > Andreas Ericsson wrote:
> > > >Ahoy all.
> > > >
> > > >To all those who have tested and commented on check_rta, thanks.
> > > >
> > > > >I've fixed the SIGSEGV crash (I think; at least I can't reproduce it on 
> > > > >my own system any more), along with some timing issues that caused 
> > > > >check_rta to time out prematurely with an obscene packet loss even 
> > > > >when rta is below 1ms.
> > > >
> > > > >Inspired by the check_traceroute (or check_route, I don't remember) Perl 
> > > > >plugin, I've also added an -l flag to specify time-to-live on outgoing 
> > > > >packets. It should work where the Perl plugin does.
> > > >
> > > > >The latest version is available from the usual places:
> > > >https://devel.op5.se/oss
> > > >http://oss.op5.se/nagios
> > > >
> > > >For those of you who have offered me access to Solaris and *BSD boxes, 
> > > >I've attached my public SSH-key. The following commands should work to 
> > > >allow me in without a password.
> > > >useradd -m exon
> > > >mkdir ~exon/.ssh
> > > >cp id_dsa.pub ~exon/.ssh/authorized_keys
> > > > >chown -R exon:exon ~exon/.ssh
> > > > >chmod 700 ~exon/.ssh
> > > > >chmod 600 ~exon/.ssh/authorized_keys
> > > >
> > > >Send me a private email with the hostname/ip-address when you're done. 
> > > >I'd like to spend this day testing everything out so it can be in the 
> > > >1.4-release of the plugins and we can finally get rid of check_ping and 
> > > >the other output parsing ping plugins.
> > > >
> > > >
> > > 
> > > -- 
> > > Andreas Ericsson                   andreas.ericsson at op5.se
> > > OP5 AB                             www.op5.se
> > > Lead Developer
> > > 
> > > 
> > > -------------------------------------------------------
> > > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
> > > Tool for open source databases. Create drag-&-drop reports. Save time
> > > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
> > > Download a FREE copy at http://www.intelliview.com/go/osdn_nl
> > > _______________________________________________________
> > > Nagios Plugin Development Mailing List 
> > > Nagiosplug-devel at lists.sourceforge.net
> > > Unsubscribe at https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
> > > ::: Please include plugins version (-v) and OS when reporting any issue. 
> > > ::: Messages without supporting info will risk being sent to /dev/null
> > 
> > -- 
> > Joe Rhett
> > Senior Geek
> > Meer.net
> 
> -- 
> Joe Rhett
> Senior Geek
> Meer.net
> 
> 

-- 
Joe Rhett
Senior Geek
Meer.net



