[Nagiosplug-devel] Re: check_icmp 1.0 needs a recall -- fails completely after an hour
Andreas Ericsson
ae at op5.se
Thu Feb 3 00:57:09 CET 2005
Joe Rhett wrote:
> Okay, here's the deal. check_icmp appears to work fine for about an hour,
> and then suddenly it stops working entirely. When it starts to fail, it
> continues to fail even from the command line. Everything starts returning
> nan. Here's some verbose++,++,++ You'll note that it does send the
> packets and it does receive the replies. It just doesn't know it ;-(
>
That's weird.
> ardbeg3:[1014] ./check_icmp -v -v -v -H auth.mv
> setsockopt failed
> Setting alarm timeout to 10 seconds
> packets: 5, targets: 1
> target_interval: 0.000, pkt_interval 80.000
> crit.rta: 500.000
> max_completion_time: 3400.000
> icmp_pkt_size = 64
> icmp_pkt_size = 64
> crit = {500000, 80%}, warn = {200000, 40%}
> pkt_interval: 80000 target_interval: 0 retry_interval: 0
> icmp_pkt_size: 64 timeout: 10
> received 16384 bytes from 209.157.152.19
16384 bytes is a bit much. Considering icmp_pkt_size is 64, it shouldn't
ever receive more than 84 bytes in a proper ICMP_ECHOREPLY (20-byte IP
header + icmp_pkt_size). This can probably be traced to the fact that new
instances of check_icmp don't retain the source-quench status and again
send packets too fast for the node at 209.157.152.19. I'll re-implement
the u_sleep function and see if it works better when packets aren't
sprayed quite so fast.
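
The size arithmetic above can be sketched as a small sanity check on whatever
recvfrom() hands back. This is only an illustration under stated assumptions:
reply_size_ok() is a hypothetical helper, not a function in check_icmp, and it
assumes the buffer starts with the IPv4 header, as it does when reading from a
raw ICMP socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from check_icmp): returns 1 if a datagram of
 * `len` bytes starting at `buf` has a plausible size for an echo reply
 * to a probe of `icmp_pkt_size` bytes, 0 otherwise. */
int reply_size_ok(const unsigned char *buf, size_t len, size_t icmp_pkt_size)
{
    size_t ip_hdr_len;

    assert(buf != NULL);

    if (len < 20)                 /* shorter than a minimal IPv4 header */
        return 0;
    if ((buf[0] >> 4) != 4)       /* version nibble must say IPv4 */
        return 0;

    /* IHL is counted in 32-bit words; 5 words = 20 bytes without options */
    ip_hdr_len = (size_t)(buf[0] & 0x0f) * 4;
    if (ip_hdr_len < 20 || len < ip_hdr_len + 8)   /* 8 = ICMP header */
        return 0;

    /* With no IP options and icmp_pkt_size = 64 the upper bound is 84. */
    return len <= ip_hdr_len + icmp_pkt_size;
}
```

With a 20-byte header and icmp_pkt_size 64, a length of 84 passes and a length
of 16384 is rejected, so a reply like the ones in the output above could be
discarded before it reaches the reply-matching code.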
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 80000 usecs wait
> received 16384 bytes from 209.157.152.19
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 40000 usecs wait
> recvfrom_wto() timed out during a 40000 usecs wait
> received 16384 bytes from 209.157.152.19
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 26666 usecs wait
> recvfrom_wto() timed out during a 26666 usecs wait
> recvfrom_wto() timed out during a 26666 usecs wait
> received 16384 bytes from 209.157.152.19
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> received 16384 bytes from 205.217.152.11
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 20000 usecs wait
> recvfrom_wto() timed out during a 20000 usecs wait
> recvfrom_wto() timed out during a 20000 usecs wait
> received 16384 bytes from 209.157.152.19
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 16000 usecs wait
> recvfrom_wto() timed out during a 16000 usecs wait
> recvfrom_wto() timed out during a 16000 usecs wait
> time_passed: 509813 final_wait: 2890187 max_completion_time: 3400000
> Waiting for 2890187 micro-seconds (2890.187 msecs)
> received 2049 bytes from 209.157.140.206
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 578037 usecs wait
> received 2049 bytes from 209.157.140.206
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 578037 usecs wait
> received 2049 bytes from 209.157.140.206
> handle_random_icmp(0x804dd54, 0xbfbff9d0)
> recvfrom_wto() timed out during a 578037 usecs wait
> finish(0) called
> icmp_sent: 5 icmp_recv: 0 icmp_lost: 0
> targets: 1 targets_alive: 1
> CRITICAL -
> auth.mv: rta nan, lost 100%|
> rta=0.000ms;200.000;500.000;0; pl=100%;40;80;;
> targets: 1, targets_alive: 0
>
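The "rta nan" together with icmp_recv: 0 is what a floating-point average
looks like when the reply count is zero, since 0.0 / 0 yields NaN under
IEEE 754 arithmetic. Whether check_icmp computes the average this way is an
assumption, not something this output proves. A guarded average, with the
hypothetical name avg_rta_ms(), might look like this:

```c
#include <stdio.h>

/* Hypothetical helper (not from check_icmp): average round-trip time in
 * milliseconds from a total in microseconds. Returns 0.0 instead of NaN
 * when no replies were counted. */
double avg_rta_ms(unsigned long long total_usecs, unsigned int icmp_recv)
{
    if (icmp_recv == 0)
        return 0.0;               /* avoid 0.0 / 0, which would be NaN */
    return (double)total_usecs / icmp_recv / 1000.0;
}

/* Prints the average in the same "rta ..." style as the plugin output. */
void print_rta(unsigned long long total_usecs, unsigned int icmp_recv)
{
    printf("rta %.3fms\n", avg_rta_ms(total_usecs, icmp_recv));
}
```

This only tidies the output. The underlying problem in the run above is that
the replies were received but never counted.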
> ardbeg3:[1015] ping auth.mv
> PING auth.mv.meer.net (209.157.152.19): 56 data bytes
> 64 bytes from 209.157.152.19: icmp_seq=0 ttl=60 time=1.255 ms
> ^C
> --- auth.mv.meer.net ping statistics ---
> 2 packets transmitted, 1 packets received, 50% packet loss
> round-trip min/avg/max/stddev = 1.255/1.255/1.255/0.000 ms
>
> On Wed, Feb 02, 2005 at 12:39:06AM -0800, Joe Rhett wrote:
>
>>Ugh, sorry I spoke way too soon. Soon after I implemented it, everything
>>started coming back with NaN results :-(
>>
>>On Tue, Feb 01, 2005 at 09:08:02PM -0800, Joe Rhett wrote:
>>
>>>Good news. This one compiles and runs on FreeBSD ;-)
>>>
>>>On Sun, Jan 30, 2005 at 12:28:24AM +0100, Andreas Ericsson wrote:
>>>
>>>>Finished. :) check_icmp-1.0 is now available, and check_rta is
>>>>officially dropped.
>>>>
>>>>I've started using the new check_icmp in production in our own system to
>>>>get a proper evaluation of it. So far so good.
>>>>
>>>>It compiles silently and cleanly with all possible warning-flags enabled
>>>>on Linux (Owl 1.1 and current, Debian 3.0, Fedora Core 3, RedHat 7.2,
>>>>SuSE 9.0), FreeBSD 5.2.1, NetBSD 1.6, Tru64 Unix 5.1b and HP-UX (11i v2
>>>>and 11i 11.11).
>>>>
>>>>All the above had a decently fresh version of gcc installed, except
>>>>Linux Owl, which was gcc 2.95.3.
>>>>
>>>>Under Tru64 Unix 4.0g it compiles with warnings for %llu printf format
>>>>strings (for certain debug strings).
>>>>
>>>>According to reports it also compiles nicely under solaris 2.9.
>>>>
>>>>I only have root access to linux systems right now, so I haven't been
>>>>able to run very many tests, but timings are completely consistent with
>>>>system ping and all the features and logic work as expected.
>>>>
>>>>It's available at;
>>>>http://oss.op5.se/nagios/check_icmp-1.0.tar.gz
>>>>http://devel.op5.se/oss/check_icmp-1.0.tar.gz
>>>>
>>>>I'll also upload the source file to the SF tracker page.
>>>>
>>>>Cheers
>>>>
>>>>
>>>>Andreas Ericsson wrote:
>>>>
>>>>>Ahoy all.
>>>>>
>>>>>To all those who have tested and commented on check_rta, thanks.
>>>>>
>>>>>I've fixed the SIGSEGV crash (I think, at least I can't reproduce it on
>>>>>my own system any more), and some timing issues causing check_rta to
>>>>>time out prematurely with an obscene packet loss even when rta is below
>>>>>1ms.
>>>>>
>>>>>Inspired by the check_traceroute (or check_route, I don't remember) perl
>>>>>plugin, I've also added an -l flag to specify time-to-live on outgoing
>>>>>packets. It should work where the perl plugin does.
>>>>>
>>>>>The latest version is available from the usual places;
>>>>>https://devel.op5.se/oss
>>>>>http://oss.op5.se/nagios
>>>>>
>>>>>For those of you who have offered me access to Solaris and *BSD boxes,
>>>>>I've attached my public SSH-key. The following commands should work to
>>>>>allow me in without a password.
>>>>>useradd -m exon
>>>>>mkdir ~exon/.ssh
>>>>>cp id_dsa.pub ~exon/.ssh/authorized_keys
>>>>>chown -R exon:exon ~exon/.ssh
>>>>>chmod -R 700 ~exon/.ssh
>>>>>
>>>>>Send me a private email with the hostname/ip-address when you're done.
>>>>>I'd like to spend this day testing everything out so it can be in the
>>>>>1.4-release of the plugins and we can finally get rid of check_ping and
>>>>>the other output parsing ping plugins.
>>>>>
>>>>>
>>>>
>>>>--
>>>>Andreas Ericsson andreas.ericsson at op5.se
>>>>OP5 AB www.op5.se
>>>>Lead Developer
>>>>
>>>>
>>>>-------------------------------------------------------
>>>>This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
>>>>Tool for open source databases. Create drag-&-drop reports. Save time
>>>>by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
>>>>Download a FREE copy at http://www.intelliview.com/go/osdn_nl
>>>>_______________________________________________________
>>>>Nagios Plugin Development Mailing List
>>>>Nagiosplug-devel at lists.sourceforge.net
>>>>Unsubscribe at https://lists.sourceforge.net/lists/listinfo/nagiosplug-devel
>>>>::: Please include plugins version (-v) and OS when reporting any issue.
>>>>::: Messages without supporting info will risk being sent to /dev/null
>>>
>>>--
>>>Joe Rhett
>>>Senior Geek
>>>Meer.net
>>
>>--
>>Joe Rhett
>>Senior Geek
>>Meer.net
>>
>>
>
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer