[Nagiosplug-devel] Problem: check_icmp incorrectly reporting packet loss?

Sascha Runschke srunschke at abit.de
Tue Mar 22 02:41:21 CET 2005


Greetings,

I have been using Nagios together with check_icmp for quite some time now,
and it mostly ran very smoothly. I recently made a few changes, and since
then check_icmp seems to be causing problems.

Since I am running a highly parallelized Nagios environment, I ordered a
new server to get rid of the load spikes. I also switched from an older
Nagios 2 beta to 2.0b2 and updated my old nagiosplug CVS snapshot to 1.4.
The old server was running RH9; the new one is running RH Enterprise
Server 4. SELinux is disabled, since I couldn't get it to work with the
Nagios.cmd file being accessed by the http- and usr-content.

After a few days of smooth operation, check_icmp started choking,
reporting packet loss and high latency where there actually is none.

The scenario is as follows:

HP Compaq DL 380 G4 (Xeon 3.4 GHz), 3 GB RAM
RedHat Enterprise Server 4 (SELinux disabled)
Nagios 2.0b2
check_icmp 2005_03_15 (tried after having problems with the one from
nagiosplug 1.4)
I'm running around 300 service checks per minute, quite a few of them
ICMP related.

Problem:

After a few days check_icmp suddenly started to report high latency and
60-80% packet loss, but the strange thing is that it only reported this
for 2 outgoing lines. (I am monitoring about 10 remote sites in addition
to our HQ.) So my first conclusion was that those sites had problems with
their internet connection. But to my surprise there were absolutely no
problems. The users there did not notice _anything_, nor did my ping tests
bring anything up. Even pinging from the Nagios machine worked absolutely
fine, but using check_icmp to ping showed the same strange behaviour: high
latency (500-1000ms) and high packet loss (60-80%) in half of the checks
made, while the others were fine.

Using /bin/ping at the same time brings up:

[root@nagios check_icmp-2005-03-15]# ping 62.159.233.94
PING 62.159.233.94 (62.159.233.94) 56(84) bytes of data.
...
...
--- 62.159.233.94 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20027ms
rtt min/avg/max/mdev = 37.718/109.663/307.297/98.933 ms, pipe 2

The fairly high rtt is normal, since that site has continuous bandwidth
usage, but it seldom spikes, and it certainly doesn't spike to 800ms with
80% loss for 44 hours in a row as check_icmp reported.
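
For anyone who wants to log /bin/ping results next to the check_icmp
results over a longer period, here is a minimal sketch that extracts the
loss and rtt figures from a ping summary. It assumes the iputils summary
format shown above; the function name parse_ping_summary and the
dictionary keys are made up for illustration and are not part of
check_icmp or Nagios.

```python
import re

# Summary lines copied from the /bin/ping run quoted above.
SAMPLE = """\
--- 62.159.233.94 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20027ms
rtt min/avg/max/mdev = 37.718/109.663/307.297/98.933 ms, pipe 2
"""

# Assumed iputils format; other ping implementations print different text.
COUNTS = re.compile(
    r"(\d+) packets transmitted, (\d+) received, .*?([\d.]+)% packet loss"
)
RTT = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text):
    """Return packet counts, loss percentage and rtt values from ping output."""
    counts = COUNTS.search(text)
    if counts is None:
        raise ValueError("no packet statistics line found")
    result = {
        "transmitted": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_pct": float(counts.group(3)),
    }
    # The rtt line is missing when every packet was lost.
    rtt = RTT.search(text)
    if rtt is not None:
        names = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        for name, value in zip(names, rtt.groups()):
            result[name] = float(value)
    return result


if __name__ == "__main__":
    summary = parse_ping_summary(SAMPLE)
    print(summary["loss_pct"], summary["rtt_avg_ms"])
```

Running this from cron at the same interval as the service check and
writing the values to a file would give a timeline to compare against
what check_icmp reports for the same host.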

A -v -v -v log output from check_icmp, taken at the same time as the ping
run, is attached.

I'll see if I get to dig into check_icmp.c myself, but I am not too sure
it's gonna happen, since I'm loaded with work :/

Any ideas or hints about the problem?

sash




--------------------------------------------------
Sascha Runschke
Netzwerk Administration
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de

http://www.abit.net
http://www.abit-epos.net
http://www.my-academy.net
--------------------------------------------------
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: check_icmp_logoutput.txt
URL: <http://nagios-plugins.org/archive/devel/attachments/20050322/099e1af1/attachment.txt>

