[Nagiosplug-devel] Problem: check_icmp incorrectly reporting packet loss?
Sascha Runschke
srunschke at abit.de
Tue Mar 22 02:41:21 CET 2005
Greetings,
I have been using Nagios together with check_icmp for quite some time now,
and it has mostly gone very smoothly. Recently I made a few changes, and
now check_icmp seems to be causing quite a few problems.
Since I am running a highly parallelized Nagios environment, I ordered a
new server to get rid of the load spikes, switched from an older Nagios 2
beta to 2.0b2, and updated my old nagiosplug CVS snapshot to 1.4.
The old server was running RH9; the new one is running RH Enterprise
Server 4. SELinux is disabled, since I couldn't get it to work with the
nagios.cmd file being accessed by both the httpd and the user contexts.
After a few days of smooth operation, check_icmp starts choking, reporting
packet loss and high latency where there actually is none.
The setup:
HP Compaq DL 380 G4 (Xeon 3.4Ghz) 3 GB Ram
Red Hat Enterprise Server 4 (SELinux disabled)
Nagios 2.0b2
check_icmp 2005_03_15 (tried after having problems with the one from nagiosplug 1.4)
I'm running around 300 service checks per minute, quite a few of them
ICMP-related.
Problem:
After a few days, check_icmp suddenly started reporting high latency and
60-80% packet loss, but strangely only for 2 of the outgoing lines (I am
monitoring about 10 remote sites in addition to our HQ).
So my first conclusion was that the internet connection at those sites had
problems. But to my surprise there were absolutely none. Neither did the
users there notice _anything_, nor did my ping tests turn anything up.
Even pinging from the Nagios machine worked absolutely fine, but using
check_icmp to ping showed the same strange behaviour: high latency
(500-1000 ms) and high packet loss (60-80%) in half of the checks, while
the others were fine.
Using /bin/ping at the same time brings up:
[root at nagios check_icmp-2005-03-15]# ping 62.159.233.94
PING 62.159.233.94 (62.159.233.94) 56(84) bytes of data.
...
...
--- 62.159.233.94 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20027ms
rtt min/avg/max/mdev = 37.718/109.663/307.297/98.933 ms, pipe 2
The rather high RTT is normal, since that site has continuous bandwidth
usage, but it seldom spikes, and it certainly doesn't spike to 800 ms with
80% loss for 44 hours in a row, as check_icmp reported.
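To put ping's numbers side by side with check_icmp's, the loss figure can
be recomputed from the summary line. This is a small hypothetical helper,
not part of nagiosplug, and it assumes the Linux iputils summary format
shown in the transcript ("N packets transmitted, M received, ..."):

```python
import re

# Assumes the iputils ping summary format, e.g.
# "21 packets transmitted, 21 received, 0% packet loss, time 20027ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss_percent(summary_line: str) -> float:
    """Recompute packet loss in percent from a ping summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: " + summary_line)
    transmitted, received = (int(g) for g in match.groups())
    if transmitted == 0:
        raise ValueError("no packets transmitted")
    # loss = (sent - received) / sent, expressed as a percentage
    return 100.0 * (transmitted - received) / transmitted


if __name__ == "__main__":
    line = "21 packets transmitted, 21 received, 0% packet loss, time 20027ms"
    print(f"{packet_loss_percent(line):.0f}% loss")
```

For the run above this yields 0% loss, against the 60-80% that check_icmp
reported for the same host at the same time.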
A -v -v -v log output from check_icmp, taken at the same time as the ping,
is attached.
I'll see if I can dig into check_icmp.c myself, but I'm not too sure that
will happen, since I'm loaded with work :/
Any ideas or hints to the problem?
sash
--------------------------------------------------
Sascha Runschke
Netzwerk Administration
IT-Services
ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch
Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de
http://www.abit.net
http://www.abit-epos.net
http://www.my-academy.net
--------------------------------------------------
The information and any attachments contained in this email are intended
solely for the addressee. Access to this email by anyone else is
unauthorized. If you are not the intended recipient, any form of
disclosure, reproduction, distribution or any action taken or refrained
from in reliance on it, is prohibited and may be unlawful. We would also like to
inform you that communication via email over the internet is insecure
because third parties may have the possibility to access and manipulate
emails. If you have received the message in error, please advise the
sender and delete the message and any attachments. Thank you very much.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: check_icmp_logoutput.txt
URL: <http://nagios-plugins.org/archive/devel/attachments/20050322/099e1af1/attachment.txt>