[Nagiosplug-devel] Trouble with check_by_ssh and Sun's SSH.

gabriel rosenkoetter grosen at cc3.com
Tue Jun 8 14:06:25 CEST 2004


[My apologies if this post appears twice. I *think* that Mailman on
sourceforge is choking on my envelope address not matching my
subscription--and functional email--address.]

I seem to be having the same problem described by Patrick Walentiny
here:

  http://sourceforge.net/mailarchive/message.php?msg_id=6714332

I can confirm that this happens only when running check_by_ssh
against Sun's localized OpenSSH port that ships with Solaris 9.

This announces itself as "SSH-2.0-Sun_SSH_1.0" on the wire.

The relevant Sun packages are:

system      SUNWsshcu      SSH Common, (Usr)
system      SUNWsshdr      SSH Server, (Root)
system      SUNWsshdu      SSH Server, (Usr)
system      SUNWsshr       SSH Client and utilities, (Root)
system      SUNWsshu       SSH Client and utilities, (Usr)

The only relevant Sun patch that I seem to have is 113273-04, for
the SUNWsshdu package.

I'm executing this check command:

define command {
  command_name check_remote_disk
  command_line $USER1$/check_by_ssh -l nagios -i $USER2$/check_disk -H $HOSTADDRESS$ -C 'echo'
}

On the remote host, I have an SSH public key in
~nagios/.ssh/authorized_keys with:

command="/usr/local/nagios/libexec/check_disk -w 10% -c 5% -l" ssh-dss [...]

When I run check_by_ssh by hand and then echo $?, it invariably
returns 0, but Nagios persists in producing output like this:

---
Notification Type: PROBLEM 
 
Host: delta
Service: disk usage - Unix
Address: x.y.z.67
State: UNKNOWN
 
Date/Time: Tue Jun 8 13:58:10 EDT 2004

Additional Info:

DISK OK - free space: / 2961 MB (74%): /usr 9029 MB (89%): /var 4775 MB (95%): /var/run 1942 MB (100%): /tmp 1942 MB (99%): /opt 8474 MB (84%): /export/home 19878 MB (99%): /xy/moe/A 195575 MB (36%):
---

This seems inherently contradictory to me. I shouldn't be
getting the "DISK OK" unless check_disk had returned a 0 exit code,
which sshd should pass to check_by_ssh, which should in turn
return it to Nagios.
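
The chain described above can be sketched locally without SSH. This is
only an illustration under stated assumptions: fake_check_disk and
fake_wrapper are hypothetical stand-ins for check_disk and
check_by_ssh, and the state names follow the standard plugin return
codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical stand-in for check_disk: prints a status line and exits
# with the code given as its first argument.
fake_check_disk() {
  echo "DISK OK - free space: / 2961 MB (74%)"
  return "$1"
}

# Hypothetical stand-in for check_by_ssh: runs the "remote" command and
# passes its exit code through unchanged.
fake_wrapper() {
  fake_check_disk "$1"
  return $?
}

# Map a plugin return code to the state name Nagios reports.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "out of range" ;;
  esac
}

# Same output text both times; only the exit code differs.
for code in 0 3; do
  if fake_wrapper "$code" >/dev/null; then rc=0; else rc=$?; fi
  echo "exit code $rc -> $(state_name "$rc")"
done
```

The output text and the exit code travel separately, so a "DISK OK"
line paired with state UNKNOWN suggests the code was changed or lost
somewhere along the chain while the text came through intact.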

I would be blaming Solaris here... if I didn't get exactly what I
expected when I try to execute this same command on the system
running the Nagios daemon:

jedi:~# /usr/local/nagios/libexec/check_by_ssh -l nagios -i /etc/nagios/ssh-keys/check_disk -H delta -C 'echo' && echo "returned: $?"
DISK OK - free space: / 2961 MB (74%); /usr 9029 MB (89%); /var 4775 MB (95%); /var/run 1923 MB (100%); /tmp 1923 MB (99%); /opt 8474 MB (84%); /export/home 19878 MB (99%); /xy/moe/A 195575 MB (36%);| /=2961MB;3624;3825;0;4027 /usr=9028MB;9079;9583;0;10088 /var=4775MB;4539;4791;0;5044 /var/run=1923MB;1730;1826;0;1923 /tmp=1923MB;1748;1845;0;1943 /opt=8473MB;9079;9583;0;10088 /export/home=19877MB;18155;19164;0;20173 /xy/moe/A=195575MB;483192;510036;0;536880
returned: 0
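
One caveat about the form of the test above: with &&, the echo runs
only when the preceding command exits 0, so it can never display a
non-zero status. A minimal sketch of a variant that reports the status
either way follows; run_and_report is a hypothetical helper name, and
the sh -c commands are stand-ins for the real check_by_ssh invocation.

```shell
#!/bin/sh
# Hypothetical helper: run a command and always print its exit status.
run_and_report() {
  if "$@"; then status=0; else status=$?; fi
  echo "returned: $status"
}

# With &&, the echo is skipped on failure, so nothing is printed here.
sh -c 'exit 3' && echo "returned: $?"

# The helper reports the status whether the command succeeds or fails.
run_and_report sh -c 'exit 0'
run_and_report sh -c 'exit 3'
```

In the run shown above the line "returned: 0" did appear, so the
command really did exit 0 there; the variant only matters for runs
that might fail.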

I am aware of this reply:

  http://sourceforge.net/mailarchive/message.php?msg_id=6714335

but Meg seems to have misunderstood the problem. The issue is not
that we're seeing false positives (OK--return code 0--returned when
the service is not okay) but false negatives (UNKNOWN--return code
3--returned when the service IS okay, and even has returned the
string saying it's okay).

Has anyone figured out what's broken about Sun's SSH or the
check_by_ssh plugin's interpretations of its replies that causes
this?

-- 
gabriel rosenkoetter
Transcontinental CC3 Unix & Linux sysadmin
grosen at cc3.com



