[Nagiosplug-devel] [ nagiosplug-Bugs-1993363 ] check_procs times out on Solaris 10
SourceForge.net
noreply at sourceforge.net
Fri Jun 20 16:40:25 CEST 2008
Bugs item #1993363, was opened at 2008-06-13 16:58
Message generated for change (Comment added) made by maemigh
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=397597&aid=1993363&group_id=29880
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: General plugin execution
Group: snapshot tarball
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: maemigh (maemigh)
Assigned to: Ton Voon (tonvoon)
Summary: check_procs times out on Solaris 10
Initial Comment:
I'm having problems with the latest snapshot of check_procs timing out. The version is v1991 (nagios-plugins 1.4.12). This is using pst3 included with the plugins.
./check_procs
CRITICAL - Plugin timed out after 10 seconds
./check_procs -w 2:2 -c 2:2 -C nagios
CRITICAL - Plugin timed out after 10 seconds
I've tried sparc and x86 builds; both time out:
Solaris 5.10 Generic_118833-36 sparc
Solaris Generic_127112-11 i386
----------------------------------------------------------------------
>Comment By: maemigh (maemigh)
Date: 2008-06-20 10:40
Message:
Logged In: YES
user_id=1520524
Originator: YES
SVN from 6-19 is causing a segfault on one of our Solaris 9 servers:
Here is some output from truss:
stat("/usr/platform/SUNW,Sun-Fire-V240/lib/sparcv9/libkvm_psr.so.1", 0xFFFFFFFF7FFFE8D0) Err#2 ENOENT
brk(0x100102520) = 0
brk(0x100106520) = 0
stat("/dev/kmem", 0xFFFFFFFF7FFFF2D0) = 0
stat("/dev/mem", 0xFFFFFFFF7FFFF250) = 0
stat("/dev/kmem", 0xFFFFFFFF7FFFF1D0) = 0
stat("/dev/allkmem", 0xFFFFFFFF7FFFF150) = 0
open("/dev/kmem", O_RDONLY) = 3
open("/dev/mem", O_RDONLY) = 4
open("/dev/ksyms", O_RDONLY) = 5
read(5, "7F E L F020201\0\0\0\0\0".., 16) = 16
lseek(5, 0, SEEK_SET) = 0
lseek(5, 0, SEEK_END) = 895494
mmap(0x00000000, 895494, PROT_READ, MAP_PRIVATE, 5, 0) = 0xFFFFFFFF7E300000
munmap(0xFFFFFFFF7E300000, 895494) = 0
close(5) = 0
pread(3, "\0\0030E92 " u p", 8, 0x0142C300) = 8
pread(3, "\0\0030E92 " u p", 8, 0x0142C300) = 8
ioctl(1, TCGETA, 0xFFFFFFFF7FFFE14C) = 0
fstat(1, 0xFFFFFFFF7FFFE0E0) = 0
S UID PID PPID VSZ RSS %CPU COMMAND ARGS
write(1, " S U I D P I".., 52) = 52
pread(3, "\0\003\005F7 5 0\0\003\0".., 2584, 0x30E92227570) = 2584
pread(3, "\0\0 Z ?", 4, 0x30003070C4C) = 4
pread(3, "\0\0\0 R\0\0 Z ?\0\0\0\0".., 32, 0x30003070C48) = 32
open("/proc/23103/as", O_RDONLY) = 5
pread(5, "FFFFFFFF7FFFFD\0\0\0\0\0".., 1240, 0xFFFFFFFF7FFFFB28) = 1240
close(5) = 0
open("/proc/23103/psinfo", O_RDONLY) = 5
read(5, "\b\0C4 H\0\0\001\0\0 Z ?".., 416) = 416
close(5) = 0
O 0 23103 23102 2096 1160 0.1 pst3 ./pst3
write(1, " O 0 2 3 1 0".., 52) = 52
pread(3, "\0\003\01CE6D6 8\0\003\0".., 2584, 0x3000513A120) = 2584
pread(3, "\0\0 Z >", 4, 0x3000316EA84) = 4
pread(3, "\0\0\0 u\0\0 Z >\0\0\0\0".., 32, 0x3000316EA80) = 32
open("/proc/23102/as", O_RDONLY) = 5
pread(5, "FFFFFFFF7FFFFCE8FFFFFFFF".., 1272, 0xFFFFFFFF7FFFFB08) = 1272
close(5) = 0
open("/proc/23102/psinfo", O_RDONLY) = 5
read(5, "\b02 @\b\0\0\002\0\0 Z >".., 416) = 416
close(5) = 0
S 0 23102 22954 2800 1784 0.1 truss truss ./pst3
write(1, " S 0 2 3 1 0".., 59) = 59
pread(3, "\0\003\001 -A8 p\0\003\0".., 2584, 0x3000253B450) = 2584
pread(3, "\0\0 YAA", 4, 0x30002D0DF6C) = 4
pread(3, "\0\0\019\0\0 YAA\0\0030E".., 32, 0x30002D0DF68) = 32
open("/proc/22954/as", O_RDONLY) = 5
pread(5, "FFBFFF p\0\0\0\0FFBFFF t".., 188, 0xFFBFFF44) = 188
close(5) = 0
brk(0x100106520) = 0
brk(0x10091E520) = 0
Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFFFF7F400648
siginfo: SIGSEGV SEGV_MAPERR addr=0x101131598
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0x101131598
----------------------------------------------------------------------
Comment By: maemigh (maemigh)
Date: 2008-06-19 14:31
Message:
Logged In: YES
user_id=1520524
Originator: YES
PS_COMMAND was set, but it was set to a location that didn't exist. I was
running the check_procs command without first doing a make install. I'm
thinking that the plugin should probably report that
/usr/local/nagios/libexec/pst3 doesn't exist rather than timing out after
10 seconds. I ran into another problem with pst3 in that it will not run
inside a Solaris zone (as /dev/kmem does not exist) -- are there plans to
make changes to allow for use in zones?
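
One way the plugin could report a missing pst3 up front is a pre-flight check before the command is spawned. The following is only a minimal sketch: ps_command_usable() is a hypothetical helper, not a function in nagios-plugins, and it assumes it is given the bare path to the binary rather than a full command line with arguments.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical pre-flight check (not part of nagios-plugins).
 * Returns 0 if `path` can be executed by the current user; otherwise
 * returns -1 and writes a human-readable reason into `msg`. */
int ps_command_usable(const char *path, char *msg, size_t msglen)
{
    if (access(path, X_OK) == 0)
        return 0;

    /* errno says why: ENOENT for a missing file, EACCES for bad
     * permissions, and so on. */
    snprintf(msg, msglen, "UNKNOWN - cannot execute %s: %s",
             path, strerror(errno));
    return -1;
}
```

A caller could print msg and exit with the UNKNOWN state instead of waiting for the 10 second timeout. Note that access() also succeeds on a searchable directory and the file can still vanish between the check and the exec, so this complements error handling around the exec itself rather than replacing it.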
----------------------------------------------------------------------
Comment By: Ton Voon (tonvoon)
Date: 2008-06-17 05:23
Message:
Logged In: YES
user_id=664364
Originator: NO
Sorry, misread your report. So are you saying that PS_COMMAND is not set?
----------------------------------------------------------------------
Comment By: Ton Voon (tonvoon)
Date: 2008-06-17 05:22
Message:
Logged In: YES
user_id=664364
Originator: NO
Hi maemigh,
The issue is that pst3 times out as it is taking too long to query. We've
found this on a master host with multiple zones. Please try the snapshot at
http://nagiosplug.sf.net/snapshot as pst3 has been optimised to make fewer
kvm calls.
It would be useful if you could give us timings before and after the
snapshot.
Beware: we've recently found an issue where pst3 can dump core if a process
disappears while pst3 is trying to access it. A fix is due soon.
Ton
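
For illustration, a process vanishing mid-scan can be tolerated by treating "no such file" as "skip this process" rather than as a fatal error. This is a general sketch only: it does not reproduce the actual pst3 code or the pending fix, and read_proc_file() and its parameters are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not the pst3 implementation).
 * Reads exactly `len` bytes of <procdir>/<pid>/<name> into `buf`.
 * Returns 1 on success, 0 if the process has gone away, -1 on any
 * other error (including a short read). */
int read_proc_file(const char *procdir, long pid, const char *name,
                   void *buf, size_t len)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%ld/%s", procdir, pid, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return (errno == ENOENT) ? 0 : -1;  /* exited before we opened it */

    ssize_t got = read(fd, buf, len);
    int saved = errno;                      /* close() may clobber errno */
    close(fd);

    if (got == (ssize_t)len)
        return 1;
    if (got < 0 && (saved == ENOENT || saved == ESRCH))
        return 0;                           /* exited between open and read */
    return -1;
}
```

A scanning loop would call this once per process, continue to the next entry on a 0 return, and only use the buffer when the return value is 1.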
----------------------------------------------------------------------
Comment By: maemigh (maemigh)
Date: 2008-06-16 16:32
Message:
Logged In: YES
user_id=1520524
Originator: YES
Had time to do a little more digging. This happens if the file pointed to
by PS_COMMAND does not exist. There do not appear to be any checks within
spopen to handle a return code from execve in the event of an error.
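
To illustrate the missing check: execve only returns when it has failed, so the child can report the error and exit with a distinctive status that the parent then inspects. This is a minimal sketch, not the plugins' spopen(); run_checked() is a hypothetical name, and the use of 127 for "exec failed" follows the usual shell convention.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not the plugins' spopen()).
 * Runs `path` with `argv` and returns its exit status, or -1 if the
 * fork failed or the child was killed by a signal. A return value of
 * 127 means the exec itself failed (shell convention). */
int run_checked(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, environ);
        /* Only reached if execve failed, e.g. ENOENT for a missing file. */
        fprintf(stderr, "cannot execute %s: %s\n", path, strerror(errno));
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a check like this the parent sees the failure as soon as the child exits, instead of waiting on a pipe until the plugin timeout fires.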
----------------------------------------------------------------------