[Nagiosplug-devel] check_ntp (Was: Flight 1.4.8, ready for boarding)
sean finney
seanius at seanius.net
Thu Apr 5 09:08:28 CEST 2007
heya,
On Wed, 2007-04-04 at 18:05 -0400, Thomas Guyot-Sionnest wrote:
> > okay. i *think* that's the same thing with just a different name,
> > right?
>
> Not really. I don't know what's taken into account when calculating the
> jitter but the dispersion is definitely higher in general (or always?)
okay, here are some definitions i got from an ntp powerpoint
presentation made by the author:
Jitter: exponential average of first-order time differences
Dispersion: maximum error due to oscillator frequency tolerance.
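To make the difference concrete, here is a rough C sketch of the two quantities as defined above. The averaging weight (1/4) and the 15 PPM frequency tolerance (the PHI constant from RFC 5905) are assumptions for illustration, not necessarily what any given ntpd computes, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed averaging weight; real ntpd constants may differ. */
#define JITTER_WEIGHT 0.25
/* Assumed frequency tolerance: 15 PPM, the PHI value from RFC 5905. */
#define PHI 15e-6

/* Jitter: exponential average of the first-order differences between
 * successive offset samples (seconds). */
double jitter_from_offsets(const double *offset, size_t n)
{
    double jitter = 0.0;

    for (size_t i = 1; i < n; i++) {
        double d = offset[i] - offset[i - 1];
        if (d < 0)
            d = -d;
        jitter += (d - jitter) * JITTER_WEIGHT;
    }
    return jitter;
}

/* Dispersion: an error bound that grows linearly with the time since
 * the last update, at the oscillator's worst-case frequency error. */
double dispersion_after(double disp0, double elapsed_s)
{
    return disp0 + PHI * elapsed_s;
}
```

A perfectly steady series of offsets drives jitter toward zero, while dispersion keeps growing with elapsed time until the next update resets it, which fits with dispersion usually reading higher.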
so yeah, not quite the same thing. and as you pointed out the
differences in value are quite big. should we really be checking it
with jitter then? maybe instead we could extend the plugin to check
various variables/thresholds, and have it return some failure status
when a non-existent variable is requested?
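A minimal sketch of that idea, assuming the server's read-variables reply has already been decoded into a plain "name=value, name=value" string (quoted values containing commas are not handled). lookup_var() and check_var() are hypothetical helpers; the 0-3 return values are the standard Nagios plugin states.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Standard Nagios plugin exit states. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2, STATE_UNKNOWN = 3 };

/* Find an exact "name=value" pair in a comma-separated list and parse the
 * value. Returns 0 on success, -1 if absent or not numeric. */
int lookup_var(const char *vars, const char *name, double *out)
{
    size_t nlen = strlen(name);
    const char *p = vars;

    while (p && *p) {
        while (*p == ' ' || *p == ',' || *p == '\r' || *p == '\n')
            p++;
        if (nlen > 0 && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            const char *val = p + nlen + 1;
            char *end;
            double v = strtod(val, &end);
            if (end == val)
                return -1;          /* present but not numeric */
            *out = v;
            return 0;
        }
        p = strchr(p, ',');         /* skip to the next pair */
    }
    return -1;
}

/* Compare |value| against thresholds; a missing variable is UNKNOWN. */
int check_var(const char *vars, const char *name, double warn, double crit)
{
    double v;

    if (lookup_var(vars, name, &v) != 0) {
        printf("NTP UNKNOWN: server did not return variable '%s'\n", name);
        return STATE_UNKNOWN;
    }
    if (v < 0)
        v = -v;                     /* thresholds apply to the magnitude */
    if (v > crit)
        return STATE_CRITICAL;
    if (v > warn)
        return STATE_WARNING;
    return STATE_OK;
}
```

For example, with the made-up reply "stratum=2, offset=-12.5, jitter=0.512", asking for "dispersion" yields STATE_UNKNOWN instead of silently passing.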
> > but anyway, as far as sampling/averaging goes, the offset/delay can vary a
> > bit more if the network is less than reliable iirc, hence the multiple
> > requests. this is what the ntp cmdline client does as well.
>
> What you're polling in the jitter section is local variables on the
> remote server. That server will update them as time goes, but they're
> not affected by network conditions.
ah, i see. so the control packet data probably won't change in the
interval that it's being checked. in that case it doesn't make any
sense to average it, i agree. however, since this is udp we're talking
about, maybe it's still worthwhile to throw a couple extra packets on
the wire to make sure one of them is received? or perhaps we could
default to a single packet, but provide a configurable retry parameter
or something.
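One way the configurable retry could look, as a sketch only: udp_request() and its tries/timeout_ms parameters are hypothetical, and the socket is assumed to be a UDP socket already connect()ed to the server.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send req on a connected datagram socket and wait up to timeout_ms for
 * a reply, sending again up to `tries` times in total.
 * Returns the number of bytes received, or -1 if every attempt failed. */
ssize_t udp_request(int fd, const void *req, size_t reqlen,
                    void *resp, size_t resplen, int tries, int timeout_ms)
{
    for (int i = 0; i < tries; i++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r;

        if (send(fd, req, reqlen, 0) < 0)
            return -1;
        r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno != EINTR)
            return -1;
        if (r > 0 && (pfd.revents & POLLIN)) {
            ssize_t n = recv(fd, resp, resplen, 0);
            if (n >= 0)
                return n;
        }
        /* timed out, interrupted or read error: send again */
    }
    return -1;
}
```

Without a way to match replies to requests, a late answer to the first try can be mistaken for the answer to the second; that ties in with the header-matching idea in item 5.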
> The hostname is resolved again in the jitter section. Anyway that's not
> a big deal, would only be useful for implementing #4.
oh, i was looking in the offset_request function there, sorry.
> >> 4. When multiple servers are specified (either multiple IPs per hostname
> >> or multiple -H arguments), check the jitter for all servers.
> >
> > i think this falls back into the mimicking-behaviour design again.
> > previously i believe we only checked the jitter on the remote clock
> > declared as the sync source, but i could be wrong. i don't really think
> > this is the *right* behaviour, but before i went fixing it the idea was
> > to get something that was compatible with the current version.
>
> Right now it gets the first server listed in dns (while the offset
> function gets them all) and find the synchronization source. It then
> check the sync source; if there is none it will check all candidates.
right. again to be filed under "behaves as before" wrt check_ntp and
ntpq/ntpdate.
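For reference, the selection described above boils down to something like this sketch. It assumes the peer status words from the control-mode read-status reply have already been converted to host byte order (ntohs), and that the 3-bit select field in bits 8-10 uses 6 for the sync source and 4 or higher for candidates; those codes should be double-checked against ntpd's ntp_control.h. peers_to_check() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: select code in bits 8-10 of the peer status word. */
#define PEER_SEL(status)  (((status) >> 8) & 0x07)
#define SEL_CANDIDATE     4   /* assumed: codes >= 4 survived selection */
#define SEL_SYNCSOURCE    6   /* assumed: current system peer */

/* Write into `out` the indexes of the peers whose jitter should be
 * checked: the sync source alone if there is one, otherwise every
 * candidate. Returns how many indexes were written. */
size_t peers_to_check(const uint16_t *status, size_t npeers, size_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < npeers; i++) {
        if (PEER_SEL(status[i]) == SEL_SYNCSOURCE) {
            out[0] = i;
            return 1;
        }
    }
    for (size_t i = 0; i < npeers; i++)
        if (PEER_SEL(status[i]) >= SEL_CANDIDATE)
            out[n++] = i;
    return n;
}
```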
> > actually, istr someone pointing out several months ago that we were
> > really doing the wrong thing to begin with wrt jitter checking, and that
> > we ought to really be checking the local jitter and not the jitter of
> > remote systems to begin with, or something like that. i'm going from
> > some hazy memory here, but i think ultimately the problem is that there
> > are two use cases for check_ntp, but the code has in the past and still
> > currently not differentiated between the two cases.
>
> The server jitter is somewhat related to its peers' jitter, so that's not a
> big deal. Moreover, older ntp servers do not have dispersion for the
> server itself, so that makes supporting them even worse.
out of curiosity, is there any difference on the packet level that could
let us know the version/vendor of the ntp server?
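For what it's worth, the first byte of every NTP packet carries a 3-bit version number (VN) next to the leap indicator and mode, per RFC 5905. A sketch of pulling those fields out follows; the helper names are hypothetical. Servers may simply echo the version number the client sent, so VN could say little about the server, and nothing in the header identifies the vendor.

```c
#include <stdint.h>

/* NTP header byte 0: LI (2 bits) | VN (3 bits) | Mode (3 bits). */
unsigned ntp_leap(const uint8_t *pkt)    { return (pkt[0] >> 6) & 0x03; }
unsigned ntp_version(const uint8_t *pkt) { return (pkt[0] >> 3) & 0x07; }
unsigned ntp_mode(const uint8_t *pkt)    { return pkt[0] & 0x07; }
```

For example, a first byte of 0x1c decodes to LI 0, version 3, mode 4 (server).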
> > first you have the case of checking the status of the local system, by
> > connecting to peers specified on the cmdline and verifying the offset.
> > in such cases we really want to see the local jitter and not the remote
> > jitter.
>
> Can you explain? I remember the old perl script used to show a 0/almost
> 0 jitter on localhost, but that's definitely not what we get when
> getting the server jitter. It looks more like it was getting the time
> and then showing the jitter in that operation.
my memory is hazy, but i'll go digging through the list archives and see
if i can find the message i'm thinking of.
> > the second case is when you're actually interested in the status of the
> > remote system, and in this case you're comparing the state of its clock
> > with that of yours (or others), and in which case you're interested in
> > the jitter on the remote system.
>
> This is what check_ntp currently does and should be the default behavior IMHO.
so for clarity: with check_ntp -H host, should the jitter on the host be
calculated, or the jitter of its sync source / candidate sync sources be
checked?
> > if i'm remembering all of this correctly, i think it would be best to
> > provide a flag for which form of check we're doing and then have the
> > plugin behave appropriately based on that.
>
> Agreed. But for testing the first case we should only accept to run it
> locally (ex. through NRPE on a remote time server) otherwise it doesn't
> make much sense.
agreed on both these.
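A possible shape for that flag, purely as a sketch: the mode names, the parse_mode() and mode_allowed() helpers and the "remote"/"local" argument values are all hypothetical, not existing check_ntp options. The local-state mode is refused unless the target is the loopback host, matching the point that it only makes sense when run on the time server itself (e.g. via NRPE).

```c
#include <string.h>

/* Hypothetical check modes. */
enum check_mode {
    MODE_REMOTE, /* default: judge the remote server's state */
    MODE_LOCAL   /* judge the local clock; run on the box itself */
};

/* Parse a hypothetical mode argument; returns 0 on success, -1 if unknown. */
int parse_mode(const char *arg, enum check_mode *out)
{
    if (strcmp(arg, "remote") == 0) { *out = MODE_REMOTE; return 0; }
    if (strcmp(arg, "local") == 0)  { *out = MODE_LOCAL;  return 0; }
    return -1;
}

/* The local check is only meaningful against this machine. */
int mode_allowed(enum check_mode mode, const char *host)
{
    if (mode == MODE_REMOTE)
        return 1;
    return strcmp(host, "localhost") == 0 ||
           strcmp(host, "127.0.0.1") == 0 ||
           strcmp(host, "::1") == 0;
}
```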
> >> 5. Look into the possibility of storing some of the sent header in a
> >> linked list on write and then match them on reads. That will allow us to
> >> send all packets as fast as possible (ex. when checking the jitter of
> >> all sync candidates) and also to easily drop odd packets. If put in a
> >> separate routine, that would also allow us to easily loop for additional
> >> packets and append the data. (Any other suggestion?)
> >
> > i'm not quite sure i follow here. how is this different from poll on an
> > array of sockets...? currently afaik the data *is* sent as fast as
> > possible, and we read the data as fast as it comes in. if we need more
> > per-host information, we know ahead of time how many hosts/sockets/etc
> > are needed, so i don't think there's any need for a linked list
> > instead of a pre-allocated array for whatever extra data we need to
> > track.
>
> This has nothing to do with the array of sockets, but rather making sure
> what we get back is what we expect. This is of the lowest priority but
> should speed up some sequential operations a bit when there is latency
> (like checking jitter on all candidates).
i think i was looking at the offset_request function again, oops.
anyway, probably the same method could still be used, though the setup
might be a little more complicated if the total size results from a few
different getaddrinfo calls. is that what you were thinking of using
the linked list for? i.e. the setup and not the actual i/o?
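Combining both suggestions, here is a sketch of matching replies against a pre-allocated array rather than a linked list. It assumes one connected UDP socket per server and the 12-byte NTP control header layout, where bytes 2-3 hold the sequence number and the top bit of byte 1 marks a response. struct pending, send_all(), collect() and MAX_PEERS are hypothetical names.

```c
#include <arpa/inet.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_PEERS    16    /* hypothetical upper bound */
#define CTL_HDR_LEN  12    /* NTP control header size */
#define CTL_RESPONSE 0x80  /* R bit in byte 1 */

/* One outstanding request per server, kept in a pre-allocated array. */
struct pending {
    int fd;        /* connected UDP socket for this server */
    uint16_t seq;  /* sequence number we sent (host order) */
    int answered;  /* set once a matching reply arrives */
};

/* Send the header template on every socket, each stamped with its own
 * sequence number in bytes 2-3. Returns 0, or -1 on a send error. */
int send_all(struct pending *p, size_t n, const uint8_t tmpl[CTL_HDR_LEN],
             uint16_t first_seq)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t pkt[CTL_HDR_LEN];
        uint16_t seq_be;

        memcpy(pkt, tmpl, CTL_HDR_LEN);
        p[i].seq = (uint16_t)(first_seq + i);
        p[i].answered = 0;
        seq_be = htons(p[i].seq);
        memcpy(pkt + 2, &seq_be, sizeof seq_be);
        if (send(p[i].fd, pkt, sizeof pkt, 0) < 0)
            return -1;
    }
    return 0;
}

/* Read replies as they arrive; keep only responses whose sequence number
 * matches what was sent on that socket and drop everything else.
 * Returns the number of servers that answered. */
size_t collect(struct pending *p, size_t n, int timeout_ms)
{
    struct pollfd pfd[MAX_PEERS];
    size_t done = 0;

    if (n > MAX_PEERS)
        return 0;
    for (size_t i = 0; i < n; i++) {
        pfd[i].fd = p[i].fd;
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    while (done < n) {
        if (poll(pfd, (nfds_t)n, timeout_ms) <= 0)
            break;                        /* timeout or error */
        for (size_t i = 0; i < n; i++) {
            uint8_t buf[512];
            uint16_t seq_be;
            ssize_t len;

            if (!(pfd[i].revents & POLLIN)) {
                if (pfd[i].revents)       /* error/hangup: stop watching */
                    pfd[i].fd = -1;
                continue;
            }
            len = recv(pfd[i].fd, buf, sizeof buf, 0);
            if (len < CTL_HDR_LEN || !(buf[1] & CTL_RESPONSE))
                continue;                 /* short or not a response */
            memcpy(&seq_be, buf + 2, sizeof seq_be);
            if (ntohs(seq_be) != p[i].seq)
                continue;                 /* stale or unexpected reply */
            p[i].answered = 1;
            done++;
            pfd[i].fd = -1;               /* poll() ignores negative fds */
        }
    }
    return done;
}
```

The poll() timeout restarts after every batch of replies, so the worst-case wait can exceed timeout_ms; a fuller version would track an absolute deadline.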
sean