Thank you for setting me straight. I never could find an official
explanation for the older behavior and that increment/decrement comment didn't
make sense. Since there never was an easy means of directly observing
what that behavior was I've been working with an outdated presentation based on
poorly documented initial information from WAY back on NetView Version 4. At
least I wasn't alone in my error of understanding.
I hope someday they'll get the Administrator's Reference and the Man Pages
updated to how it behaves today. The note that the value will
"remain as configured" isn't very informative and I'm afraid I
misread it as "working as it previously did". It would help if
they simply stated (and I hope this is correct):
"NetView will wait <timeoutvalue> tenths of seconds for an SNMP or ICMP
reply and if one does not arrive it will retry the query up to <retryvalue>
additional times. An error is returned if a response is not received on any
of the attempts."
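The arithmetic implied by that proposed wording can be sketched as follows. This is only an illustration: it assumes the timeout stays fixed on every attempt and that the retry value counts additional attempts after the first. The function and parameter names are hypothetical, not NetView identifiers.

```python
def worst_case_wait_seconds(timeout_tenths: int, retries: int) -> float:
    """Longest time spent waiting on a node that never answers.

    Assumptions (not confirmed by the NetView documentation):
    - the timeout is the same on every attempt;
    - `retries` counts additional attempts after the first one.
    """
    attempts = 1 + retries
    # The configured value is in tenths of a second; convert to seconds.
    return attempts * timeout_tenths / 10.0


# Example: a 2-second timeout (20 tenths) with 3 retries.
print(worst_case_wait_seconds(20, 3))  # 8.0
```

If the retry value were instead the total number of attempts, the same settings would give 6.0 seconds, which is why the distinction matters.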
I'm never sure on these configurations whether the value configured is the
total number of attempts or the number of retries. At least one of the
IBM Tivoli products is confusing in that regard. The XNMSNMPCONF menu is
also confusing since it displays the data and has you enter the timeout value
in seconds. It then records the value in OVSNMP.CONF (and the database)
in tenths of seconds and XNMSNMPCONF -RES also displays in tenths
of a second.
My apologies to those I misled (including me). I will now feel safe in my
own system in setting longer timeout values than I have been using and it may
fix some problems we occasionally have with nodes which are two WAN hops away
from home.
Bill
-----Original Message-----
From: owner-nv-l@lists.us.ibm.com
[mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Leslie Clark
Sent: Monday, June 27, 2005 12:17 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] Status Polling myth
The behavior you are referring to was corrected some time ago. There have been
two changes.
In '97, the man page for ovsnmp.conf was updated to match the actual behavior,
which was NOT exponential increments, but something more arcane as described in
the APAR below. That behavior (decrementing!) I remember as resulting in a lot
of false alarms that appeared to make no sense. That behavior was modified in
V6.0.1 to just do what you told it to do. However, the man page for ovsnmp.conf
still talks about increments and decrements. Time for another update...
(What does increment exponentially, as I recall, is name resolution timeouts
with retries.)
Here's the current behavior, from the release notes for 6.0.1, in the new
features section:
Ping and SNMP Timeout Values
The netmon daemon no longer dynamically adjusts the ping and SNMP timeout
values. These values remain as configured in the SNMP Options dialog.
And here's the description of the APAR that tells how it was behaving before
that:
APAR - IX68241
TIMEOUT VALUE FOR STATUS POLL IS INCREASED BY 1 SECOND, BUT MAN PAGE STATES AN
EXPONENTIAL ALGORITHM IS USED.
The man page for ovsnmp.conf describes the timeoutInterval for
polling a device. The timeoutInterval value is described to
begin at x and is doubled until the specified timeoutInterval
value is reached on the last retry. This is not how the
timeoutInterval value actually works. The first status poll
to a device uses the timeoutInterval value specified. If the
device replies within this time, then the value is decremented
by 1 second. The value will continue to be decremented until
it reaches either 1 second or the device no longer responds to
the poll within the time. When the device no longer responds
within the time, then the timeoutInterval value is increased
by 1 second, except on the last retry the timeoutInterval value
is increased to the full value specified plus 1 second.
The ovsnmp.conf man page has been updated to correct the description
of the timeoutInterval value.
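One reading of the pre-6.0.1 behavior described in the APAR text can be sketched as a small state function. This is an interpretation of the APAR wording only, not netmon's actual code; the function name, its parameters, and the whole-second granularity are assumptions made for illustration.

```python
def next_timeout(current: int, configured: int, replied: bool,
                 last_retry: bool = False) -> int:
    """Timeout (seconds) for the next poll, per one reading of IX68241.

    Hypothetical helper: names and structure are illustrative only.
    """
    if replied:
        # A timely reply shrinks the timeout by 1 second, floor of 1 second.
        return max(1, current - 1)
    if last_retry:
        # The last retry uses the full configured value plus 1 second.
        return configured + 1
    # Any other missed reply grows the timeout by 1 second.
    return current + 1


# Example: configured timeout of 5 seconds, three timely replies, then misses.
t = 5
for _ in range(3):
    t = next_timeout(t, 5, replied=True)
print(t)                                                    # 2
print(next_timeout(t, 5, replied=False))                    # 3
print(next_timeout(3, 5, replied=False, last_retry=True))   # 6
```

Under this reading, a node that answers reliably ends up being polled with a timeout that shrinks toward 1 second regardless of the configured value.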
Cordially,
Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
(248) 552-4968 Voicemail, Fax, Pager
"Evans, Bill" <Bill.Evans@hq.doe.gov>
Sent by: owner-nv-l@lists.us.ibm.com
06/24/2005 04:49 PM
To: "'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
Subject: RE: [nv-l] Status Polling
Your description was much clearer than mine.
"NetView compensates by its geometrically increasing waits on retries"
Translation: "Every retry doubles the timeout value" = "a geometric
progression".
The NetView for Administrators class also used to warn
attendees not to set the retries and wait time so high that the total exceeded
the polling cycle. Your example of 7 retries with 1 second timeout would
exceed a two minute polling cycle. NetView Administration is not for the
arithmetically challenged or those who don't appreciate the relationships among
the tuning values.
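The arithmetic behind that warning can be checked with a few lines. This sketch takes the doubling as described in this message (behavior that the later reply in the thread says netmon does not actually follow) and compares the total against a two-minute cycle; the function name is hypothetical.

```python
from typing import List


def doubling_timeouts(initial_seconds: int, attempts: int) -> List[int]:
    """Timeouts for each attempt if every retry doubled the previous one."""
    return [initial_seconds * 2 ** i for i in range(attempts)]


timeouts = doubling_timeouts(1, 7)
total = sum(timeouts)
polling_cycle_seconds = 120  # two-minute polling cycle

print(timeouts)                        # [1, 2, 4, 8, 16, 32, 64]
print(total)                           # 127
print(total > polling_cycle_seconds)   # True
```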
Thanks for clearing up my muddy wording.
Bill Evans
-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Barr, Scott
Sent: Friday, June 24, 2005 4:28 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] Status Polling
One warning about retries.
Each time you retry the SNMP or ping, netmon appears to double the timeout
value. So, if you set 7 retries with 1 second time out, you get
1, 2, 4, 8, 16, 32, 64 seconds timeout values.
If you have a lot of nodes this way, that can cause more issues than it solves.
One caveat, I've been doing TEC/Framework/ITM for a while so the way
netmon behaves may have changed some time ago.