
RE: [nv-l] Status Polling myth

To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] Status Polling myth
From: Oliver Bruchhaeuser <oliver.bruchhaeuser@de.ibm.com>
Date: Tue, 28 Jun 2005 08:06:25 +0200
Delivery-date: Tue, 28 Jun 2005 07:06:54 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
In-reply-to: <1D99739B79BF7744BF8927B8F2274CA2063EB4@HQGTNEX5.doe.local>
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com

Demystification #2:

The "Retry Count" value in the SNMP Configuration dialog actually means the total number of pings or SNMP requests, including the initial one.
There are plans to change the confusing wording in this dialog in the future.
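
To make the difference between the two readings of "Retry Count" concrete, here is a small sketch of the worst-case wait before a poll is given up. The function name and parameters are hypothetical, and it assumes the same timeout is used on every attempt (as the 6.0.1 release notes quoted further down describe) and that the timeout is expressed in tenths of a second, as it is stored in ovsnmp.conf.

```python
def worst_case_wait_seconds(timeout_tenths, retry_count, count_includes_first=True):
    """Longest time spent waiting before a poll is declared failed.

    timeout_tenths: per-attempt timeout in tenths of a second.
    retry_count: the value entered as "Retry Count".
    count_includes_first: True if the count is the total number of attempts,
        False if it only counts retries after the initial attempt.
    Assumption: the same timeout applies to every attempt.
    """
    attempts = retry_count if count_includes_first else retry_count + 1
    return attempts * timeout_tenths / 10


# Retry Count = 3, timeout = 2.0 seconds (20 tenths of a second)
print(worst_case_wait_seconds(20, 3))         # 6.0 (count is total attempts)
print(worst_case_wait_seconds(20, 3, False))  # 8.0 (count is retries only)
```

With a Retry Count of 3 and a 2 second timeout, the two readings differ by one full timeout interval.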


"Evans, Bill" <Bill.Evans@hq.doe.gov>
Sent by: owner-nv-l@lists.us.ibm.com

27.06.2005 19:40
Please respond to

"'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
RE: [nv-l] Status Polling myth

Thank you for setting me straight. I never could find an official explanation for the older behavior, and that increment/decrement comment didn't make sense. Since there never was an easy means of directly observing that behavior, I've been working with an outdated presentation based on poorly documented initial information from WAY back in NetView Version 4. At least I wasn't alone in my error of understanding.

I hope someday they'll get the Administrator's Reference and the Man Pages updated to how it behaves today.  The note that the value will "remain as configured" isn't very informative and I'm afraid I misread it as "working as it previously did".  It would help if they simply stated (and I hope this is correct):

    "NetView will wait <timeoutvalue> tenths of seconds for an SNMP or ICMP
     reply and if one does not arrive it will retry the query up to <retryvalue>
     additional times. An error is returned if a response is not received on any of
     the attempts."  

I'm never sure with these configurations whether the value configured is the total number of attempts or the number of retries. At least one of the IBM Tivoli products is confusing in that regard. The XNMSNMPCONF menu is also confusing, since it displays the data and has you enter the timeout value in seconds. It then records the value in OVSNMP.CONF (and the database) in tenths of seconds, and XNMSNMPCONF -RES also displays it in tenths of a second.

My apologies to those I misled (including me).  I will now feel safe in my own system in setting longer timeout values than I have been using and it may fix some problems we occasionally have with nodes which are two WAN hops away from home.  


-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Leslie Clark
Sent: Monday, June 27, 2005 12:17 PM
Subject: RE: [nv-l] Status Polling myth


The behavior you are referring to was corrected some time ago. There have been two changes. In '97, the man page for ovsnmp.conf was updated to match the actual behavior, which was NOT exponential increments, but something more arcane, as described in the APAR below. That behavior (decrementing!) I remember as resulting in a lot of false alarms that appeared to make no sense. That behavior was modified in V6.0.1 to just do what you told it to do. However, the man page for ovsnmp.conf still talks about increments and decrements. Time for another update...

(What increments exponentially, as I recall, is name resolution timeouts with retries.)

Here's the current behavior, from the release notes for 6.0.1, in the new features section:

Ping and SNMP Timeout Values

The netmon daemon no longer dynamically adjusts the ping and SNMP timeout values. These values remain as configured in the SNMP Options dialog.

And here's the description of the APAR that tells how it was behaving before that:

APAR - IX68241


The man page for ovsnmp.conf describes the timeoutInterval for  
a device.  The timeoutInterval value is described to  
begin at x and is doubled until the specified timeoutInterval  
value is reached on the last retry.  This is not how the  
value actually works.  The first status poll  
to a device uses the timeoutInterval value specified.  If the  
device replies within this time, then the value is decremented  
by 1 second.  The value will continue to be decremented until  
it reaches either 1 second or the device no longer responds to  
the poll within the time.  When the device no longer responds  
within the time, then the timeoutInterval value is increased  
by 1 second, except on the last retry the timeoutInterval value  
is increased to the full value specified plus 1 second.

The ovsnmp.conf man page has been updated to correct the description  
of the timeoutInterval value.      
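
One way to read the APAR text as a step rule is sketched below. This is an illustration only, not netmon's code: the function names, the whole-second values, the fixed device response time, and the treatment of every attempt as a separate step are all assumptions made for the example. In particular, the "last retry" clause is read here as "the attempt that is about to be made is the final retry".

```python
def next_timeout(current, configured, replied, next_is_last_retry=False):
    """Timeout (seconds) for the next attempt, under one reading of APAR IX68241."""
    if replied:
        # A reply within the timeout shrinks the value by 1 second, floor of 1.
        return max(1, current - 1)
    if next_is_last_retry:
        # The final retry gets the full configured value plus 1 second.
        return configured + 1
    # Any other missed reply grows the value by 1 second.
    return current + 1


def trace_attempts(configured, response_time, attempts):
    """Timeouts used on successive attempts against a device that always
    answers after response_time seconds (a simplifying assumption)."""
    used = []
    current = configured
    for _ in range(attempts):
        used.append(current)
        replied = response_time <= current
        current = next_timeout(current, configured, replied)
    return used


# Configured timeout 5 s, device that answers in 2.5 s
print(trace_attempts(configured=5, response_time=2.5, attempts=8))
# [5, 4, 3, 2, 3, 2, 3, 2]
```

Under these assumptions the effective timeout settles just around the device's response time, so every other attempt times out even though the configured value is comfortably larger.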


Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
(248) 552-4968 Voicemail, Fax, Pager

"Evans, Bill" <Bill.Evans@hq.doe.gov>
Sent by: owner-nv-l@lists.us.ibm.com

06/24/2005 04:49 PM

Please respond to

"'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
RE: [nv-l] Status Polling



Your description was much clearer than mine.  

        "NetView compensates by its geometrically increasing waits on retries"  

Translation: "Every retry doubles the timeout value" = "a geometric progression".  

The NetView for Administrators class also used to warn attendees not to set the retries and wait time so high that the total exceeded the polling cycle.  Your example of 7 retries with 1 second timeout would exceed a two minute polling cycle.   NetView Administration is not for the arithmetically challenged or those who don't appreciate the relationships among the tuning values.  
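
The arithmetic behind that warning can be checked with a few lines. The sketch assumes the doubling behavior discussed in this message (which the later replies in this thread say netmon does not actually follow) and a 120 second polling cycle; the function name is made up for the example.

```python
def doubling_timeouts(initial_seconds, attempts):
    """Per-attempt timeouts if each retry doubled the previous timeout."""
    return [initial_seconds * 2 ** i for i in range(attempts)]


timeouts = doubling_timeouts(1, 7)
total = sum(timeouts)
polling_cycle_seconds = 120  # assumed two minute polling cycle

print(timeouts)                       # [1, 2, 4, 8, 16, 32, 64]
print(total)                          # 127
print(total > polling_cycle_seconds)  # True
```

Seven attempts starting at 1 second add up to 127 seconds, which is 7 seconds longer than the two minute cycle.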

Thanks for clearing up my muddy wording.

Bill Evans


-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Barr, Scott
Sent: Friday, June 24, 2005 4:28 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] Status Polling

One warning about retries.

Each time you retry the SNMP or ping, netmon appears to double the
timeout value. So, if you set 7 retries with a 1 second timeout, you get
1, 2, 4, 8, 16, 32, 64 second timeout values.

If you have a lot of nodes this way, that can cause more issues than it

One caveat, I've been doing TEC/Framework/ITM for a while so the way
netmon behaves may have changed some time ago.

