
Re: Netmon Timeout and Retry intervals

To: nv-l@lists.tivoli.com
Subject: Re: Netmon Timeout and Retry intervals
From: Gareth_Holl@tivoli.com
Date: Sun, 3 Sep 2000 02:08:27 -0400

I think the easiest way to figure out what netmon is doing is to follow a netmon
trace and pick a few examples. Choose a node which is actually down and see how
netmon treats it. Also choose one which is up and follow it through the
sequence. A "netmon -M 3" trace should be sufficient. At the end of each
"sending ping ..." entry in the trace file you will see "timeout = x", where x
is the timeout in seconds: how long netmon will wait before logging an "expired
ping" message if no ping response comes in first.
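If you want to check this mechanically rather than by eye, here is a rough sketch that pulls the timeout off one "sending ping" entry and works out when the matching "expired ping" entry should appear if no reply arrives. It assumes the entry layout seen in the trace excerpt further down in this post (HH:MM:SS timestamp, "timeout = x" at the end, wrapped lines joined back onto one line); other trace levels or NetView versions may log differently, and the function name is just for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed layout of a joined "sending ping" trace entry, taken from the
# excerpt in this post; not a documented netmon format.
SEND_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) : \S+ : sending ping to (?P<addr>\S+)"
    r".*timeout = (?P<timeout>\d+)\s*$"
)

def expected_expiry(entry: str) -> tuple[str, str]:
    """Return (address, HH:MM:SS) at which an 'expired ping' entry should be
    logged for this send if no response arrives within the timeout."""
    m = SEND_RE.match(entry)
    if m is None:
        raise ValueError("not a 'sending ping' entry: " + entry)
    sent = datetime.strptime(m.group("time"), "%H:%M:%S")
    expiry = sent + timedelta(seconds=int(m.group("timeout")))
    return m.group("addr"), expiry.strftime("%H:%M:%S")

entry = ("13:01:46 : nl_pinger.c[234] : sending ping to 10.178.33.59 "
         "seqnum = 25098 ident = 24002 timeout = 4")
print(expected_expiry(entry))  # ('10.178.33.59', '13:01:50')
```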

For NetView 5.1 through 5.1.3 (I am not sure whether 5.0 is the same), netmon
reduces the timeout for a node by 1 second each time the node is successfully
pinged, until a timeout of 1 second is reached. In your example (timeout = 2,
retries = 3), if a node which has been up for some time suddenly goes down, it
will be pinged as follows:
ping#1, wait 1 second (timeout reduced from 2 seconds to 1 second because the
node has been pinged successfully up until now)
ping#2, wait 3 seconds (configured timeout of 2 seconds plus 1 second)
ping#3, wait 4 seconds (another second is added, I think)
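The sequence above can be written as a small function. This is only a model of my reading of the trace (first ping uses the reduced timeout of a node that was up; ping #k after that uses the configured timeout plus k - 1 seconds), not netmon's actual code, and the function and parameter names are hypothetical.

```python
def try_timeouts(reduced: int, configured: int, tries: int) -> list[int]:
    """Timeout (seconds) of each ping in one poll cycle for a node that was
    up but has now stopped answering, under the model described here."""
    # Ping #1 uses the timeout netmon has already whittled down for this node.
    timeouts = [reduced]
    # Ping #k (k >= 2) uses the configured timeout plus (k - 1) seconds.
    for k in range(2, tries + 1):
        timeouts.append(configured + (k - 1))
    return timeouts

seq = try_timeouts(reduced=1, configured=2, tries=3)
print(seq, sum(seq))  # [1, 3, 4] 8
```

Summing the list gives roughly how long after the first ping the "expired ping" message should show up (about 8 seconds here, ignoring the small gaps between sends). With the configuration in the trace example below (timeout 3, 2 tries) the same function gives [1, 4].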

You are right when you say that retries really means total tries. You should be
able to verify the above sequence by following the time stamps in the
netmon.trace file. On the next status polling cycle, you will see that this node
is pinged only once (netmon does not use retries on nodes that have previously
been marked down) and that the last timeout value is used: in this case netmon
waits 4 seconds before another expired message appears if no response comes back.

Below is an example taken from an actual netmon trace file. The snmp config info
is as follows:
Status polling = 60 seconds
Retries = 2 (which means total ping attempts will be 2)
Timeout = 3 seconds
(Note: There are entries in the trace file between the lines shown below)
13:01:45 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 1
13:01:46 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 4
13:01:50 : nl_pinger.c[413] :  expired ping to 10.178.33.59 (UDCP213B) seqnum = 25098 ident = 24002
13:02:50 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 26552 ident = 24002 timeout = 4
13:02:50 : nl_pinger.c[916] : -> received ping from 10.178.33.59 (UDCP213B)
Explanation:
Node 10.178.33.59 had been successfully pinged prior to 13:01:45, with the last
successful ping probably at 13:00:45. This has led to an adjusted timeout value
of 1 second, as shown by ".....timeout = 1" at the end of the trace entry. So
netmon is only going to wait 1 second for a ping response before sending another
ping. Thus at 13:01:46, netmon sends its second and last ping attempt. You will
notice that the timeout value is now 4. This was obtained by taking the
configured timeout value of 3 seconds and adding 1 second. So the second ping
will wait 4 seconds for a response before timing out. In this case the response
never came and because it was the final attempt, you will see the expired
message at 13:01:50 which is 4 seconds after the ping was sent. Notice that the
two pings and the expired ping message all have the same ping sequence number
(seqnum = 25098) as they all belong to the same poll cycle for the node
10.178.33.59.
This node's status is checked again 60 seconds later at 13:02:50 as configured
(notice the seqnum is different as this is a new poll cycle). The last timeout
value is used. In this case, the node is back up (the ping response is
received). I would expect the next check at 13:03:50 to use timeout = 3, and so
on.
Remember that if node 10.178.33.59 had not responded to the ping at 13:02:50
within 4 seconds, you would have seen an expired message (with no retries, since
the node had already been marked down in a previous poll cycle).
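To follow the seqnum grouping described above, the sketch below rebuilds each poll cycle from the five trace entries quoted in this post and checks that every "expired ping" lands exactly one timeout after the last send. The entry layout is inferred from this one excerpt only, and attaching a "received ping" entry (which carries no seqnum) to the most recent seqnum for that address is my assumption.

```python
import re
from collections import defaultdict
from datetime import datetime

# The trace entries quoted above, each joined back onto one line.
TRACE = """\
13:01:45 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 1
13:01:46 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 4
13:01:50 : nl_pinger.c[413] :  expired ping to 10.178.33.59 (UDCP213B) seqnum = 25098 ident = 24002
13:02:50 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 26552 ident = 24002 timeout = 4
13:02:50 : nl_pinger.c[916] : -> received ping from 10.178.33.59 (UDCP213B)
"""

# Assumed entry layout; seqnum and timeout are optional.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) : \S+ :\s+(?:-> )?"
    r"(?P<event>sending|expired|received) ping (?:to|from) (?P<addr>[\d.]+)"
    r"(?:.*?seqnum = (?P<seq>\d+))?(?:.*?timeout = (?P<timeout>\d+))?"
)

def group_cycles(trace: str) -> dict:
    """Map seqnum -> list of (time, event, timeout) entries for that cycle."""
    cycles = defaultdict(list)
    last_seq = {}  # addr -> most recent seqnum seen for that address
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        addr, seq = m.group("addr"), m.group("seq")
        if seq is None:  # "received" entries carry no seqnum; assume they
            seq = last_seq.get(addr)  # answer the latest poll of that address
        if seq is None:
            continue
        last_seq[addr] = seq
        t = datetime.strptime(m.group("time"), "%H:%M:%S")
        timeout = int(m.group("timeout")) if m.group("timeout") else None
        cycles[seq].append((t, m.group("event"), timeout))
    return dict(cycles)

def check_expiry(events) -> bool:
    """True if every 'expired' entry is exactly last send time + timeout."""
    last_send = None
    for t, event, timeout in events:
        if event == "sending":
            last_send = (t, timeout)
        elif event == "expired" and last_send is not None:
            sent, to = last_send
            if (t - sent).total_seconds() != to:
                return False
    return True

for seq, events in group_cycles(TRACE).items():
    pings = [to for _, ev, to in events if ev == "sending"]
    print(seq, "timeouts:", pings, "outcome:", events[-1][1],
          "expiry ok:", check_expiry(events))
# 25098 timeouts: [1, 4] outcome: expired expiry ok: True
# 26552 timeouts: [4] outcome: received expiry ok: True
```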

Please note that I am not from netmon support and this is just my understanding
based on what I have experienced and been told.

Regards,
Gareth



Gareth Holl
Software Engineer
Olympics Advocate
gholl@tivoli.com

Tivoli Systems / IBM Corporation
Research Triangle Park,  NC  27703.    1-800-TIVOLI-8


don-n-darla@att.net on 09/02/2000 01:45:05 AM

Please respond to IBM NetView Discussion <nv-l@tkg.com>

To:   nv-l@tkg.com
cc:    (bcc: Gareth Holl/Tivoli Systems)
Subject:  [NV-L] Netmon Timeout and Retry intervals




Whoa..... hold on there folks.
According to the last information that I received from
support, NetView 5.1 (and later) for Unix was changed;
it does NOT work as you think.

My information goes like this:
1. Ping the interface
2. Wait the configured wait interval
3. Ping the interface again
4. Add 1 second to the previous wait interval
5. Repeat steps 3-4 until last retry
6. Drop back to the configured wait interval
7. Mark interface Down

Example:
Timeout=2
Retry=3
Ping - Wait 2 - Ping - Wait 3 - Ping - Wait 2 - DOWN
This is the default setting, and it adds up to 7 seconds (2 + 3 + 2).


Additionally: A retry of zero or one, pings once.
The retry count is actually the "try count".
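The sequence from support can be sketched the same way. This is my interpretation of the steps above, checked only against the single example given (Timeout=2, Retry=3); in particular, treating the wait after the last ping as the configured value for every retry count, and the function name, are assumptions.

```python
def support_waits(timeout: int, retry: int) -> list[int]:
    """Wait (seconds) after each ping under the sequence described by
    support: configured wait, +1 s for each further ping, and the wait after
    the last ping drops back to the configured value."""
    tries = max(retry, 1)  # a retry of 0 or 1 still pings once ("try count")
    if tries == 1:
        return [timeout]
    # Waits after every ping except the last: timeout, timeout + 1, ...
    waits = [timeout + k for k in range(tries - 1)]
    # After the last ping: back to the configured wait, then mark DOWN.
    waits.append(timeout)
    return waits

print(support_waits(2, 3), sum(support_waits(2, 3)))  # [2, 3, 2] 7
```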

I hope someone from netmon support can comment on this.

Don Davis
Alliance of Professionals and Consultants, Inc.
Tivoli Certified Instructor / Consultant
_________________________________________________________________________
NV-L List information and Archives: http://www.tkg.com/nv-l



Archive operated by Skills 1st Ltd
