This is excellent information, Gareth/Don, for understanding how netmon
status polls. Thanks very much!
Does either of you, or anyone else, know how NetView/MLM status polls? Is
there any kind of tracing capability that can be used with midmand to get
details similar to the netmon trace?
Joel Gerber - I/T Networking Professional
USAA Information Technology Co. - San Antonio, TX
(210)913-4231 * mailto:Joel.Gerber@USAA.com
http://www.usaa.com
-----Original Message-----
From: Gareth_Holl@tivoli.com [SMTP:Gareth_Holl@tivoli.com]
Sent: Sunday, September 03, 2000 01:08
To: IBM NetView Discussion
Subject: Re: [NV-L] Netmon Timeout and Retry intervals
I think the easiest way to figure out what netmon is doing is to follow a
netmon trace and pick a few examples. Choose a node which is actually down
and see how netmon treats it. Also choose one which is up and follow it
through the sequence. A "netmon -M 3" trace should be sufficient. You will
see "timeout = x" at the end of a "Sending ping...." entry in the trace
file, which should indicate how long it will be before you see the "expired
ping" message if no ping response comes in first. The x above represents
the timeout in seconds.
For NetView 5.1 to 5.1.3 (not sure if 5.0 is the same), netmon reduces the
timeout value for nodes which have been successfully pinged by 1 second each
time until a timeout of 1 second is reached. In your example, if a node
which has been up for some time suddenly goes down, it will be pinged as
follows:
ping#1, wait 1 second (timeout reduced from 2 seconds to 1 second because it
has been successfully pinged up until now)
ping#2, wait 3 seconds (take configured timeout value of 2 seconds and add a
second = 3 seconds)
ping#3, wait 4 seconds (another second is added, I think)
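The sequence above can be sketched in a few lines of Python. This is only a
model of the behaviour as described here, not netmon code: the function name
and the adjusted_first_timeout parameter are made up for illustration, and
the "add one second per further attempt" rule is the poster's own
understanding rather than something confirmed by netmon support.

```python
def gareth_wait_sequence(configured_timeout, total_tries, adjusted_first_timeout=1):
    """Return the seconds waited after each ping for a node that was up.

    Assumptions taken from the description above (hypothetical model):
    - the first ping uses the adjusted (reduced) timeout, 1 second by default
    - attempt N (N >= 2) waits configured_timeout + (N - 1) seconds
    - "retries" really means total tries, so at least one ping is sent
    """
    tries = max(1, total_tries)
    waits = []
    for attempt in range(1, tries + 1):
        if attempt == 1:
            waits.append(adjusted_first_timeout)
        else:
            waits.append(configured_timeout + attempt - 1)
    return waits


# Timeout = 2 seconds, 3 total tries, node previously up
print(gareth_wait_sequence(2, 3))  # [1, 3, 4]
```

With Timeout = 2 and three total tries, the model gives waits of 1, 3 and 4
seconds, which is the ping#1 to ping#3 sequence listed above.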
You are right when you say that retries really means total tries. You should
be able to verify the above sequence by following the time stamps in the
netmon.trace file. On the next status polling cycle, you will see that this
node is pinged only once (netmon does not use retries on nodes that have
previously been marked down) and the last timeout value should be used. In
this case, that means 4 seconds before another expired message would be
seen if no response came back.
Below is an example taken from an actual netmon trace file. The SNMP config
info is as follows:
Status polling = 60 seconds
Retries = 2 (which means total ping attempts will be 2)
Timeout = 3 seconds
(Note: There are entries in the trace file between the lines shown below)
13:01:45 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 1
13:01:46 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 4
13:01:50 : nl_pinger.c[413] : expired ping to 10.178.33.59 (UDCP213B) seqnum = 25098 ident = 24002
13:02:50 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 26552 ident = 24002 timeout = 4
13:02:50 : nl_pinger.c[916] : -> received ping from 10.178.33.59 (UDCP213B)
Explanation:
Node 10.178.33.59 had been successfully pinged prior to 13:01:45, with the
last successful ping probably at 13:00:45. This has led to an adjusted
timeout value of 1 second, as shown by ".....timeout = 1" at the end of the
trace entry. So netmon is only going to wait 1 second for a ping response
before sending another ping. Thus at 13:01:46, netmon sends its second and
last ping attempt. You will notice that the timeout value is now 4. This was
obtained by taking the configured timeout value of 3 seconds and adding 1
second. So the second ping will wait 4 seconds for a response before timing
out. In this case the response never came and, because it was the final
attempt, you will see the expired message at 13:01:50, which is 4 seconds
after the ping was sent. Notice that the two pings and the expired ping
message all have the same ping sequence number (seqnum = 25098) as they all
belong to the same poll cycle for the node 10.178.33.59.
This node's status is checked again 60 seconds later at 13:02:50 as
configured (notice the seqnum is different as this is a new poll cycle). The
last timeout value is used. In this case, the node is back up (the ping
response is received). I would expect that the next check at 13:03:50 would
have a timeout = 3 and so on.
Remember that if node 10.178.33.59 had not responded to the ping at 13:02:50
within 4 seconds, you would have seen an expired message (no retries since
the node was marked as down from a previous poll cycle).
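For anyone who wants to check the time stamps without reading the trace by
eye, here is a rough Python sketch that measures the gap between the last
"sending ping" and the "expired ping" entry for each seqnum. It assumes the
entry layout seen in the excerpt above, with each entry on a single line in
the trace file; the function names are made up for illustration and other
NetView levels may format the trace differently.

```python
import re
from datetime import datetime

# Entry layout assumed from the trace excerpt in this message.
HEAD = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) : nl_pinger\.c\[\d+\] : "
    r"(sending ping to|expired ping to|-> received ping from) (\S+)"
)


def parse_entry(line):
    """Return a dict for a recognised nl_pinger entry, or None otherwise."""
    head = HEAD.match(line.strip())
    if head is None:
        return None
    seqnum = re.search(r"seqnum = (\d+)", line)
    timeout = re.search(r"timeout = (\d+)", line)
    return {
        "time": datetime.strptime(head.group(1), "%H:%M:%S"),
        "event": head.group(2),
        "ip": head.group(3),
        "seqnum": int(seqnum.group(1)) if seqnum else None,
        "timeout": int(timeout.group(1)) if timeout else None,
    }


def expiry_delays(lines):
    """Map seqnum -> seconds between the last send and the expired entry.

    Midnight rollover is ignored because the trace only carries HH:MM:SS.
    """
    last_send = {}
    delays = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None or entry["seqnum"] is None:
            continue
        if entry["event"] == "sending ping to":
            last_send[entry["seqnum"]] = entry["time"]
        elif entry["event"] == "expired ping to" and entry["seqnum"] in last_send:
            delta = entry["time"] - last_send[entry["seqnum"]]
            delays[entry["seqnum"]] = int(delta.total_seconds())
    return delays


sample = [
    "13:01:45 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 1",
    "13:01:46 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 25098 ident = 24002 timeout = 4",
    "13:01:50 : nl_pinger.c[413] : expired ping to 10.178.33.59 (UDCP213B) seqnum = 25098 ident = 24002",
    "13:02:50 : nl_pinger.c[234] : sending ping to 10.178.33.59 seqnum = 26552 ident = 24002 timeout = 4",
    "13:02:50 : nl_pinger.c[916] : -> received ping from 10.178.33.59 (UDCP213B)",
]
print(expiry_delays(sample))  # {25098: 4}
```

For the five entries quoted above this reports a 4 second gap for seqnum
25098, matching the "timeout = 4" on the second ping. To run it against a
real file, pass the open file object (or its lines) to expiry_delays.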
Please note that I am not from netmon support and this is just my
understanding based on what I have experienced and been told.
Regards,
Gareth
Gareth Holl
Software Engineer
Olympics Advocate
gholl@tivoli.com
Tivoli Systems / IBM Corporation
Research Triangle Park, NC 27703. 1-800-TIVOLI-8
don-n-darla@att.net on 09/02/2000 01:45:05 AM
Please respond to IBM NetView Discussion <nv-l@tkg.com>
To: nv-l@tkg.com
cc: (bcc: Gareth Holl/Tivoli Systems)
Subject: [NV-L] Netmon Timeout and Retry intervals
Whoa..... hold on there folks.
According to the last information that I received from
support, NetView 5.1 (and later) for Unix was changed;
it does NOT work as you think.
My information goes like this:
1. Ping the interface
2. Wait the configured wait interval
3. Ping the interface again
4. Add 1 second to the previous wait interval
5. Repeat steps 3-4 until last retry
6. Drop back to the configured wait interval
7. Mark interface Down
Example:
Timeout=2
Retry=3
Ping - Wait 2 - Ping - Wait 3 - Ping - Wait 2 - DOWN
This is the default setting that equates to 7 seconds.
Additionally: A retry of zero or one, pings once.
The retry count is actually the "try count".
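As a rough illustration, the seven steps and the example above can be
written as a small Python function. This is one reading of the steps, not
anything from netmon itself: the function name is made up, and the handling
of more than three tries (keep adding a second, then drop back for the final
wait) is an assumption extrapolated from the Timeout=2, Retry=3 example.

```python
def don_wait_sequence(configured_timeout, retry):
    """Return the seconds waited after each ping, per the steps above.

    Assumptions (hypothetical model):
    - a retry of 0 or 1 pings once, so the retry count is a try count
    - each wait before the last is one second longer than the previous one
    - the final wait drops back to the configured interval (step 6)
    """
    tries = max(1, retry)
    waits = []
    for attempt in range(1, tries + 1):
        if attempt == tries:
            waits.append(configured_timeout)
        else:
            waits.append(configured_timeout + attempt - 1)
    return waits


waits = don_wait_sequence(2, 3)
print(waits, sum(waits))  # [2, 3, 2] 7
```

With the default Timeout=2 and Retry=3 this gives waits of 2, 3 and 2
seconds, for the 7 second total mentioned above.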
I hope someone from netmon support can comment on this.
Don Davis
Alliance of Professionals and Consultants, Inc.
Tivoli Certified Instructor / Consultant
_________________________________________________________________________
NV-L List information and Archives: http://www.tkg.com/nv-l