Set the polling interval to whatever meets your monitoring requirements
and the capabilities of the platform. If polling the international links
every 10 minutes instead of every 5 is often enough for you, or if you want
to keep management traffic on the WAN links to a minimum, that is your call.
The total timeout (the timeout value doubled for each retry and added up)
needs to be less than your polling interval. For example, a timeout/retries
of 9/4 gives a total timeout of 9+18+36+72+144 = 279 seconds, which is a bit
less than 5 minutes. You could even go back to a polling interval of 5
minutes if you wanted.
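A quick way to check a candidate setting against the polling interval is to
add up the doubled timeouts. The sketch below assumes the doubling behavior
described elsewhere in this thread (each retry waits twice as long as the
previous attempt). The function names are made up for illustration and are
not part of NetView.

```python
def total_timeout(timeout_s, retries):
    """Sum the waits for the first attempt plus each retry, doubling each time."""
    return sum(timeout_s * 2 ** attempt for attempt in range(retries + 1))


def fits_polling_interval(timeout_s, retries, polling_interval_s):
    """True if a full timeout cycle finishes before the next poll is due."""
    return total_timeout(timeout_s, retries) < polling_interval_s


if __name__ == "__main__":
    print(total_timeout(9, 4))                   # 9+18+36+72+144
    print(fits_polling_interval(9, 4, 5 * 60))   # against a 5 minute interval
    print(fits_polling_interval(9, 4, 10 * 60))  # against a 10 minute interval
```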
There is another thing to be aware of when "tuning" the timeout/retries
values: the Interface Down and Up traps that result from the netmon polling
are also affected. With values of 9/4, a netmon poll takes 279 seconds to
time out, so the resulting Interface Down trap is delayed at least 279
seconds after the interface actually "went down" (could not be pinged).
There can be an additional delay of up to one polling interval, which in
your case is 10 minutes. In other words, if the interface went down right
after a successful poll, it will be another 10 minutes plus the 279-second
timeout before you get the Interface Down trap.
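The best and worst cases described above can be put into numbers. This is
only a sketch: it assumes the timeout doubles on every retry, as described
elsewhere in this thread, and the function names are illustrative rather
than anything NetView provides.

```python
def total_timeout(timeout_s, retries):
    # First attempt plus each retry, with the wait doubling every time.
    return timeout_s * (2 ** (retries + 1) - 1)


def down_trap_delay(timeout_s, retries, polling_interval_s):
    """Return (best, worst) delay in seconds before the Interface Down trap."""
    best = total_timeout(timeout_s, retries)   # failure just before a poll
    worst = polling_interval_s + best          # failure just after a good poll
    return best, worst


if __name__ == "__main__":
    best, worst = down_trap_delay(9, 4, 10 * 60)
    print(f"best case:  {best} s")
    print(f"worst case: {worst} s")
```

For 9/4 with a 10 minute interval this gives 279 seconds at best and
600 + 279 = 879 seconds at worst.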
-----Original Message-----
From: Stoner, Raymond [SMTP:raymond.stoner@SPCORP.COM]
Sent: Monday, November 09, 1998 14:38
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
Joel & James, Thanks, I did not realize that would happen. So I have
adjusted our International Links to 9 & 4. Our international links seem
to be behaving better. Our polling interval for these links is at 10
minutes, global default is 5, should I leave this as is?
-----Original Message-----
From: Joel A. Gerber [mailto:joel.gerber@usaa.com]
Sent: Monday, November 09, 1998 2:42 PM
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
James is right. You need to be careful when increasing retries.
Timeout/retries are not unique to the NetView application; they simply
control what happens at the lower TCP/IP layers in the protocol stack. The
most common implementation on all platforms is to double the timeout value
for every retry, which is exactly what AIX does. A timeout/retry
combination of 30/40 will result in a total timeout of a million years!!
(Try the math yourself: take 2 to the 40th power times 30 seconds.) You
need to be especially careful when increasing retries, but you should be
careful with the timeout value, too. For example, changing the timeout from
1 to 10 seconds with a retries value of 5 means you increased the total
timeout from 63 seconds to 630 seconds.
We use a global default of a 5.0 second timeout and 3 retries. For
resources that need a longer timeout we use 9.0 seconds and 4 retries.
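For anyone who wants to try the math: assuming the timeout doubles on every
retry as described above, an initial timeout of t seconds with r retries
adds up to the closed form below. The symbols T, t and r are introduced here
only for illustration.

```latex
\begin{align*}
T(t, r) &= \sum_{k=0}^{r} t \cdot 2^{k} = t\,\bigl(2^{r+1} - 1\bigr) \\
T(1, 5) &= 1 \cdot 63 = 63\ \text{s} \\
T(10, 5) &= 10 \cdot 63 = 630\ \text{s} \\
T(5, 3) &= 5 \cdot 15 = 75\ \text{s} \\
T(9, 4) &= 9 \cdot 31 = 279\ \text{s} \\
T(30, 40) &= 30\,\bigl(2^{41} - 1\bigr) \approx 6.6 \times 10^{13}\ \text{s}
\end{align*}
```

The last retry alone in the 30/40 case waits 30 x 2^40, or about 3.3 x 10^13
seconds, which is roughly a million years. The full sum is about twice that.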
-----Original Message-----
From: James_Shanks@TIVOLI.COM [SMTP:James_Shanks@TIVOLI.COM]
Sent: Friday, November 06, 1998 15:14
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
40 retries? That cannot be right. You should not increase the retries
like that. It would mean that netmon would never be finished with the
polling cycle for this device. The retry count is how many times netmon
should try the device before he considers it down. With a high timeout,
he would still be waiting on timeouts from one cycle when it is time to
begin the next, which will lead to very strange results. Drop that back to
where it was. What you want is longer timeouts but few retries.
There are sample rulesets for Node Down/UP and Interface Down/UP. Have
you looked at those?
James Shanks
Tivoli (NetView for UNIX) L3 Support
"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98
03:38:54 PM
Please respond to Discussion of IBM NetView and POLYCENTER Manager on
NetView <NV-L@UCSBVM.UCSB.EDU>
To: NV-L@UCSBVM.UCSB.EDU
cc: (bcc: James Shanks)
Subject: Re: Node/interface UP/Down Reset on Match
I have changed and continue to increment the polling to these devices as
you suggested; maybe my values are NG. I currently have (just for these
specific devices) timeout at 30, retries at 40, and polling interval every
10 minutes. We started @ 8, 5 and 5. I'll do some netmon tracing on Monday.
I probably do not have the rule structured properly (NetView rookie).
Not quite sure how to match up the events.
-----Original Message-----
From: James_Shanks@TIVOLI.COM [mailto:James_Shanks@TIVOLI.COM]
Sent: Friday, November 06, 1998 2:49 PM
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
Normally, I would recommend you look at polling intervals and timeouts,
since that controls when netmon decides that an interface is down and
sends the traps. I would suggest a separate entry in the SNMP
Configuration for these devices with a longer timeout. If that's not
working, perhaps you might try a netmon trace to see what is happening
here. If you need help with that, I'd call Support and ask for it.
The ruleset issue is more puzzling to me, because in principle, this is
just the sort of thing Pass/Reset-On-Match should do well. The problem may
be your timing, however. Ten seconds is way too fine an increment for the
daemon to handle. The heartbeat mechanism for checking the threshold is
set at 15 seconds, so it would be impossible to get good results lower
than that. Why not have him hold it for a minute or two? Then if there is
going to be an UP event, you are sure not to miss it.
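The hold-and-reset idea can be sketched outside of NetView as follows. This
is not ruleset syntax and not how the NetView daemon is implemented; it is a
plain Python illustration with made-up names (`HoldDown`, `on_event`, `tick`)
showing a Down event being held for a fixed window and discarded if a
matching Up arrives first. The 15-second tick mirrors the heartbeat
mentioned above, and the 60-second hold is an assumed value for "a minute or
two".

```python
HEARTBEAT_S = 15   # check granularity, taken from the message above
HOLD_S = 60        # assumed hold window ("a minute or two")


class HoldDown:
    """Hold Down events; forward them only if no matching Up arrives in time."""

    def __init__(self, hold_s=HOLD_S):
        self.hold_s = hold_s
        self.pending = {}      # resource name -> time the Down event arrived
        self.forwarded = []    # Down events that outlived the hold window

    def on_event(self, resource, state, now):
        if state == "down":
            self.pending.setdefault(resource, now)
        elif state == "up":
            # Matching Up: reset, so the held Down is never forwarded.
            self.pending.pop(resource, None)

    def tick(self, now):
        """Called once per heartbeat; releases Down events held long enough."""
        for resource, since in list(self.pending.items()):
            if now - since >= self.hold_s:
                self.forwarded.append(resource)
                del self.pending[resource]


if __name__ == "__main__":
    h = HoldDown()
    h.on_event("link-a", "down", now=0)
    h.on_event("link-a", "up", now=2)      # flap: Up two seconds later
    h.on_event("link-b", "down", now=5)    # stays down
    for t in range(0, 91, HEARTBEAT_S):
        h.tick(t)
    print(h.forwarded)
```

Because held events are only examined on a tick, a hold window shorter than
the tick cannot be honored precisely, which is the point made above about a
ten-second hold.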
James Shanks
Tivoli (NetView for UNIX) L3 Support
"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98
02:02:14 PM
Please respond to Discussion of IBM NetView and POLYCENTER Manager on
NetView <NV-L@UCSBVM.UCSB.EDU>
To: NV-L@UCSBVM.UCSB.EDU
cc: (bcc: James Shanks)
Subject: Node/interface UP/Down Reset on Match
Sometimes we receive a Node/Interface Down event and a second or two
later the Node/Interface Up event is received, especially on our
International links. I have tried to adjust the timeout and retry
intervals for these nodes but this problem still occurs. I would like to
hold the down messages for about ten seconds to see if the up message is
received, and if not, then forward the down event on to our T/EC console.
A ruleset using the Reset on Match might be the way to go, but I'm having
trouble getting that to work. Any suggestions on dealing with the rule or
this situation are greatly appreciated.
We are running NetView V4.1 on AIX 4.1.5
Raymond Stoner
Technical Advisor
Schering Plough Corporation
1011 Morris Ave. Union NJ 07083-7120
Phone : (908)-820-6268 Fax : (908)-820-6102
email: raymond.stoner@spcorp.com