James is right. You need to be careful when increasing retries.
Timeout and retry settings are not unique to the NetView application; they
simply control what happens at the lower TCP/IP layers in the protocol stack.
The most common implementation on all platforms is to double the timeout
value for every retry, which is exactly what AIX does. A timeout/retry
combination of 30/40 will result in a total timeout of more than a million
years!! (Try the math yourself: the last retry alone waits 2 to the 40th
power times 30 seconds.) You need to be especially careful when increasing
retries, but you should be careful with the timeout value, too. For example,
changing the timeout from 1 to 10 seconds with 5 retries means you have
increased the total timeout from 63 seconds to 630 seconds.
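For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the simple doubling model described above (first attempt waits the configured timeout, each retry waits twice as long as the one before). The function name total_timeout is only for illustration and is not part of NetView or AIX.

```python
def total_timeout(timeout_s, retries):
    """Total wait in seconds when the timeout doubles on every retry.

    Assumed model: the first attempt waits timeout_s, and retry k waits
    timeout_s * 2**k, so there are retries + 1 waits in all.
    """
    return sum(timeout_s * 2**k for k in range(retries + 1))


# Timeout/retry pairs mentioned in this thread.
for timeout_s, retries in [(1, 5), (10, 5), (5, 3), (9, 4), (30, 40)]:
    print(f"{timeout_s}/{retries}: {total_timeout(timeout_s, retries)} seconds")
```

Under this model 1/5 gives 63 seconds and 10/5 gives 630 seconds, matching the figures above. For 30/40 the last wait alone is 30 * 2**40 seconds, and the sum of all waits is roughly double that.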
We use a global default of a 5.0 second timeout and 3 retries. For resources
that need a longer timeout, we use 9.0 seconds and 4 retries.
-----Original Message-----
From: James_Shanks@TIVOLI.COM [SMTP:James_Shanks@TIVOLI.COM]
Sent: Friday, November 06, 1998 15:14
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
40 retries? That cannot be right. You should not increase the retries
like that. It would mean that netmon would never be finished with the
polling cycle for this device. The retry count is how many times netmon
should try the device before he considers it down. With a high timeout,
he would still be waiting on timeouts from one cycle when it is time to
begin the next, which will lead to very strange results. Drop that back
to where it was. What you want is longer timeouts but few retries.
There are sample rulesets for Node Down/UP and Interface Down/UP. Have
you looked at those?
James Shanks
Tivoli (NetView for UNIX) L3 Support
"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 03:38:54 PM
Please respond to Discussion of IBM NetView and POLYCENTER Manager on
NetView <NV-L@UCSBVM.UCSB.EDU>
To: NV-L@UCSBVM.UCSB.EDU
cc: (bcc: James Shanks)
Subject: Re: Node/interface UP/Down Reset on Match
I have changed, and continue to increase, the polling values for these
devices as you suggested; maybe my values are NG. I currently have (just
for these specific devices) timeout at 30, retries at 40, and a polling
interval of 10 minutes. We started at 8, 5, and 5. I'll do some netmon
tracing on Monday.
I probably do not have the rule structured properly. (NetView rookie)
Not quite sure how to match up the events.
-----Original Message-----
From: James_Shanks@TIVOLI.COM [mailto:James_Shanks@TIVOLI.COM]
Sent: Friday, November 06, 1998 2:49 PM
To: NV-L@UCSBVM.UCSB.EDU
Subject: Re: Node/interface UP/Down Reset on Match
Normally, I would recommend you look at polling intervals and timeouts,
since that controls when netmon decides that an interface is down and
sends the traps. I would suggest a separate entry in the SNMP
Configuration for these devices with a longer timeout. If that's not
working, perhaps you might try a netmon trace to see what is happening
here. If you need help with that, I'd call Support and ask for it.
The ruleset issue is more puzzling to me, because in principle, this is
just the sort of thing Pass/Reset-On-Match should do well. The problem
may be your timing, however. Ten seconds is way too fine an increment
for the daemon to handle. The heartbeat mechanism for checking the
threshold is set at 15 seconds, so it would be impossible to get good
results lower than that. Why not have him hold it for a minute or two?
Then if there is going to be an UP event, you are sure not to miss it.
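The hold-and-cancel idea discussed above can be sketched in a few lines of Python. This is only a generic illustration of holding a Down event and dropping it if a matching Up arrives inside the hold window; it is not NetView ruleset syntax, it does not model the 15-second heartbeat, and the function name, event tuples, and node names are all made up for the example.

```python
def downs_to_forward(events, hold_s):
    """Return the Down events not cancelled by a matching Up within hold_s.

    events: list of (time_s, node, kind) tuples sorted by time, where
    kind is "down" or "up". All names here are hypothetical.
    """
    forwarded = []
    for i, (t, node, kind) in enumerate(events):
        if kind != "down":
            continue
        # Look for an Up event for the same node inside the hold window.
        reset = any(
            n == node and k == "up" and t <= t2 <= t + hold_s
            for t2, n, k in events[i + 1:]
        )
        if not reset:
            forwarded.append((t, node, kind))
    return forwarded


events = [
    (0, "router-a", "down"),
    (2, "router-a", "up"),      # flap: Up arrives two seconds later
    (100, "router-b", "down"),  # no Up follows, so this one is forwarded
]
print(downs_to_forward(events, hold_s=60))
```

With a 60-second hold, only the router-b Down event is forwarded; the router-a flap is suppressed because its Up arrives within the window.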
James Shanks
Tivoli (NetView for UNIX) L3 Support
"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 02:02:14 PM
Please respond to Discussion of IBM NetView and POLYCENTER Manager on
NetView <NV-L@UCSBVM.UCSB.EDU>
To: NV-L@UCSBVM.UCSB.EDU
cc: (bcc: James Shanks)
Subject: Node/interface UP/Down Reset on Match
Sometimes we receive a Node/Interface Down event and, a second or two
later, the Node/Interface Up event, especially on our international
links. I have tried to adjust the timeout and retry intervals for these
nodes, but the problem still occurs. I would like to hold the down
messages for about ten seconds to see if the up message is received and,
if not, then forward the down event on to our T/EC console. A ruleset
using Reset on Match might be the way to go, but I'm having trouble
getting that to work. Any suggestions on dealing with the rule or this
situation are greatly appreciated.
We are running NetView V4.1 on AIX 4.1.5.
Raymond Stoner
Technical Advisor
Schering Plough Corporation
1011 Morris Ave. Union NJ 07083-7120
Phone : (908)-820-6268 Fax : (908)-820-6102
email: raymond.stoner@spcorp.com