I recommend that you either:
- put MLMs (Mid-Level Managers) out there, if the WAN topology supports this, or
- write an event ruleset that calls a shell script which pings the node a few times
to make sure the node-down trap is genuine before forwarding the trap.
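A minimal sketch of such a verification script, written in Python rather than shell for readability. It assumes the system `ping` accepts `-c 1` (most Unix pings, AIX included, do) and that the ruleset, or a wrapper around it, forwards the trap only when the script exits 0. The function names, attempt count, delay and exit-code convention are illustrative choices, not NetView settings.

```python
"""Hypothetical helper an event ruleset could call before forwarding a
node-down trap: re-ping the node a few times and report whether it is
really down. Attempt count and delay are illustrative values."""
import subprocess
import sys
import time
from typing import Callable


def ping_once(host: str, timeout_s: float = 2.0) -> bool:
    """Send one ICMP echo via the system ping command.

    Assumes a ping that accepts '-c 1'; the subprocess timeout bounds the
    wait instead of relying on platform-specific timeout flags.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time counts as a failed attempt
    return result.returncode == 0


def node_really_down(host: str, attempts: int = 3, delay_s: float = 5.0,
                     ping: Callable[[str], bool] = ping_once) -> bool:
    """Return True only if every one of `attempts` pings fails."""
    for i in range(attempts):
        if ping(host):
            return False  # any reply means the down trap was spurious
        if i < attempts - 1:
            time.sleep(delay_s)  # give a slow or busy link time to recover
    return True


# Exit status 0 = genuinely down (forward the trap), 1 = node answered.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(0 if node_really_down(sys.argv[1]) else 1)
```

Spacing the attempts a few seconds apart is the point of the exercise: a router busy recalculating its tables may drop one ICMP echo but answer the next.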
Believe me, if you're getting false node-down events due to network timeouts
etc., you won't beat it by fiddling with netmon. Run netmon lean and mean, and
use Unix to spawn off processes that take the load off netmon. Increasing
retries and timeouts will just cripple NetView and netmon.
If you need more details, mail me.
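In the same spirit, a sketch of handing checks to separate Unix processes so they run side by side instead of queueing behind one another. `spawn_checks` and the `cmd_for` hook are hypothetical names; any command that exits 0 on success will do.

```python
"""Sketch: start one child process per suspect host so the checks run
concurrently, then collect their exit codes."""
import subprocess
from typing import Callable, Dict, Iterable, List


def spawn_checks(hosts: Iterable[str],
                 cmd_for: Callable[[str], List[str]]) -> Dict[str, bool]:
    """Run cmd_for(host) for every host at the same time.

    All children are started before any is waited on, so total wall time
    is roughly that of the slowest check rather than the sum of them all.
    A zero exit code is taken to mean "the check succeeded".
    """
    procs = {
        host: subprocess.Popen(cmd_for(host),
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
        for host in hosts
    }
    return {host: proc.wait() == 0 for host, proc in procs.items()}
```

For example, `spawn_checks(["router1", "router2"], lambda h: ["ping", "-c", "3", h])` pings both (illustrative) routers at once.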
erik nilsson <erik@NETMAN.SE> on 11/02/99 22:41:59
Please respond to Discussion of IBM NetView and POLYCENTER Manager on NetView
cc: (bcc: Leonard I. Bocock/NZ/Unisys)
Subject: The polling process
When configuring the SNMP polling parameters, we decreased
the polling interval to 2 min, keeping the default timeout (2 s)
and retry count (3). Our network includes approximately 500
interfaces on different routers.
On some links we have found that link-down events appear although
the link/interface is actually up (manual ping test). This can
occur when there is a timeout in the polling cycle because no ICMP reply
arrives within the time limit (slow link, highly utilized router, low ICMP
priority, recalculating routing tables, etc.).
We have now increased the number of retries to 10 for some of
our routers, to be really sure that the link is down when the event
is triggered (the event actually starts other processes that create enterprise
error messages for the help desk, etc.).
However, that seems to result in a very slow update time (10-15 min) for
links/routers that come up after a down state.
My question is about the polling process.
When the retry count is increased, the time to flag the interface
'down' will of course increase. Does that affect (delay) the polling
frequency of the other nodes in the polling list?
(Is every poll a separate process, independent of the previous one?)
If the answer is yes, that would seriously affect the polling cycle
and the time when a new state of an interface is detected.
If, for example, we have 10 down interfaces, that would result in
a 10 × 10 × 2 s = 200 s delay, which would hold back the polling cycle for
every other node.
Is this correct?
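The arithmetic above as a small model. It assumes, as the question does, that the poller works through interfaces one at a time and that each down interface blocks it for retries × timeout seconds; whether netmon really behaves this way is exactly what is being asked, so treat the numbers as a what-if. `blocked_seconds` is a hypothetical name.

```python
"""What-if model of the question: serial polling, where each down
interface blocks the poller for retries * timeout seconds."""


def blocked_seconds(down_interfaces: int, retries: int, timeout_s: float) -> float:
    """Extra time per cycle spent waiting on unanswered polls (serial model)."""
    return down_interfaces * retries * timeout_s


INTERVAL_S = 120  # the 2-minute polling interval from the question

for retries in (3, 10):
    delay = blocked_seconds(10, retries, 2.0)
    verdict = "exceeds" if delay > INTERVAL_S else "fits within"
    print(f"retries={retries}: {delay:.0f} s blocked, {verdict} the {INTERVAL_S} s interval")

# Output:
# retries=3: 60 s blocked, fits within the 120 s interval
# retries=10: 200 s blocked, exceeds the 120 s interval
```

With the default 3 retries, ten dead interfaces cost 60 s per cycle in this model; at 10 retries the stall alone is longer than the polling interval.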
In that case, one should really keep the retry count low and the polling
interval above 2 min, so that every interface can be checked
within the polling interval.
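Under the same serial model, that rule of thumb can be turned around to give the largest retry count that still fits in the polling interval. `max_retries` is a hypothetical helper, and `reserved_s`, the time set aside for polling the interfaces that do answer, is an assumed parameter you would have to estimate for your own network.

```python
import math


def max_retries(interval_s: float, down_interfaces: int, timeout_s: float,
                reserved_s: float = 0.0) -> int:
    """Largest retry count for which down_interfaces * retries * timeout_s,
    plus reserved_s, still fits inside interval_s (serial-polling model)."""
    if down_interfaces < 1 or timeout_s <= 0:
        raise ValueError("need at least one down interface and a positive timeout")
    budget = interval_s - reserved_s
    return max(0, math.floor(budget / (down_interfaces * timeout_s)))


print(max_retries(120, 10, 2.0))                 # 6
print(max_retries(120, 10, 2.0, reserved_s=40))  # 4
```

With 10 down interfaces and a 2 s timeout, anything above 6 retries overruns a 2-minute cycle in this model, before any time is allowed for the healthy interfaces.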
Have I got this wrong or right?
Any recommendations?
(BTW: AIX 4.2.1, NetView 5.1)
Erik Nilsson (email@example.com)
Network Management tcpip AB
Archive operated by Skills 1st Ltd