(Sigh..)
The approach I have been trying lately is to ignore Unreachable. That is
the idea of RFI, after all. Act on Router Down, a root-cause event. I clear
it with any of the following: Router Up, Node Up, Interface Up for the same
node.
On core switches, act on Interface Down, cleared only by matching Interface
Up.
The one wrinkle I have to watch out for: if the core switch shuts down its
interface when the remote router fails, as some devices do on some serials,
then NetView will decide that the serial network is unreachable and the
remote router is Unreachable, rather than Down, and act as if the switch
interface is the root cause. How could it know, right?
In that case I do have to deal with Router Unreachable, but only from the
first ring of routers. Router Unreachable is cleared by any of these:
Router Up, Node Up, Interface Up.
And of course plain old Node Down might have to be dealt with on occasion.
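
For what it is worth, the clearing logic above can be sketched in a few
lines of Python. This is only an illustration of the idea, not anything
NetView provides: the OutageTracker class, its first_ring argument and the
event strings are all made-up names, and the core-switch Interface Down
case is left out. The point is that each node has at most one open outage,
so no Down or Unreachable is ever left behind waiting for its own Up.

```python
# Hypothetical sketch; none of these names are NetView APIs.
CLEARING = {"Router Up", "Node Up", "Interface Up"}


class OutageTracker:
    def __init__(self, first_ring=()):
        # Routers for which Router Unreachable is also acted on (assumption).
        self.first_ring = set(first_ring)
        self.open = {}  # node -> event that opened the outage

    def handle(self, node, event):
        """Return 'open', 'clear', or None when the event is ignored."""
        opens = event == "Router Down" or (
            event == "Router Unreachable" and node in self.first_ring
        )
        if opens:
            if node in self.open:
                return None  # already open; do not queue a second one
            self.open[node] = event
            return "open"
        if event in CLEARING and self.open.pop(node, None) is not None:
            return "clear"  # any clearing event closes the single outage
        return None


tracker = OutageTracker(first_ring={"edge1"})
for ev in ("Router Unreachable", "Router Down", "Router Unreachable", "Router Up"):
    print(ev, "->", tracker.handle("remote7", ev))
```

With that sequence only the Router Down opens an outage and the Router Up
clears it; both Unreachables are ignored because remote7 is not in the
first ring.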
Cordially,
Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit
"Barr, Scott"
<Scott_Barr@csgsystems.com>
To: <nv-l@lists.tivoli.com>
cc:
Subject: [nv-l] Router Down vs. Router Unreachable Traps
12/04/2002 05:08 PM
Netview v7.1.3 / Solaris 2.8
I am having problems with router down vs. router unreachable traps. The
problem is that sometimes you get one, sometimes you get the other and
sometimes you get both. I am wondering if someone can explain how I can
build a ruleset that processes these traps when I can't predict which ones
I will receive. Here is the ruleset logic:
1. Default trap stream / block
2. Trap settings - one path to router down (or unreachable), one path to
   router up
3. Query smartset (is this a production router?)
4. Reset on match / pass on match with 5 minute timer (only act if the
   router is down 5 minutes or more)
5. Kick paging / notification script if it is more than 5 minutes
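
The timer in steps 4 and 5 can be sketched outside the ruleset editor as a
small Python function. This is an illustration under assumptions, not how
NetView implements it: pages_due, HOLD_SECONDS and the production set are
hypothetical names, and Down and Unreachable are both assumed to have been
mapped to the single state "down" beforehand.

```python
HOLD_SECONDS = 300  # the 5 minute threshold from step 4


def pages_due(events, now, production):
    """Return production routers that have been down for HOLD_SECONDS or more.

    events: (epoch, node, state) tuples in time order, state 'down' or 'up'.
    """
    down_since = {}
    for epoch, node, state in events:
        if node not in production:
            continue  # step 3: only production routers matter
        if state == "down":
            down_since.setdefault(node, epoch)  # keep the first down time
        elif state == "up":
            down_since.pop(node, None)  # an up resets the timer
    return sorted(n for n, t in down_since.items() if now - t >= HOLD_SECONDS)
```

Because only the first down time per router is kept, a second Down or
Unreachable arriving before the Up does not start a second timer.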
Here are the current netmon parameters:
/usr/OV/bin/netmon -P -S -s/usr/OV/conf/netmon.seed -V -u -h -K1
The issue is the pass on match and reset on match logic. Since the router
up is paired with a router down and/or a router unreachable, how can I
avoid leaving one of them in the unmatched trap queue? Here are some sample
trapd.log entries (all from the same router, by the way):
1038917521 3 Tue Dec 03 06:12:01 2002 RouterName N Router Down.
1038917829 3 Tue Dec 03 06:17:09 2002 RouterName N Router Up.
Down / Up - no issue
1038935945 3 Tue Dec 03 11:19:05 2002 RouterName N Router RouterName Unreachable.
1038936280 3 Tue Dec 03 11:24:40 2002 RouterName N Router Up.
Unreachable / Up - no issue
1038936527 3 Tue Dec 03 11:28:47 2002 RouterName N Router RouterName Unreachable.
1038936839 3 Tue Dec 03 11:33:59 2002 RouterName N Router Down.
1038937473 3 Tue Dec 03 11:44:33 2002 RouterName N Router Up.
Unreachable / Down / Up - Now Down is left unmatched for reset on match processing
1038937800 3 Tue Dec 03 11:50:00 2002 RouterName N Router RouterName Unreachable.
1038938124 3 Tue Dec 03 11:55:24 2002 RouterName N Router Down.
1038938416 3 Tue Dec 03 12:00:16 2002 RouterName N Router RouterName Unreachable.
1038938738 3 Tue Dec 03 12:05:38 2002 RouterName N Router Up.
Unreachable / Down / Unreachable / Up - not even sure how NetView came to this conclusion
1038999753 3 Wed Dec 04 05:02:33 2002 RouterName N Router Down.
1039000058 3 Wed Dec 04 05:07:38 2002 RouterName N Router Up.
Down / Up - no issue
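
To make the sequences above easier to reason about, here is a small Python
sketch that reads lines in this format and treats Down and Unreachable as
one "down" state per router, so each outage has exactly one start and one
end. The field positions are inferred from the sample entries only, and
parse_trap and outages are hypothetical helper names, not NetView tools.

```python
def parse_trap(line):
    """Split one trapd.log-style line into (epoch, node, state)."""
    # Assumed layout: epoch, category, weekday, month, day, time, year,
    # node, flag, then the free-text message.
    parts = line.split(None, 9)
    epoch, node, message = int(parts[0]), parts[7], parts[9].strip()
    state = None
    if message.startswith("Router"):
        if message.endswith("Up."):
            state = "up"
        elif message.endswith(("Down.", "Unreachable.")):
            state = "down"
    return epoch, node, state


def outages(lines):
    """Return (node, seconds) for each down-to-up interval."""
    started, result = {}, []
    for line in lines:
        epoch, node, state = parse_trap(line)
        if state == "down":
            started.setdefault(node, epoch)  # only the first trap starts it
        elif state == "up" and node in started:
            result.append((node, epoch - started.pop(node)))
    return result


sample = [
    "1038936527 3 Tue Dec 03 11:28:47 2002 RouterName N Router RouterName Unreachable.",
    "1038936839 3 Tue Dec 03 11:33:59 2002 RouterName N Router Down.",
    "1038937473 3 Tue Dec 03 11:44:33 2002 RouterName N Router Up.",
]
print(outages(sample))  # [('RouterName', 946)]
```

The Unreachable / Down / Up sequence collapses into a single outage of 946
seconds, with nothing left over to match a later Router Up.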
I am not entirely sure what the impact here is. I know as a result of this,
there will be a trap unmatched forever. I also know that if the router goes
marginal and then we get another router up trap, it will match the extra
one still left in the queue. This seems like a netmon design problem to me.
I just can't seem to get the exact logic behind the use of unreachable vs.
the use of router down.
Scott Barr
Network Systems Engineer
CSG Systems
Phone: 402-431-7939
Fax: 402-431-7413
Email: Scott_Barr@csgsystems.com