
Re: Node/interface UP/Down Reset on Match

To: nv-l@lists.tivoli.com
Subject: Re: Node/interface UP/Down Reset on Match
From: "Mull, John" <jmull@HERSHEYS.COM>
Date: Mon, 9 Nov 1998 11:01:15 -0500
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Ray,

In our environment, I filter out all Node_Down events but allow all
Interface Down events to flow across the tecad_nv6k adapter to TEC.  We
have netmon polling every 3 minutes, with the retry count set to 5 and
the timeout set to 1.0.  When TEC receives an "UP" event for any host
that had received a "DOWN" event, it closes the DOWN event and drops it
from the console.  This correlation rule, in which an UP event closes a
DOWN event, comes automatically with TEC.  On top of that, I wrote a
simple rule in TEC with a timer: if an event of class "OV_IF_Down" is
still open after 10 minutes, it fires off a ticket to our Trouble Ticket
System for the proper 2nd-level support group.
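
For anyone who wants to see the logic spelled out, here is a minimal
Python sketch of the timer idea. It is not TEC rule syntax and not the
rule I actually run; the class name, method names, and interface labels
are all made up for illustration. It only shows the behavior: an UP
event closes the matching DOWN event, and a DOWN event still open after
10 minutes is reported once for ticketing.

```python
from dataclasses import dataclass, field

# 10 minutes, matching the timer described above.
ESCALATE_AFTER_S = 10 * 60


@dataclass
class OpenDownEvents:
    """Tracks down events that have not yet been closed by an up event.

    Hypothetical helper for illustration only; times are plain seconds.
    """
    opened_at: dict = field(default_factory=dict)  # interface -> arrival time
    ticketed: set = field(default_factory=set)     # interfaces already escalated

    def on_event(self, kind, interface, now):
        if kind == "down":
            # Keep the earliest arrival time if duplicate downs come in.
            self.opened_at.setdefault(interface, now)
        elif kind == "up":
            # An up event closes the matching down event.
            self.opened_at.pop(interface, None)
            self.ticketed.discard(interface)

    def due_for_ticket(self, now):
        """Return interfaces whose down event has been open long enough."""
        due = [name for name, t in self.opened_at.items()
               if now - t >= ESCALATE_AFTER_S and name not in self.ticketed]
        self.ticketed.update(due)  # report each open event only once
        return sorted(due)


events = OpenDownEvents()
events.on_event("down", "router1:serial0", now=0)
events.on_event("down", "router2:serial1", now=30)
events.on_event("up", "router2:serial1", now=45)   # closes router2's event
print(events.due_for_ticket(now=300))  # nothing has been open 10 minutes yet
print(events.due_for_ticket(now=600))  # router1:serial0 is now due
```

The same structure would cover a shorter hold-down before forwarding:
only the threshold changes.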

It works great.  There have been a few occasions where the "UP" event
never made it to trapd.log.  However, if an interface does not respond
after three polling intervals, we feel there is a problem.

Just another option for you to think about.

John Mull
Hershey Foods Corporation
Information Technology & Integration
Enterprise Systems Management
(717)534-7959
email:jmull@hersheys.com
>----------
>From:  James_Shanks@TIVOLI.COM[SMTP:James_Shanks@TIVOLI.COM]
>Sent:  Friday, November 06, 1998 4:13 PM
>Subject:       Re: Node/interface UP/Down Reset on Match
>
>40 retries?  That cannot be right.  You should not increase the retries
>like that.  It would mean that netmon would never be finished with the
>polling cycle for this device.  The retry count is how many times netmon
>should try the device before he considers it down.    With a high timeout,
>he would still be waiting on timeouts from one cycle when it is time to
>begin the next, which will lead to very strange results.  Drop that back to
>where it was.    What you want is longer timeouts but few retries.
>
>There are sample rulesets for Node Down/UP and Interface Down/UP.  Have
>you looked at those?
>
>James Shanks
>Tivoli (NetView for UNIX) L3 Support
>
>
>
>"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 03:38:54 PM
>
>Please respond to Discussion of IBM NetView and POLYCENTER Manager on
>      NetView <NV-L@UCSBVM.UCSB.EDU>
>
>To:   NV-L@UCSBVM.UCSB.EDU
>cc:    (bcc: James Shanks)
>Subject:  Re: Node/interface UP/Down Reset on Match
>
>
>
>
>
>I have changed and continue to increment the polling to these devices as
>you suggested; maybe my values are NG. I currently have (just for these
>specific devices) timeout at 30, retries at 40, and polling interval every
>10 minutes. We started @ 8, 5, and 5.  I'll do some netmon tracing on
>Monday.
>
>I probably do not have the rule structured properly. (NetView rookie)
>Not quite sure how to match up the events.
>
>-----Original Message-----
>From: James_Shanks@TIVOLI.COM [mailto:James_Shanks@TIVOLI.COM]
>Sent: Friday, November 06, 1998 2:49 PM
>To: NV-L@UCSBVM.UCSB.EDU
>Subject: Re: Node/interface UP/Down Reset on Match
>
>
>Normally, I would recommend you look at polling intervals and timeouts,
>since that controls when netmon decides that an interface is down and
>sends the traps.  I would suggest a separate entry in the SNMP
>Configuration for these entries with a longer timeout.  If that's not
>working, perhaps you might try a netmon trace to see what is happening
>here.  If you need help with that, I'd call Support and ask for it.
>
>The ruleset issue is more puzzling to me, because in principle, this is
>just the sort of thing Pass/Reset-On-Match should do well.  The problem
>may be your timing however.  Ten seconds is way too fine an increment
>for the daemon to handle.  The heartbeat mechanism for checking the
>threshold is set at 15 seconds, so it would be impossible to get good
>results lower than that.    Why not have him hold it for a minute or two?
>Then if there is going to be an UP event, you are sure not to miss it.
>
>James Shanks
>Tivoli (NetView for UNIX) L3 Support
>
>
>
>"Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 02:02:14 PM
>
>Please respond to Discussion of IBM NetView and POLYCENTER Manager on
>      NetView <NV-L@UCSBVM.UCSB.EDU>
>
>To:   NV-L@UCSBVM.UCSB.EDU
>cc:    (bcc: James Shanks)
>Subject:  Node/interface UP/Down Reset on Match
>
>
>
>
>
>Sometimes we receive a Node/Interface Down event and a second or two
>later the Node/Interface Up event is received, especially on our
>International links. I have tried to adjust the timeout and retry
>intervals for these nodes but this problem still occurs. I would like to
>hold the down messages for about ten seconds to see if the up message is
>received, if not then forward the down event on to our T/EC console. A
>ruleset using the Reset on Match might be the way to go, but I'm having
>trouble getting that to work. Any suggestions on dealing with the rule
>or this situation is greatly appreciated.
>
>We are running NetView V4.1 on AIX 4.1.5
>
>Raymond Stoner
>Technical Advisor
>Schering Plough Corporation
>1011 Morris Ave. Union NJ 07083-7120
>Phone : (908)-820-6268 Fax : (908)-820-6102
>email: raymond.stoner@spcorp.com
>
>
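
The retry and timeout figures quoted in the thread above can be checked
with some quick arithmetic. The Python sketch below is only an
approximation under stated assumptions: timeouts are taken to be in
seconds, netmon is assumed to make one initial attempt plus the
configured number of retries, and each attempt is assumed to wait the
full timeout (the backoff parameter is a hypothetical knob in case the
wait grows between retries). "8 5 and 5" is read as timeout 8, retries
5, polling interval 5 minutes. The function names are made up for
illustration.

```python
def worst_case_poll_seconds(timeout_s, retries, backoff=1.0):
    """Seconds spent waiting before a silent device is declared down.

    Assumes one initial attempt plus `retries` retries; attempt n waits
    timeout_s * backoff**n seconds. backoff=1.0 means a fixed timeout.
    """
    return sum(timeout_s * backoff ** n for n in range(retries + 1))


def fits_in_interval(timeout_s, retries, interval_s, backoff=1.0):
    """True if a fully timed-out poll finishes before the next poll starts."""
    return worst_case_poll_seconds(timeout_s, retries, backoff) < interval_s


# (label, timeout in seconds, retries, polling interval in seconds)
settings = [
    ("original 8/5/5min", 8, 5, 5 * 60),
    ("current 30/40/10min", 30, 40, 10 * 60),
    ("alternative 1.0/5/3min", 1.0, 5, 3 * 60),
]

for label, timeout_s, retries, interval_s in settings:
    total = worst_case_poll_seconds(timeout_s, retries)
    verdict = "fits" if fits_in_interval(timeout_s, retries, interval_s) else "overruns"
    print(f"{label}: {total:.0f}s of waiting vs {interval_s}s interval -> {verdict}")
```

Under these assumptions the 30/40 setting needs 1230 seconds of waiting
against a 600-second polling interval, which is the overlap between
cycles that the reply warns about; the other two settings finish well
inside their intervals.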
