nv-l

Re: Node/interface UP/Down Reset on Match

To: nv-l@lists.tivoli.com
Subject: Re: Node/interface UP/Down Reset on Match
From: "Joel A. Gerber" <joel.gerber@USAA.COM>
Date: Mon, 9 Nov 1998 13:42:24 -0600
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
James is right.  You need to be careful when increasing retries.
Timeout/retries are not unique to the NetView application; they simply
control what happens at the lower TCP/IP layers in the protocol stack.  The
most common implementation on all platforms is to double the timeout value
for every retry, which is exactly what AIX does.  A timeout/retry combination
of 30/40 will result in a total timeout of more than a million years!! (Try
the math yourself: the final retry alone waits 2 to the 40th power times 30
seconds.)  You need to be especially careful when increasing retries, but you
should be careful with the timeout value, too.  For example, changing the
timeout from 1 to 10 seconds with a retry count of 5 means you increase the
total timeout from 63 seconds (1+2+4+8+16+32) to 630 seconds.
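
As a rough check of the arithmetic above, here is a minimal Python sketch.
It assumes the doubling behaviour described in this message (each retry
waits twice as long as the previous attempt) and counts one initial attempt
plus the retries.  The name total_wait_seconds is made up for illustration;
it is not a NetView or AIX interface.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def total_wait_seconds(timeout, retries):
    """Worst-case wait when the timeout doubles on every retry.

    There is 1 initial attempt plus `retries` retries; attempt k
    (counting from 0) waits timeout * 2**k seconds.
    """
    return sum(timeout * 2 ** k for k in range(retries + 1))


if __name__ == "__main__":
    # (timeout in seconds, retries) pairs mentioned in this thread
    for timeout, retries in [(1, 5), (10, 5), (5, 3), (9, 4), (30, 40)]:
        total = total_wait_seconds(timeout, retries)
        years = total / SECONDS_PER_YEAR
        print(f"timeout={timeout}s retries={retries}: "
              f"{total} s (about {years:.3g} years)")
```

Under that assumption, the 5.0-second/3-retry and 9.0-second/4-retry
settings mentioned in this message work out to 75 and 279 seconds.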

We use a global default of 5.0 second timeout and 3 retries.  For resources
that need a longer timeout we use 9.0 seconds and 4 retries.

        -----Original Message-----
        From:   James_Shanks@TIVOLI.COM [SMTP:James_Shanks@TIVOLI.COM]
        Sent:   Friday, November 06, 1998 15:14
        To:     NV-L@UCSBVM.UCSB.EDU
        Subject:        Re: Node/interface UP/Down Reset on Match

        40 retries?  That cannot be right.  You should not increase the
        retries like that.  It would mean that netmon would never be
        finished with the polling cycle for this device.  The retry count
        is how many times netmon should try the device before he considers
        it down.  With a high timeout, he would still be waiting on
        timeouts from one cycle when it is time to begin the next, which
        will lead to very strange results.  Drop that back to where it
        was.  What you want is longer timeouts but few retries.

        There are sample rulesets for Node Down/UP and Interface Down/UP.
        Have you looked at those?

        James Shanks
        Tivoli (NetView for UNIX) L3 Support



        "Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 03:38:54
PM

        Please respond to Discussion of IBM NetView and POLYCENTER Manager
        on NetView <NV-L@UCSBVM.UCSB.EDU>

        To:   NV-L@UCSBVM.UCSB.EDU
        cc:    (bcc: James Shanks)
        Subject:  Re: Node/interface UP/Down Reset on Match





        I have changed, and continue to increase, the polling values for
        these devices as you suggested; maybe my values are no good.  I
        currently have (just for these specific devices) the timeout at 30,
        retries at 40, and the polling interval at every 10 minutes.  We
        started at 8, 5, and 5.  I'll do some netmon tracing on Monday.

        I probably do not have the rule structured properly (NetView
        rookie).  I am not quite sure how to match up the events.

        -----Original Message-----
        From: James_Shanks@TIVOLI.COM [mailto:James_Shanks@TIVOLI.COM]
        Sent: Friday, November 06, 1998 2:49 PM
        To: NV-L@UCSBVM.UCSB.EDU
        Subject: Re: Node/interface UP/Down Reset on Match


        Normally, I would recommend you look at polling intervals and
        timeouts, since those control when netmon decides that an interface
        is down and sends the traps.  I would suggest a separate entry in
        the SNMP Configuration for these devices with a longer timeout.  If
        that's not working, perhaps you might try a netmon trace to see
        what is happening here.  If you need help with that, I'd call
        Support and ask for it.

        The ruleset issue is more puzzling to me, because in principle,
        this is just the sort of thing Pass/Reset-On-Match should do well.
        The problem may be your timing, however.  Ten seconds is way too
        fine an increment for the daemon to handle.  The heartbeat
        mechanism for checking the threshold is set at 15 seconds, so it
        would be impossible to get good results lower than that.  Why not
        have him hold it for a minute or two?  Then if there is going to be
        an UP event, you are sure not to miss it.

        James Shanks
        Tivoli (NetView for UNIX) L3 Support
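
The hold-then-forward idea discussed in the message above can be sketched
as a toy model in Python.  This is not NetView ruleset syntax and not how
the daemon is actually implemented.  It only assumes what the message
states, that held events are checked on a 15-second heartbeat, and it adds
two assumptions of its own: a held Down is released on the first heartbeat
tick at or after its hold time expires, and an Up for the same node seen by
then cancels it.  The function name, the event tuples, and the node names
are hypothetical.

```python
def forwarded_downs(events, hold_seconds, heartbeat=15):
    """Return (release_time, node) for each Down that would be forwarded.

    events: list of (time_seconds, node, kind), sorted by time, where
    kind is "DOWN" or "UP".
    """
    forwarded = []
    for i, (t, node, kind) in enumerate(events):
        if kind != "DOWN":
            continue
        # Expiry is only noticed on a heartbeat tick, so round the
        # release time up to the next multiple of the heartbeat.
        release = -(-(t + hold_seconds) // heartbeat) * heartbeat
        # A later Up for the same node, seen by the release tick,
        # cancels the held Down (assumed reset-on-match behaviour).
        cancelled = any(
            n == node and k == "UP" and t <= u <= release
            for (u, n, k) in events[i + 1:]
        )
        if not cancelled:
            forwarded.append((release, node))
    return forwarded


events = [
    (100, "rtr1", "DOWN"),
    (102, "rtr1", "UP"),    # flap: Up arrives two seconds later
    (200, "rtr2", "DOWN"),  # real outage: no Up follows
]
print(forwarded_downs(events, hold_seconds=60))
```

In this model a 10-second hold on the Down at t=100 is not released until
the tick at t=120, which illustrates why a hold shorter than the heartbeat
cannot be honoured precisely.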



        "Stoner, Raymond" <raymond.stoner@SPCORP.COM> on 11/06/98 02:02:14
PM

        Please respond to Discussion of IBM NetView and POLYCENTER Manager
        on NetView <NV-L@UCSBVM.UCSB.EDU>

        To:   NV-L@UCSBVM.UCSB.EDU
        cc:    (bcc: James Shanks)
        Subject:  Node/interface UP/Down Reset on Match





        Sometimes we receive a Node/Interface Down event and a second or
        two later the Node/Interface Up event is received, especially on
        our international links.  I have tried to adjust the timeout and
        retry intervals for these nodes but this problem still occurs.  I
        would like to hold the down messages for about ten seconds to see
        if the up message is received, and if not, then forward the down
        event on to our T/EC console.  A ruleset using the Reset on Match
        might be the way to go, but I'm having trouble getting that to
        work.  Any suggestions on dealing with the rule or this situation
        are greatly appreciated.

        We are running NetView V4.1 on AIX 4.1.5

        Raymond Stoner
        Technical Advisor
        Schering Plough Corporation
        1011 Morris Ave. Union NJ 07083-7120
        Phone : (908)-820-6268 Fax : (908)-820-6102
        email: raymond.stoner@spcorp.com
        iloviT


Archive operated by Skills 1st Ltd

See also: The NetView Web