Re: Trapd questions

To: nv-l@lists.tivoli.com
Subject: Re: Trapd questions
From: James Shanks <James_Shanks@TIVOLI.COM>
Date: Tue, 18 May 1999 12:20:51 -0400
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Hmmm.  What exactly is your application queue size?  10,000?   25,000?

If trapd goes down, all those others will go too.  Is that what happened?
Did trapd core or what?   Who died first?


Basically the application queue size is a mechanism for people to use when
they have configured their agents to send more traps, more frequently, than
the daemons can usually handle.   Raising it is how the daemons can be kept
up, at the cost of a lot more storage and slower performance.  The boys and
girls on the Tivoli performance team were able to handle 100 traps/sec for
a few hours, but they had to boost the application queue size to 35,000 and
it took NetView many more hours to recover and process all those traps. But
they didn't lose any daemons.
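
To put rough numbers on that trade-off, here is a minimal sketch (an
illustration only, not NetView code) of how long a queue of a given size
lasts when traps arrive faster than they are processed.  The function name
and the 90 traps/sec processing rate used in the note below are assumptions;
only the 100 traps/sec and 35,000 figures come from the test described above.

```python
import math


def seconds_until_overflow(queue_size, arrival_rate, service_rate):
    """Seconds until a queue of `queue_size` entries fills up.

    arrival_rate and service_rate are in traps per second.  The service
    rate is a hypothetical input: the real rate depends on the machine
    and on what each connected application does with a trap.
    """
    deficit = arrival_rate - service_rate
    if deficit <= 0:
        # The consumer keeps up, so the queue never fills.
        return math.inf
    return queue_size / deficit
```

With a 35,000-entry queue, 100 traps/sec arriving and an assumed 90 traps/sec
being processed, seconds_until_overflow(35000, 100, 90) gives 3500 seconds,
a bit under an hour.  A bigger queue only buys time; it does not help if the
flood outlasts it.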

So I have to ask.  What exactly is the point of getting so many traps?
Can't these Cisco agents be configured to send one or two instead of dozens
per minute?  Or is that what they did, but you have 40,000 Cisco devices
sending them at one time?  Why be so verbose?  You cannot be helping
matters during an outage by flooding what is left of the network with traps.

Personally, in my view (of course I'm the management vendor) the only traps
that should be sent to NetView are ones you intend to do something about.
And one is enough.  Couldn't you get one trap from the FEP or a few from
key routers and stifle the rest?  Lots of folks implement a tiered
solution, where routers in one tier send one kind of trap and others do
not.
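
The "one is enough" idea can be sketched as a small filter that passes the
first trap of a given kind from a given source and drops repeats for a while.
This is only an illustration of the logic, assuming some hypothetical
pre-filter sitting in front of the management station; it is not a NetView
feature, and the class and parameter names are made up.

```python
class TrapThrottle:
    """Forward at most one trap per (source, trap_id) per time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        # (source, trap_id) -> timestamp of the last trap that was forwarded
        self._last_forwarded = {}

    def should_forward(self, source, trap_id, now):
        """Return True if this trap should be passed on, False to drop it.

        `now` is a timestamp in seconds, supplied by the caller so the
        logic does not depend on the system clock.
        """
        key = (source, trap_id)
        last = self._last_forwarded.get(key)
        if last is not None and now - last < self.window:
            return False  # a trap of this kind was forwarded recently
        self._last_forwarded[key] = now
        return True
```

With a 60-second window, a router repeating the same trap every few seconds
gets one through per minute, while a different router sending the same trap
is still heard.  Suppressing at the agents themselves is better still, since
dropped traps never cross the network at all.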

After all, it's just one UNIX box receiving all that stuff.

Just my two cents.

James Shanks
Tivoli (NetView for UNIX) L3 Support



Art DeBuigny <debuigny@DALLAS.NET> on 05/18/99 11:09:59 AM

Please respond to Discussion of IBM NetView and POLYCENTER Manager on
      NetView <NV-L@UCSBVM.UCSB.EDU>

To:   NV-L@UCSBVM.UCSB.EDU
cc:    (bcc: James Shanks/Tivoli Systems)
Subject:  Trapd questions





On occasion, we have been getting traps from Cisco routers when the state
of the DLSW connection resets, in this case due to a reset at the FEP.

Recently, due to a major outage, we started getting these traps from every
single router on the network.  It crashed netmon, ovtopmd, trapd, and even
ovactiond.

I've tried setting the event customization to 'Do not log or display' but
that didn't seem to help.  The situation only stabilizes once all the
routers' DLSW connections have been restored, and traps are no longer
flooding into the NetView machine.

Since this can always happen again in the event of an outage, can anyone
think of a way to 'protect' NetView's daemons from such a flood without
actually stopping the trap at the source?  I've tried adjusting the
connected applications queue size, but that apparently wasn't enough.

Thanks

Art DeBuigny
debuigny@dallas.net
Bank of America Network Operations

Archive operated by Skills 1st Ltd