Re: [nv-l] Ruleset Lag Time

To: nv-l@lists.tivoli.com
Subject: Re: [nv-l] Ruleset Lag Time
From: James Shanks <jshanks@us.ibm.com>
Date: Fri, 21 Mar 2003 15:30:01 -0500
Yes, indeed, daemons can get behind, and if you get a trap storm, then the
trap-processing daemons such as trapd and nvcorrd can literally get hours
behind.  Remember that the start of each ruleset is Event Stream, the
flow of events from trapd, and nvcorrd sees them all, including those
marked "Log Only" or "Don't Log or Display".   If you suddenly double the
usual flow, it will take nvcorrd longer to weed out the events he doesn't
want from those he does.

And since there is only one single-threaded nvcorrd daemon,  if he gets 
bogged down in one ruleset, then that can affect all rulesets.
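
To make the arithmetic concrete, here is a minimal sketch of a single-worker
queue in Python.  It is only a model, not NetView code: the function name
backlog_delay and all of the rates (capacity of 50 events per second, 40 per
second normally, 100 per second during a one-hour storm) are made-up
assumptions for illustration.

```python
def backlog_delay(arrival_rates, service_rate):
    """Model one single-threaded worker fed by a stream of events.

    arrival_rates: events arriving in each one-second interval
    service_rate:  events the worker can process per second (assumed constant)
    Returns the estimated lag, in seconds, at the end of each interval.
    """
    backlog = 0.0
    delays = []
    for arrivals in arrival_rates:
        # Unprocessed events carry over; the backlog cannot go below zero.
        backlog = max(0.0, backlog + arrivals - service_rate)
        # A newly arrived event waits for everything queued ahead of it.
        delays.append(backlog / service_rate)
    return delays


# Hypothetical load: 10 quiet minutes, a 1-hour storm, then 10 quiet minutes.
normal = [40] * 600
storm = [100] * 3600
delays = backlog_delay(normal + storm + normal, service_rate=50)

print(f"lag before the storm:      {delays[599]:.0f} s")
print(f"lag at the end of storm:   {delays[4199] / 3600:.1f} h")
print(f"lag ten minutes afterward: {delays[-1] / 60:.0f} min")
```

With these assumed numbers the worker is a full hour behind when the storm
ends, and ten minutes later it has recovered only two minutes of that,
because the spare capacity is just 10 events per second.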

Also, event windows add to the burden, since each of those represents
another ruleset being run by nvcorrd.  Ditto for web clients running
events.

Based on what I have seen, running 14 rulesets out of ESE.automation puts
you in the top ten percent of users of this facility, so it would stand
to reason that if circumstances change you could experience performance
problems others don't.   The rulesets may not have changed but the
environment in which they operate probably has.  So what kind of changes
have you seen in the three years since you started running these rulesets?
Bigger network?  Are you getting more events?  Did you start configuring
all your Cisco routers to send Cisco syslog events or create other verbose
resources?  Do you have more users running event windows?  Do your
rulesets involve smartset queries?  If so, then the size of your database
matters too.

Some years ago I wrote a short paper on suggestions for ruleset design, 
and I have posted it to the list just about every year since.   Have you 
seen it?  If not, then I can do that again.  Perhaps it will help.

But ultimately, we are talking about performance tuning.  And since there 
is no way you can alter how much time nvcorrd needs to process a 
particular ruleset node, you will be looking for ways to reduce the number 
of events he looks at, or at rewriting your ruleset to be more efficient. 
You may need to be mindful of how many traps you are getting and perhaps 
should take steps to reduce those you don't care about, to help speed up 
the processing for those you do.   One popular way to do that is to put an 
MLM between your network and trapd, letting the MLM screen out those traps 
you don't want and only passing along those you do. 
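
The idea of screening traps upstream can be sketched in a few lines of
Python.  This is not MLM configuration syntax, just an illustration of the
effect on the event flow; the trap names, the dictionary layout, and the
function name screen are all assumptions.

```python
# Hypothetical allowlist of the trap names the rulesets actually care about.
WANTED = {"Node Down", "Node Up"}


def screen(traps, wanted=WANTED):
    """Return only the traps whose name is in the allowlist."""
    return [trap for trap in traps if trap["name"] in wanted]


# Made-up incoming traffic, including a verbose source we do not care about.
incoming = [
    {"name": "Node Down", "source": "hostx"},
    {"name": "Cisco syslog", "source": "router1"},
    {"name": "Cisco syslog", "source": "router2"},
    {"name": "Node Up", "source": "hostx"},
]

forwarded = screen(incoming)
print(f"{len(forwarded)} of {len(incoming)} traps passed along")
```

Every trap dropped at this stage is one that neither trapd nor nvcorrd has
to look at.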

And you might want to take a look at how long it takes to process your
traps through the rulesets even when things are running well, so that you
can find bottlenecks when they aren't. The only way you can see what is
going on is to start running the nvcorrd trace (nvcdebug -d all) and start
examining the nvcorrd logs (alog and blog; it toggles every 10,000
lines) to see how long it is taking for your rulesets to process.
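
For a rough first measure of the lag, you can compare the trapd.log time
with the time the ruleset action actually fired.  Here is a minimal sketch
in Python using the times from the question quoted below.  It assumes the
timestamps cross midnight at most once and that the 15-minute hold applies
only to the Node Down event; the function name lag is hypothetical.

```python
from datetime import datetime, timedelta


def lag(logged, acted, hold=timedelta(0)):
    """Return how far behind an action ran, beyond any intentional hold.

    logged: HH:MM:SS time the event appears in trapd.log
    acted:  HH:MM:SS time the ruleset action shows up (here, in syslog)
    hold:   deliberate delay built into the ruleset, if any
    """
    fmt = "%H:%M:%S"
    t_logged = datetime.strptime(logged, fmt)
    t_acted = datetime.strptime(acted, fmt)
    if t_acted < t_logged:
        # Assumption: the action happened after midnight on the next day.
        t_acted += timedelta(days=1)
    return t_acted - t_logged - hold


node_down = lag("20:35:06", "23:44:28", hold=timedelta(minutes=15))
node_up = lag("21:25:43", "00:30:21")
print("Node Down lag beyond the hold:", node_down)  # 2:54:22
print("Node Up lag:", node_up)                      # 3:04:38
```

Both events come out roughly three hours late, which is consistent with
the daemon being backed up rather than with a problem in one timer.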

And by the way, 6.0.3 is out of support everywhere but Asia, and support
there ends in April.  Even that support for 6.0.3 is basically limited to
language-related issues.  I cannot say offhand whether there are any
serious performance issues resolved by any of the fixes we have made to
trapd since 6.0.3 shipped, but there have been fixes.  I'd suggest you
start planning for a migration, if you haven't already.  You might want to
do that even before you get very serious about figuring out why the
performance sometimes suffers, since then you can get Support in the boat
with you.

Hope this helps


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group




"Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
03/21/2003 02:50 PM

 
        To:     nv-l@lists.tivoli.com
        cc: 
        Subject:        [nv-l] Ruleset Lag Time



NetView 6.0.3 AIX 4.3.3


I have about 14 different rulesets starting in ESE.automation. I have been
running this configuration for almost 3 years. In several of these rulesets
I have timers that hold a node down event for 15 minutes and send it if the
node up doesn't come before the timer expires. It was recently brought to my
attention that we are getting events quite some time after the actual event
occurs. Here is an example:

In the trapd.log I see that hostx had a Node Down event at 20:35:06 and the
Node Up event is 21:25:43.

In my ruleset I am grabbing the node down event and holding it for 15
minutes and then issuing a logger -p command to send the trap attributes to
the syslog daemon.

In the syslog the timestamp of the node down event is 23:44:28 and the node
up is 00:30:21.

I have been checking other events in other rulesets as well and I have seen
similar symptoms but the time lag seems to vary. Can the daemons get that
far behind?

Has anyone seen this sort of behavior happen before? 

Scott Bursik
Pepsico Business Solutions Group
Event Systems Management
972-334-3757
scott.bursik@pbsg.com
 


---------------------------------------------------------------------
To unsubscribe, e-mail: nv-l-unsubscribe@lists.tivoli.com
For additional commands, e-mail: nv-l-help@lists.tivoli.com

*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)





Archive operated by Skills 1st Ltd
