James,
Thank you for your suggestions. I would like to see the document you
reference. Our network has grown tremendously in the past year, and the
number of traps has grown as well. I think the MLM idea, along with the
other suggestions you make, is a great idea. I will be getting off of this
version of NetView in the very near future, so we will see what happens
after that. We recently underwent a hardware migration to a bigger, better
box, and now I am free to upgrade.
Group - Thanks for all of your suggestions.
Scott Bursik
Pepsico Business Solutions Group
Event Systems Management
972-334-3757
scott.bursik@pbsg.com
-----Original Message-----
From: James Shanks [mailto:jshanks@us.ibm.com]
Sent: Friday, March 21, 2003 2:30 PM
To: nv-l@lists.tivoli.com
Subject: Re: [nv-l] Ruleset Lag Time
Yes, indeed, daemons can get behind, and if you get a trap storm, then the
trap processing daemons such as trapd and nvcorrd can literally get hours
behind. Remember that the start of each ruleset is Event Stream -- the
flow of events from trapd -- and nvcorrd sees them all -- including those
marked "Log Only" or "Don't Log or Display". If you suddenly double the
usual flow, it will take nvcorrd longer to weed out the events he doesn't
want from those he does.
And since there is only one single-threaded nvcorrd daemon, if he gets
bogged down in one ruleset, then that can affect all rulesets.
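As a rough illustration of why a single-threaded daemon falls further and further behind once events arrive faster than it can process them, here is a minimal Python sketch. It is a toy queue model, not NetView code; the function name, the 0.5-second service time, and the arrival rates are all assumptions chosen for illustration.

```python
def completion_lags(arrivals, service_time):
    """Return the lag (finish time minus arrival time) of each event for a
    single-threaded worker that handles events strictly in arrival order."""
    lags = []
    free_at = 0.0  # time at which the worker next becomes idle
    for t in arrivals:
        start = max(t, free_at)         # wait if the worker is still busy
        free_at = start + service_time  # worker is tied up until then
        lags.append(free_at - t)
    return lags


# Hypothetical numbers: every event costs 0.5 s of ruleset processing.
normal = [i * 1.0 for i in range(1000)]   # one event per second
storm = [i * 0.25 for i in range(1000)]   # four events per second

print(max(completion_lags(normal, 0.5)))  # 0.5
print(max(completion_lags(storm, 0.5)))   # 250.25
```

At the normal rate the worker always catches up before the next event, so the lag never exceeds the processing time. During the storm the lag grows with every event, and it keeps growing for as long as the storm lasts.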
Also, event windows add to the burden, since each of those represents
another ruleset being run by nvcorrd. Ditto for web clients running
events.
Based on what I have seen, running 14 rulesets out of ESE.automation puts
you in the top ten percent of users of this facility, so it stands to
reason that if circumstances change, you could experience performance
problems others don't. The rulesets may not have changed, but the
environment in which they operate probably has. So what kind of changes
have you seen in the three years since you started running these rulesets?
Bigger network? Are you getting more events? Did you start configuring
all your Cisco routers to send Cisco syslog events or create other verbose
resources? Do you have more users running event windows? Do your
rulesets involve smartset queries? Then the size of your database matters
too.
Some years ago I wrote a short paper on suggestions for ruleset design,
and I have posted it to the list just about every year since. Have you
seen it? If not, then I can do that again. Perhaps it will help.
But ultimately, we are talking about performance tuning. And since there
is no way you can alter how much time nvcorrd needs to process a
particular ruleset node, you will be looking for ways to reduce the number
of events he looks at, or at rewriting your ruleset to be more efficient.
You may need to be mindful of how many traps you are getting and perhaps
should take steps to reduce those you don't care about, to help speed up
the processing for those you do. One popular way to do that is to put an
MLM between your network and trapd, letting the MLM screen out those traps
you don't want and only passing along those you do.
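The screening idea can be sketched in a few lines of Python. This is only an illustration of filtering traps before they reach the manager, not how an MLM is actually configured; the trap names and the dictionary layout are hypothetical placeholders.

```python
# Hypothetical trap names. A real filter would more likely key on the
# enterprise OID and specific-trap number than on a friendly name.
UNWANTED = {"verboseSyslogTrap", "chattyStatusTrap"}


def screen(traps, unwanted=UNWANTED):
    """Yield only the traps worth forwarding to the manager."""
    for trap in traps:
        if trap["name"] not in unwanted:
            yield trap


incoming = [
    {"name": "nodeDown", "host": "hostx"},
    {"name": "verboseSyslogTrap", "host": "router1"},
    {"name": "nodeUp", "host": "hostx"},
]
forwarded = list(screen(incoming))
print([t["name"] for t in forwarded])  # ['nodeDown', 'nodeUp']
```

Every trap dropped at this stage is one that trapd and nvcorrd never have to look at.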
And you might want to take a look at how long it takes to process your
traps through the rulesets even when things are running well, so that you
can find bottlenecks when they aren't. The only way you can see what is
going on is to start running the nvcorrd trace (nvcdebug -d all) and start
examining the nvcorrd logs (alog and blog; logging toggles between them
every 10,000 lines) to see how long it is taking for your rulesets to
process.
And by the way, 6.0.3 is out of support everywhere but Asia, and support
there ends in April. Even that support is basically limited to
language-related issues. I cannot say offhand whether there are any
serious performance issues resolved by any of the fixes we have made to
trapd since 6.0.3 shipped, but there have been fixes. I'd suggest you
start planning for a migration, if you haven't already. You might want to
do that even before you get very serious about figuring out why the
performance sometimes suffers, since then you can get Support in the boat
with you.
Hope this helps
James Shanks
Level 3 Support for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group
"Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
03/21/2003 02:50 PM
To: nv-l@lists.tivoli.com
cc:
Subject: [nv-l] Ruleset Lag Time
NetView 6.0.3 AIX 4.3.3
I have about 14 different rulesets starting in ESE.automation. I have been
running this configuration for almost 3 years. In several of these rulesets
I have timers that hold a node down event for 15 minutes and send it if the
node up doesn't come before the timer expires. It was recently brought to
my attention that we are getting events quite some time after the actual
event occurs. Here is an example:
In the trapd.log I see that hostx had a Node Down event at 20:35:06 and
the Node Up event is at 21:25:43.
In my ruleset I am grabbing the node down event and holding it for 15
minutes and then issuing a logger -p command to send the trap attributes to
the syslog daemon.
In the syslog the timestamp of the node down event is 23:44:28 and the
node up is 00:30:21.
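To put numbers on the delay in this example, the following Python sketch computes the lag between the trapd.log and syslog timestamps. It assumes the two entries are less than 24 hours apart, that the 15-minute hold applies only to the Node Down event, and that the Node Up event is forwarded without a hold; the helper name `lag` is made up for illustration.

```python
from datetime import datetime, timedelta


def lag(trap_time, syslog_time, hold=timedelta(0)):
    """Delay between a trapd.log and a syslog timestamp (HH:MM:SS),
    minus any intentional hold. Assumes they are under 24 hours apart."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(trap_time, fmt)
    end = datetime.strptime(syslog_time, fmt)
    if end < start:  # the syslog entry landed after midnight
        end += timedelta(days=1)
    return end - start - hold


# Node Down: held for 15 minutes by the ruleset timer before being sent.
print(lag("20:35:06", "23:44:28", hold=timedelta(minutes=15)))  # 2:54:22
# Node Up: assumed to be forwarded with no hold.
print(lag("21:25:43", "00:30:21"))                              # 3:04:38
```

Under those assumptions, both events reached syslog roughly three hours later than expected.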
I have been checking other events in other rulesets as well and I have
seen similar symptoms, but the time lag seems to vary. Can the daemons get
that far behind?
Has anyone seen this sort of behavior happen before?
Scott Bursik
Pepsico Business Solutions Group
Event Systems Management
972-334-3757
scott.bursik@pbsg.com
---------------------------------------------------------------------
To unsubscribe, e-mail: nv-l-unsubscribe@lists.tivoli.com
For additional commands, e-mail: nv-l-help@lists.tivoli.com
*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)
---------------------------------------------------------------------