
Re: [nv-l] Trap Processing Performance Issue

To: nv-l@lists.tivoli.com
Subject: Re: [nv-l] Trap Processing Performance Issue
From: James Shanks <jshanks@us.ibm.com>
Date: Wed, 7 May 2003 15:22:17 -0400

A few observations:

Does your netstat show anything queued for trapd or nvcorrd? If those
queues aren't zero, then you have a backlog there.
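
If you save the output of netstat -a to a file, a small script can pick out the sockets that have anything queued. This is a minimal sketch under stated assumptions: queued_sockets is a hypothetical helper, not a NetView tool; it assumes the TCP section's header line names the Send-Q and Recv-Q columns (Solaris and Linux netstat both do); and which ports belong to trapd or nvcorrd is not given here, so the port filter is left to you. The port numbers in the test samples are made up.

```python
from typing import Iterable, List, Sequence, Tuple

# Column names containing a space would split into two tokens and break the
# header-to-data column alignment, so join them before splitting.
_MULTIWORD = {
    "Local Address": "Local_Address",
    "Remote Address": "Remote_Address",    # Solaris spelling
    "Foreign Address": "Foreign_Address",  # Linux spelling
}


def _mentions_port(fields: Sequence[str], ports: Sequence[str]) -> bool:
    """True if any field ends in .PORT (Solaris style) or :PORT (Linux style)."""
    if not ports:
        return True  # no filter given: report every queued socket
    suffixes = tuple(sep + p for p in ports for sep in (".", ":"))
    return any(f.endswith(suffixes) for f in fields)


def queued_sockets(lines: Iterable[str],
                   ports: Sequence[str] = ()) -> List[Tuple[str, int, int]]:
    """Return (row, send_q, recv_q) for each socket row with a non-zero queue."""
    send_idx = recv_idx = None
    hits = []
    for raw in lines:
        line = raw.strip()
        if "Send-Q" in line and "Recv-Q" in line:
            # A header with queue columns: learn where they are.
            header = line
            for old, new in _MULTIWORD.items():
                header = header.replace(old, new)
            cols = header.split()
            send_idx, recv_idx = cols.index("Send-Q"), cols.index("Recv-Q")
            continue
        if "Address" in line or line.startswith("Active"):
            # Another section (UDP, UNIX domain) without queue columns.
            send_idx = recv_idx = None
            continue
        if send_idx is None:
            continue
        fields = line.split()
        if len(fields) <= max(send_idx, recv_idx):
            continue  # blank lines and titles such as "TCP: IPv4"
        try:
            send_q, recv_q = int(fields[send_idx]), int(fields[recv_idx])
        except ValueError:
            continue  # "-----" separator rows and other non-numeric rows
        if (send_q or recv_q) and _mentions_port(fields, ports):
            hits.append((line, send_q, recv_q))
    return hits
```

Running it against a few captures taken while the heartbeat check is failing helps tell a queue that stays non-zero (a real backlog) from a momentary burst.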

Are you running a trapd.trace? Turn on the "hex dump of all packets"
option for trapd, and turn on the trapd trace by toggling it with
"trapd -T".
This will show you when trapd pulled the trap off the socket and what
he did with it. You can try matching the time he sends it to all
applications with the input timestamp in the nvcorrd log, to make sure
you are dealing with the same instance of the trap.
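
Once the relevant timestamps have been pulled out of trapd.trace and the nvcorrd log, pairing them up is mechanical. The log formats are not shown here, so this sketch assumes the times are already parsed into datetime objects; pair_events and slow_or_missing are hypothetical helpers. It pairs each trapd send time with the earliest unused nvcorrd input time at or after it. That nearest-following rule is only a heuristic: the hex dump is still the way to confirm both entries are really the same trap instance.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (trapd send time, nvcorrd input time or None, lag in seconds or None)
Pair = Tuple[datetime, Optional[datetime], Optional[float]]


def pair_events(sent: List[datetime], received: List[datetime]) -> List[Pair]:
    """Pair each trapd send time with the next unused nvcorrd input time."""
    sent, received = sorted(sent), sorted(received)
    pairs: List[Pair] = []
    j = 0
    for s in sent:
        # nvcorrd entries older than this send belong to earlier traps.
        while j < len(received) and received[j] < s:
            j += 1
        if j < len(received):
            r = received[j]
            j += 1  # each nvcorrd entry is matched at most once
            pairs.append((s, r, (r - s).total_seconds()))
        else:
            pairs.append((s, None, None))  # nvcorrd never logged it
    return pairs


def slow_or_missing(pairs: List[Pair], threshold: float = 30.0) -> List[Pair]:
    """Pairs whose lag exceeds `threshold` seconds, or with no nvcorrd entry."""
    return [p for p in pairs if p[2] is None or p[2] > threshold]


if __name__ == "__main__":
    # Made-up sample times, not taken from any real log.
    t0 = datetime(2003, 5, 7, 9, 0, 0)
    pairs = pair_events([t0, t0 + timedelta(seconds=10)],
                        [t0 + timedelta(seconds=1), t0 + timedelta(seconds=45)])
    print([p[2] for p in pairs])  # lags: [1.0, 35.0]
    print(slow_or_missing(pairs))
```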

Also, you haven't explained what happens in the ruleset that is supposed
to alter the test file. Who does that? Actionsvr?
Have you looked in his log? You can see in there when he got the action
from nvcorrd. The very same timestamped transfer will appear in the
nvcorrd log.
Actionsvr will launch the action as a script, which calls your executable
after exporting the variables.
Depending on how it ends, you might be able to see that in
nvaction.alog/blog as well. Could your scripts be stepping on one another
trying to write to the same file?
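
One way to rule out colliding scripts is to have each invocation take an exclusive lock before touching the heartbeat file. This is a minimal Python sketch of that idea, not the actual action script (which is a shell script launched by actionsvr); touch_heartbeat and the ".lock" sibling file are hypothetical names, and fcntl.flock is available only on POSIX systems.

```python
import fcntl
import os


def touch_heartbeat(path: str) -> float:
    """Set `path`'s modification time to now, one writer at a time.

    Hypothetical helper: concurrent callers queue on an exclusive flock()
    held on a sibling lock file, so only one of them touches `path` at once.
    Returns the new modification time.
    """
    lock_path = path + ".lock"  # assumed naming convention
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until any other holder releases
        try:
            with open(path, "a"):
                pass  # create the file if it does not exist yet
            os.utime(path, None)  # None means "use the current time"
            return os.stat(path).st_mtime
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

If the delays persist with every writer serialized this way, contention on the file is unlikely to be the cause.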

James Shanks
Level 3 Support for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group




"Barr, Scott" <Scott_Barr@csgsystems.com>
05/07/2003 02:44 PM

 
        To:     <nv-l@lists.tivoli.com>
        cc: 
        Subject:        [nv-l] Trap Processing Performance Issue



Greetings. I have a problem I can't for the life of me figure out, and I
am hoping the answer here will be speedier and less painful than the
answer I'll get from support.
NetView v7.1.3 
Solaris 2.8 
Basically, we have a piece of automation whose function is to determine
whether trap processing is working. It works like this:
1. cron runs a script every 10 minutes that sends an SNMP trap for
NetView to consume.
2. A ruleset catches the trap and writes out an empty file (i.e., touches
its timestamp).
3. The crontab script goes to sleep for 30 seconds. 
4. When the script wakes back up, it checks the file's timestamp to see
whether it is more than 30 seconds old (i.e., the automation should
already have touched the file but hasn't).
5. If this occurs, the script cycles nvcorrd. 
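
The check in steps 3 through 5 comes down to comparing the file's modification time with the clock. This is a minimal Python sketch under stated assumptions: the original cron script is not shown, so send_trap and on_stale are hypothetical callbacks standing in for whatever sends the test trap and cycles nvcorrd, and the file path is up to you.

```python
import os
import time
from typing import Callable, Optional


def heartbeat_is_stale(path: str, max_age: float = 30.0,
                       now: Optional[float] = None) -> bool:
    """True if `path` is missing or was last modified more than `max_age` s ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return True  # the ruleset has never touched the file
    return now - mtime > max_age


def run_check(path: str,
              send_trap: Callable[[], None],
              on_stale: Callable[[], None],
              wait: float = 30.0,
              max_age: float = 30.0,
              sleep: Callable[[float], None] = time.sleep) -> bool:
    """One cron run: send the trap, sleep, react if the file is stale.

    Returns True when the file was stale (and on_stale was called).
    """
    send_trap()   # hypothetical: emit the test trap (step 1)
    sleep(wait)   # step 3
    stale = heartbeat_is_stale(path, max_age)  # step 4
    if stale:
        on_stale()  # hypothetical: cycle nvcorrd (step 5)
    return stale
```

Because the file is touched only once per 10-minute cycle, an age above 30 seconds after a 30-second sleep means it was not touched after the trap went out. A stricter variant would record time.time() just before send_trap() and test mtime against that value, which also gives the processing delay directly as mtime minus send time.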
When the problem happens, the diagnostics indicate that the trap was sent
in and NetView did not touch the file for more than 30 seconds. This
implies that the trap was held up in processing for a good long time.
Some things I have observed/checked: 
1. I looked at CPU performance. Although ovwdb spikes pretty high,
everything seems to be responding okay (i.e., the user interface, etc.).
It could be a CPU issue, but I don't think so.
2. Trap volume: I snooped on port 162 during this time, and traps were
coming in at roughly one every few seconds. Nothing scary there.
3. No cores. 
4. There seems to be a delay in the event showing up in the nvevents 
window. I consider this to be an important point. 
5. The problem disappeared for weeks. Now it has happened 5 times in a
two-hour period this morning.
6. We use a large number of rulesets (34). In general they do not
overlap, but there are a few cases where traps must be handled twice.
7. We use a large number of smartsets (25) 
8. netstat -a does not show queued data on any ports associated with
nvcold.
I am open to suggestions. This is actually occurring on two different
servers in two different parts of the country. The configuration is not
new, but the trap-checking automation has only been running for a few
months, so this problem may have been occurring before we were able to
measure the time it takes to process a trap. 30 seconds seems like a
really long time. I can't find any error logs or anything else, even with
debug on in nvcorrd, that shows anything obviously wrong. Anybody got any
suggestions?

Scott Barr
CSG Systems Inc.
Network Systems Engineer
Phone: 402-431-7939
Fax: 402-431-7413
Mail: scott_barr@csgsystems.com




*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)

