
[nv-l] Trap Processing Performance Issue

To: <nv-l@lists.tivoli.com>
Subject: [nv-l] Trap Processing Performance Issue
From: "Barr, Scott" <Scott_Barr@csgsystems.com>
Date: Wed, 7 May 2003 13:44:22 -0500
Delivered-to: mailing list nv-l@lists.tivoli.com
Delivery-date: Wed, 07 May 2003 19:46:41 +0100

Greetings. I have a problem I can't for the life of me figure out, and I am hoping the answer here will be speedier and less painful than the one I'll get from support.

NetView v7.1.3
Solaris 2.8

Basically, we have a piece of automation whose function is to determine whether trap processing is working. It works like this:

1. cron runs a script every 10 minutes that sends an SNMP trap for NetView to consume.
2. A ruleset catches the trap and writes out an empty file (i.e., touches its timestamp).
3. The cron script sleeps for 30 seconds.
4. When the script wakes up, it checks the file's modification time. If the file is more than 30 seconds old, the automation has not touched it even though it should have.
5. If this occurs, the script cycles nvcorrd.
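For readers who want to reproduce this kind of check, steps 1 to 5 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the actual cron script: `send_test_trap` and `restart_nvcorrd` are hypothetical callables standing in for whatever the script uses to emit the trap and cycle nvcorrd, and `HEARTBEAT_FILE`, `WAIT_SECONDS` and `CLOCK_SLACK` are hypothetical values. Comparing the file's modification time against the moment the trap was sent is equivalent to the "more than 30 seconds old" test after a 30-second sleep, but does not depend on the sleep being exact.

```python
import os
import time
from typing import Callable

# Hypothetical values: use the path the ruleset actually touches and the
# same wait as the real cron script.
HEARTBEAT_FILE = "/var/tmp/nv_trap_heartbeat"
WAIT_SECONDS = 30
# Assumed tolerance for filesystems with coarse (e.g. 1-second) timestamps,
# so a touch made right after the trap is sent is not misread as stale.
CLOCK_SLACK = 2.0


def heartbeat_ok(path: str, sent_at: float) -> bool:
    """Return True if `path` was touched at (or just before) `sent_at` or later."""
    try:
        return os.path.getmtime(path) >= sent_at - CLOCK_SLACK
    except FileNotFoundError:
        # The ruleset never created the file, so no trap was processed.
        return False


def run_check(send_test_trap: Callable[[], None],
              restart_nvcorrd: Callable[[], None],
              path: str = HEARTBEAT_FILE,
              wait: float = WAIT_SECONDS) -> bool:
    """One cron cycle: send a trap, wait, check the file, cycle nvcorrd on failure."""
    sent_at = time.time()
    send_test_trap()                  # step 1: emit the test trap
    # step 2 happens inside NetView: the ruleset touches `path`
    time.sleep(wait)                  # step 3: give the ruleset time to act
    if heartbeat_ok(path, sent_at):   # step 4: was the file touched?
        return True
    restart_nvcorrd()                 # step 5: trap processing looks stuck
    return False
```

In a real deployment the two callables would wrap the site's own trap-sending and daemon-restart commands; they are left abstract here rather than guessing at exact command lines.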

When the problem happens, the diagnostics show that the trap was sent but NetView did not touch the file within 30 seconds. This implies the trap was held up in processing for a long time.
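Since the symptom is a slow trap rather than a lost one, it may help to record how long each test trap actually takes instead of a single pass/fail at 30 seconds. The sketch below polls the heartbeat file until it is touched or a timeout expires and returns the measured delay. It is an assumption-laden illustration: `measure_trap_latency`, `send_test_trap`, the timeout, the poll interval and `CLOCK_SLACK` are all hypothetical, and the measured delay is only as precise as the poll interval.

```python
import os
import time
from typing import Callable, Optional

# Assumed tolerance for coarse filesystem timestamps.
CLOCK_SLACK = 2.0


def measure_trap_latency(send_test_trap: Callable[[], None],
                         path: str,
                         timeout: float = 120.0,
                         poll: float = 0.5) -> Optional[float]:
    """Send one test trap and return seconds until the ruleset touches `path`.

    Returns None if the file is not touched within `timeout` seconds.
    Resolution is limited by `poll`.
    """
    sent_at = time.time()
    send_test_trap()
    deadline = sent_at + timeout
    while True:
        try:
            mtime = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = None
        if mtime is not None and mtime >= sent_at - CLOCK_SLACK:
            # Delay as observed by this poller; never report a negative value.
            return max(0.0, time.time() - sent_at)
        if time.time() >= deadline:
            return None
        time.sleep(poll)
```

Logging the returned value on every cron run would show whether processing delay creeps up gradually before it crosses 30 seconds, or jumps suddenly.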

Some things I have observed/checked:

1. I looked at CPU utilization. Although ovwdb spikes pretty high, everything else seems responsive (e.g., the user interface). It could be a CPU issue, but I don't think so.
2. Trap volume: I snooped on port 162 during this time, and traps were arriving at roughly one every few seconds. Nothing scary there.
3. No core files.
4. There seems to be a delay before the event shows up in the nvevents window. I consider this an important point.
5. The problem disappeared for weeks. This morning it happened 5 times in a two-hour period.
6. We use a large number of rulesets (34). In general they do not overlap, but in a few cases traps must be handled twice.
7. We use a large number of smartsets (25).
8. netstat -a does not show any ports associated with nvcold having queued data.

I am open to suggestions. This is actually occurring on two different servers in two different parts of the country. The configuration is not new, but the trap-checking automation has only been running for a few months, so this problem may have been occurring before we were able to measure how long it takes to process a trap. Thirty seconds seems like a really long time. I can't find any error logs or anything else, even with debugging turned on for nvcorrd, that shows anything obviously wrong. Does anybody have any suggestions?


Scott Barr

CSG Systems Inc.

Network Systems Engineer

Phone: 402-431-7939

Fax: 402-431-7413

Mail: scott_barr@csgsystems.com


Archive operated by Skills 1st Ltd
