nv-l

Re: Trap queue buildup

To: nv-l@lists.tivoli.com
Subject: Re: Trap queue buildup
From: James_Shanks@tivoli.com
Date: Mon, 31 Jul 2000 07:50:29 -0400

You need to call Support and get some help immediately.

Those messages indicate that traps are arriving so fast that while trapd can get
them and queue them, other applications which are processing traps, such as
ovtopmd, netmon, snmpCollect, and so on, cannot process them fast enough to keep
up.  You have already boosted the trapd queue size to 32000 from its default of
2000, so this means that your external changes have resulted in many more traps
per second being sent to your box than it can handle.   You are experiencing
trap storms. Trapd disconnects applications which exceed their queue size so
that they do not cause him to crash for lack of memory.  So while he is not
losing traps, your other applications are not getting them.
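To make the disconnect behaviour concrete, here is a toy Python model of a trap daemon fanning traps out to several applications, each with a bounded queue of outstanding events. It is an illustration under assumptions, not trapd's actual implementation: the class name, the `max_outstanding` parameter and the application names are hypothetical. What it shows is that a consumer which falls behind is dropped, while the daemon keeps delivering every trap to the applications that keep up.

```python
from collections import deque


class TrapDispatcher:
    """Toy model (NOT trapd's real code) of a daemon that copies each trap to
    every registered application, each with a bounded queue."""

    def __init__(self, max_outstanding):
        # Hypothetical stand-in for the per-application outstanding-event limit.
        self.max_outstanding = max_outstanding
        self.queues = {}          # application name -> deque of pending traps
        self.disconnected = []    # applications dropped for falling behind

    def register(self, app):
        self.queues[app] = deque()

    def dispatch(self, trap):
        # Every still-connected application gets its own copy of the trap.
        for app in list(self.queues):
            q = self.queues[app]
            q.append(trap)
            if len(q) > self.max_outstanding:
                # Rather than let memory grow without bound, drop the slow consumer.
                del self.queues[app]
                self.disconnected.append(app)

    def consume(self, app, n):
        # The application reads up to n traps; a disconnected app reads nothing.
        q = self.queues.get(app)
        if q is None:
            return 0
        taken = min(n, len(q))
        for _ in range(taken):
            q.popleft()
        return taken


d = TrapDispatcher(max_outstanding=20)
d.register("fast")
d.register("slow")
for tick in range(10):
    for i in range(5):            # 5 traps arrive per tick
        d.dispatch(("trap", tick, i))
    d.consume("fast", 10)         # keeps up
    d.consume("slow", 1)          # falls behind by 4 traps per tick
print(d.disconnected)             # ['slow']
```

Raising the limit (as with -b32000) only delays the disconnect; if the arrival rate stays above what an application can read, its queue still overflows eventually.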

I would try turning on the trapd trace by making trapd run with the "hex dump of
all packets option" (that adds a -x flag to him) and then start the trace
(issue trapd -T from the command line) so that you can analyze what these traps
are and where they are coming from.  But as the results will be in hex, you will
probably need help deciphering it.  The trace will also show you the process id
of the applications being disconnected.
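Once the trace has been decoded, the question "where are these traps coming from?" reduces to counting traps per sender. The sketch below assumes the trace has already been turned into (timestamp in seconds, source address) pairs; that record format, the function name and the 5 traps/second threshold are hypothetical choices, and the addresses are documentation examples.

```python
from collections import Counter


def noisy_trap_sources(records, threshold_per_sec):
    """Return [(source, traps_per_second)] for senders above threshold_per_sec,
    busiest first. `records` is a hypothetical list of (timestamp, source) pairs."""
    records = list(records)
    if not records:
        return []
    times = [t for t, _ in records]
    # Length of the capture window; at least 1 s so a burst at a single
    # instant does not divide by zero.
    duration = max(max(times) - min(times), 1.0)
    counts = Counter(src for _, src in records)
    noisy = [(src, n / duration) for src, n in counts.items()
             if n / duration > threshold_per_sec]
    return sorted(noisy, key=lambda item: item[1], reverse=True)


# Example: one device sends 600 traps in about a minute, another sends 3.
records = [(i // 10, "192.0.2.10") for i in range(600)]
records += [(0, "192.0.2.20"), (30, "192.0.2.20"), (59, "192.0.2.20")]
for source, rate in noisy_trap_sources(records, threshold_per_sec=5):
    print(f"{source}: {rate:.1f} traps/s")
```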

James Shanks
Team Leader, Level 3 Support
 Tivoli NetView for UNIX and NT



"Rama, R. (Reggie)" <ReggieR@nedcor.co.za> on 07/31/2000 04:10:41 AM

Please respond to IBM NetView Discussion <nv-l@tkg.com>

To:   "'nv-l@tkg.com'" <nv-l@tkg.com>
cc:   "Bhikha, P. (Prakash)" <PrakashB@nedcor.com> (bcc: James Shanks/Tivoli
      Systems)
Subject:  [NV-L] Trap queue buildup




Hello All Netviewers

We are currently running AIX 4.2.1 and NetView 5.1.2 on an F50 (4 CPUs and
1 GB RAM) and we are experiencing the following problem.

Over the past few days we have noticed the following message in the
trapd.log file: "netmon-related Application reached maximum number of
outstanding events, disconnecting from trapd". The trapd buffer size has
been set to 32000, i.e. trapd -b32000. We now receive about 4 of these
messages per hour, every day.

When we monitor UDP port 162 with the netstat -an command, we find that the
receive queue builds up to approximately 32000, sits at this value for a few
minutes, and only then gets cleared before it starts building up again. I
have looked at all the various NetView configurations and they all seem OK.
I have searched the NetView archives and could not find a suitable answer to
my questions, which are:
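The netstat check described above can be scripted so the receive queue is sampled at regular intervals instead of by eye. This is a minimal sketch, assuming the common `netstat -an` column order (Proto, Recv-Q, Send-Q, Local Address, ...); the sample text is illustrative rather than captured AIX output, and the function names are hypothetical. Note that on most systems the Recv-Q column counts bytes waiting in the socket buffer, not individual traps.

```python
import subprocess
import time


def udp_recv_q(netstat_output, port=162):
    """Return the Recv-Q of the UDP socket bound to `port`, or None if absent."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-UDP rows ("udp", "udp4", "udp6" all match).
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue
        local = fields[3]
        # AIX/BSD style prints "*.162", Linux prints "0.0.0.0:162".
        sep = max(local.rfind("."), local.rfind(":"))
        if local[sep + 1:] == str(port):
            return int(fields[1])
    return None


def watch(port=162, interval=10, samples=6):
    """Print a timestamped Recv-Q reading every `interval` seconds (Python 3.7+)."""
    for _ in range(samples):
        out = subprocess.run(["netstat", "-an"], capture_output=True,
                             text=True).stdout
        print(time.strftime("%H:%M:%S"), udp_recv_q(out, port))
        time.sleep(interval)


# Illustrative output only; the real column layout varies by platform and release.
sample = """Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp4       0      0  *.23               *.*                LISTEN
udp4   31872      0  *.162              *.*
udp4       0      0  *.161              *.*
"""
print(udp_recv_q(sample))       # 31872
```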

 1. Are there application(s) that are not reading the traps from the queue
fast enough, and is that the cause of the problem?
 2. When we get the above message, does it mean that all the traps that were
on the queue are discarded (lost)?
 3. How does one determine which application(s) are not reading the traps
from the queue and are causing the problem?
 4. How does one determine / verify that traps are not being lost, i.e. how
does one verify that the data within trapd.log is correct?
 5. Also, we have made no changes to the system at all recently. Could any
external changes, i.e. many more traps from devices, cause this to occur?

Thanks in advance for the assistance.

Regards
Reggie Rama
ESM - Technology & Operations Division
Nedcor Bank Limited (South Africa)

Tel : +27 - 011 - 8813989
Fax : +27 - 011 -  8814113
e-mail : reggier@nedcor.co.za






