I think you may need to contact Support for assistance.
Your diagnosis here is not quite correct.
The nettl message you have here is from collmap. It says he is exiting
because he cannot contact ovtopmd, so it tells you nothing about what
happened to ovtopmd.
When you see a message that such-and-such daemon has lost connection with
trapd, that is the place to start, with trapd. Look in your trapd.log.
Are you getting a lot of traps? I'll bet you are. I'll bet you are having
periodic trap storms where you get so many so fast that trapd has no
choice but to just queue them for later processing. Then, when the storm
subsides, he processes them like crazy but the other applications, such as
ovtopmd, cannot keep up. When their internal queues in trapd fill up,
trapd forces them off to protect himself. The result? ovtopmd goes down,
then netmon. You can confirm this with a trapd.trace. You can start it
at the command line with "trapd -T" and toggle it off and on again with the
same command.
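
To make the failure mode above concrete, here is a toy model of a single
application queue. It is only a sketch under my own assumptions, not
trapd's actual implementation: the function name, the tick-based timing,
and all the numbers are made up for illustration.

```python
def simulate(arrivals, drain_per_tick, queue_limit):
    """Toy model of one application's queue (hypothetical, not trapd code).

    arrivals:       traps forwarded to the application in each tick
    drain_per_tick: traps the application can consume per tick
    queue_limit:    backlog at which the application is forced off
    Returns (tick of forced disconnect or None, peak backlog seen).
    """
    backlog = 0
    peak = 0
    for tick, arriving in enumerate(arrivals):
        backlog += arriving
        peak = max(peak, backlog)
        if backlog > queue_limit:
            # The queue overflowed, so the application is disconnected.
            return tick, peak
        # The application works off part of the backlog before the next tick.
        backlog = max(0, backlog - drain_per_tick)
    return None, peak


# Illustrative numbers only: quiet period, a three-tick storm, quiet again.
storm = [10] * 5 + [1500] * 3 + [10] * 5

print(simulate(storm, drain_per_tick=200, queue_limit=2000))
print(simulate(storm, drain_per_tick=200, queue_limit=20000))
```

With the smaller limit the application is forced off during the storm;
with the larger one it survives, but it is still working through a
backlog long after the storm has ended.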
If your trap rates are not too high you can obtain some relief by adjusting
the application queue buffer size in trapd. The default is something like
2000 traps, but you can bump that number much higher, as high as 20K or
30K if you have the memory and storage to spare. That may keep the daemons
up, but it won't improve throughput. They will be slow, and your map status
will suffer, because they still have to process all those traps. The only
good way to deal with a trap storm is to find the culprits, usually
overactive agents in routers sending the same trap over and over again
every second, and reconfigure them so that they don't flood the NetView
box. A good rule of thumb to use is to ask yourself who at the NetView
location is going to see this trap and do something about it? If the
answer is, "no one" then don't send it.
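
One rough way to find the culprits is to count how many trapd.log
entries each agent accounts for. The sketch below assumes each log line
is whitespace-separated with the agent hostname in the eighth field
(index 7); that layout is my assumption, so check it against your own
trapd.log and adjust host_field. The sample lines and hostnames are
invented for illustration.

```python
from collections import Counter


def top_trap_sources(lines, host_field=7, top_n=10):
    """Count log lines per sending agent and return the busiest ones.

    host_field is the zero-based index of the hostname column. The
    default of 7 is an assumption about the log layout, not a fact
    taken from the NetView documentation.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > host_field:
            counts[fields[host_field]] += 1
    return counts.most_common(top_n)


# Made-up sample lines in the assumed layout:
# epoch, category, weekday, month, day, time, year, hostname, text...
sample = [
    "980180138 3 Mon Jan 22 16:15:38 2001 router1 A Interface down",
    "980180139 3 Mon Jan 22 16:15:39 2001 router1 A Interface down",
    "980180140 3 Mon Jan 22 16:15:40 2001 router2 A Node up",
    "980180141 3 Mon Jan 22 16:15:41 2001 router1 A Interface down",
]

for host, count in top_trap_sources(sample):
    print(host, count)
```

To run it against a real log, pass an open file object instead of the
sample list. Any agent that dominates the count is a candidate for
reconfiguration.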
In NetView 6.0 much work was done to reduce the number of traps going to
the other daemons internally to help avoid this problem. But it can still
happen with high enough trap rates for a sustained period of time.
Tracking this all down is often difficult, which is why you may want to get
Support to help.
James Shanks
Team Leader, Level 3 Support
Tivoli NetView for UNIX and NT
"Pretorius, Vynita" <VPretorius@fnb.co.za>@tkg.com on 01/23/2001 03:02:15
AM
Please respond to IBM NetView Discussion <nv-l@tkg.com>
Sent by: owner-nv-l@tkg.com
To: "NV-L@tkg. com (E-mail)" <NV-L@tkg.com>
cc:
Subject: [NV-L] Problem with ovtopmd (lost connection with trapd)
Hi All
We are running Netview/AIX ver 5.1.2 on aix 4.2.1.
The problem is ovtopmd is failing ( lost connection to trapd) which then
causes netmon (lost connection with ovtopmd) to fail.
In the Nettl.LOG00 the following message appears.
Initialization done.
************************************ NetView *******************************@#%
Timestamp : Mon Jan 22 2001 16:15:38.183427
Process ID : 38022 Subsystem : COLLECTION
User ID ( UID ) : 39294 Log Class : ERROR
Device ID : -1 Path ID : -1
Connection ID : -1 Log Instance : 0
Host Dropped Msgs : 40 Host Dropped Data: 16048
Software : /usr/OV/bin/collmap
Hostname : Netvaix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
collmap: topoEventHandler(): topoEventDispatch failed. Return code = -3:
Could not receive topology event: sys 2: No such file or directory.
************************************ NetView *******************************@#%
Please could someone assist me in rectifying this problem.
Also is there another log file that will tell me why ovtopmd is falling
over?
Thanks
Vynita
_________________________________________________________________________
NV-L List information and Archives: http://www.tkg.com/nv-l