James,

We finally had missed heartbeats to track. We can see the heartbeat trap in trapd.log, but no corresponding entry in nvserverd. That appears to confirm the holdup is on the NV side. Once again we had an increase in Cisco traps (one every 5 seconds for about two hours before the first missed heartbeat), but nothing near NV's limit. Trapd.log shows it was starting to fall behind during this period as well. For example, the missed-heartbeat TEC event for 6 PM last night did not show in trapd until 6:48 PM. The 7 PM heartbeat shows in trapd at 7:21 PM and is in nvserverd at 7:45, so it had almost caught up by then.
So the TEC adapter never stopped, but we've got to figure out why trapd and the processes in between seem to stumble under load, and not a heavy load at that. We know Cisco devices can send some traps at rates faster than one per second. Is it possible the devices are machine-gunning traps even though NV shows one every 5 seconds or so? That's the only thing I can think of that could set trapd behind, based on what we are seeing.
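One rough way to check is to tally trapd.log entries per minute and look for bursts. This is only a sketch: the log path is assumed to be the usual one, and the awk field numbers are a guess that will need adjusting to however your trapd.log stamps each line.

    # Count trap entries per minute; a big count in any one minute would
    # point to bursts that the 5-second display interval is hiding.
    # Field positions and log path are assumptions -- adjust as needed.
    awk '{ print $3, $4, substr($5, 1, 5) }' /usr/OV/log/trapd.log \
        | sort | uniq -c | sort -rn | head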
Thanks everyone--Drew
So what's different? Is your wpostemsg to @EventServer like your tecint.conf file?

We are back to this being a TEC issue and not a NetView one. So unless you want to open a problem to TEC support, you'll have to do some more detective work yourself.
If both the wpostemsg and the tecint.conf have @EventServer, then I don't know what to tell you. If not, then reconfigure your tecint.conf using serversetup to use the non-TME method (which requires that a different daemon be started than when you use the TME method). For non-TME forwarding, /usr/OV/bin/nvserverd is started. For TME forwarding, it is /usr/OV/bin/spmsur, which then starts /usr/OV/bin/tme_nvserverd. Switching from one to the other requires that you go through serversetup, which will reconfigure this automatically, or that you manually alter the /usr/OV/conf/ovsuf file to start the correct daemons. But note that when you go through serversetup, your special customization to the nvserverd entries is lost.
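As a sketch only (the hostname below is a placeholder, not a recommendation), the difference shows up in the ServerLocation line of tecint.conf, and you can check which forwarding daemon ovsuf is set to start:

    # TME (secure) forwarding -- tme_nvserverd is the daemon in play:
    ServerLocation=@EventServer

    # non-TME forwarding -- plain nvserverd, with a real host name:
    ServerLocation=tecserver.example.com

    # See which nvserverd entry is registered to start:
    grep nvserverd /usr/OV/conf/ovsuf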
The fact that events are going to the cache means that nvserverd got the event, formatted it, did its tec_put_event( ), and all went fine, but then the TEC library code, in trying to send to the TEC server, found that it could not, that it had lost its connection to the TEC server, for some reason known only to those internal routines. And without a diag (as in "diagnosis") file configured in here so that the internal TEC library code will trace itself, no one can tell you what it's doing or why. You have to get that diag file, called ".ed_diag_config", from TEC Support, and they are the ones who have to look at the traces. No one on the NetView side can assist at this point.
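In the meantime, about all you can watch from the NetView side is whether the adapter's cache file keeps growing while the connection is down. The path below is a placeholder; use whatever cache file your tecint.conf points at (BufEvtPath, if yours sets one).

    # Watch the event cache file grow or drain (path is a placeholder):
    while true
    do
        ls -l /usr/OV/conf/nvserverd.cache
        sleep 60
    done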
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Edwards, JT - ESM"
<JEdwards3@wm.com> Sent by: owner-nv-l@lists.us.ibm.com
09/16/2004 04:00 PM
|
To
| "'nv-l@lists.us.ibm.com'"
<nv-l@lists.us.ibm.com>
|
cc
|
|
Subject
| RE: [nv-l] nvtecia
still hanging or falling behind processing TEC
_ITS.rs |
|
Yes it does.

-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of James Shanks
Sent: Thursday, September 16, 2004 2:32 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC _ITS.rs
Wpostemsg does not go through the internal adapter. Does that get to the TEC server?
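For reference, a direct test looks something like the line below. The class, source, and slot shown are placeholders only; substitute ones your rule base actually defines.

    # Send a test event straight to the event server, bypassing nvserverd.
    # Severity, class, source, and the hostname slot are placeholders.
    wpostemsg -r HARMLESS -m "wpostemsg connectivity test" \
        hostname=`hostname` TEC_Notice NetView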
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
"Edwards, JT - ESM"
<JEdwards3@wm.com> Sent by:
owner-nv-l@lists.us.ibm.com
09/16/2004 03:17 PM
|
To
| "'nv-l@lists.us.ibm.com'"
<nv-l@lists.us.ibm.com>
|
cc
|
|
Subject
| RE: [nv-l] nvtecia
still hanging or falling behind processing TEC
_ITS.rs |
|
Well, at this point we are now getting events caching. From there, what can we do? A wpostemsg does not clear the cache.

-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of James Shanks
Sent: Wednesday, September 15, 2004 10:16 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC _ITS.rs
No. The errno 827 indicates that there is a problem initializing the JVM (Java Virtual Machine). In almost every case I have seen, this indicates that the nvserverd daemon does not have the correct library path for Java or that the ZCE_CLASSPATH variable is not set. Since it is only set in /etc/netnmrc, if you ovstop all the daemons and restart them with just ovstart, you will lose it. So Mike is right. The usual fix is to ovstop nvsecd and then restart with /etc/netnmrc (/etc/init.d/netnmrc on Solaris or Linux). This issue has been fixed in the upcoming FixPack 2 (FP02) by updating the NVenvironment script so that if you run it before you do ovstart, it will source the correct environment for you, and then the daemons will inherit it when you do the ovstart.
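As a minimal sketch of that restart sequence (the NVenvironment path is an assumption; adjust it to wherever the script lives on your system):

    # Usual fix: bounce everything via netnmrc so the daemons inherit
    # ZCE_CLASSPATH and the Java library path.
    ovstop nvsecd
    /etc/netnmrc                   # /etc/init.d/netnmrc on Solaris or Linux

    # After FixPack 2, sourcing NVenvironment first should do the same:
    . /usr/OV/bin/NVenvironment    # path is an assumption
    ovstart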
But I still don't know why you are not getting an nvserverd.log which shows the same tec_create_handle failure that you see in the formatted nettl. We do get that here.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group