
RE: [nv-l] nvtecia still hanging or falling behind processing TEC _ITS.r

To: <nv-l@lists.us.ibm.com>
Subject: RE: [nv-l] nvtecia still hanging or falling behind processing TEC _ITS.rs
From: "Van Order, Drew \(US - Hermitage\)" <dvanorder@deloitte.com>
Date: Wed, 22 Sep 2004 06:09:59 -0500
Delivery-date: Wed, 22 Sep 2004 12:18:18 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
Importance: normal
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com
Thread-index: AcSgeCb9DfsC3rAfQOW12+4IlnbSUAAHGROg
Thread-topic: [nv-l] nvtecia still hanging or falling behind processing TEC _ITS.rs
Inquiring minds want to know! We opened yet another PMR for our
slowdowns.

-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com]
On Behalf Of Jane Curry
Sent: Wednesday, September 22, 2004 2:38 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] nvtecia still hanging or falling behind processing
TEC _ITS.rs


Come on JT - what was the answer???????
Cheers,
Jane

Edwards, JT - ESM wrote:

> Folks this problem is resolved.
>  
> Many thanks to everyone involved. The NV - TEC troubleshooting guide 
> hit the proverbial nail on the head. Now we are cooking.
>  
> But another question (Netview) looms. :-)
>  
> JT
>
>     -----Original Message-----
>     *From:* owner-nv-l@lists.us.ibm.com
>     [mailto:owner-nv-l@lists.us.ibm.com]*On Behalf Of *Van Order, Drew
>     (US - Hermitage)
>     *Sent:* Friday, September 17, 2004 4:47 PM
>     *To:* nv-l@lists.us.ibm.com
>     *Subject:* RE: [nv-l] nvtecia still hanging or falling behind
>     processing TEC_ITS.rs
>
>     Thank you for a very thoughtful and detailed reply, James.
>     Hopefully we'll get this one figured out.
>      
>     Drew
>
>         -----Original Message-----
>         *From:* owner-nv-l@lists.us.ibm.com
>         [mailto:owner-nv-l@lists.us.ibm.com] *On Behalf Of *James Shanks
>         *Sent:* Friday, September 17, 2004 4:11 PM
>         *To:* nv-l@lists.us.ibm.com
>         *Subject:* RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC_ITS.rs
>
>
>         Drew,
>
>         Performance problems are notoriously difficult to diagnose,
>         especially remotely.  Remember too that the benchmarks you are
>         thinking of are for optimally configured systems running in
>         the lab, not real-world results.   But here's a couple of
>         points you might investigate.
>
>         (1) What you see in trapd.log is not necessarily what is
>         coming in.  It's what trapd processed and logged.  Logging is
>         the last thing trapd does with the trap, after he's processed
>         it in every other way.  It does record that a particular trap
>         was received and processed at a particular time, but that's
>         about all.  So seeing Cisco traps in trapd.log 5 seconds apart
>         means that's how fast trapd is processing them, not how fast
>         they are arriving.   What might you not see in the log?  Any
>         traps configured to "Don't Log or Display" in xnmtrap.  That
>         action puts the trap category in trapd.conf to "Ignore".  So
>         you could go to /usr/OV/conf/C  (don't forget the "C" here)
>         and do "grep Ignore trapd.conf " and see whether you have any
>         of those.  If you do, then you are not seeing those in the
>         log.  For diagnostic purposes you should alter those entries
>         to "Log Only" so you can get a better idea of the work trapd
>         is actually doing.  
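That check can be run as-is. A minimal sketch below, using a made-up trapd.conf fragment (the real file syntax differs); on a live NetView system point the grep at /usr/OV/conf/C/trapd.conf instead:

```shell
# Hypothetical trapd.conf fragment -- illustrative only, real syntax
# differs. On a live system run the grep against
# /usr/OV/conf/C/trapd.conf (note the trailing "C").
cat > /tmp/trapd.conf <<'EOF'
ciscoLinkDown   6 2 Ignore   "Status Events"
ciscoLinkUp     6 3 Logonly  "Status Events"
EOF

# Any matches are traps trapd processes but never writes to trapd.log.
grep Ignore /tmp/trapd.conf
```

Each matching entry is a candidate to switch to "Log Only" in xnmtrap while you diagnose.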
>
>         (2) To get closer to what is coming in, you could turn on the
>         trapd.trace.  You'll see a message about each  trap being
>         received from address so-and-so every time one is pulled off
>         the queue for processing.  If you want to see the contents of
>         those incoming traps, then you also need to have trapd running
>         with the -x option to hex dump incoming packets.  Now I
>         said closer to what is coming in, because obviously trapd
>         cannot trace a trap until he has started to read it.  When
>         won't he read one immediately?  When there is no break between
>         incoming traps.  If traps arrive too quickly, rather than
>         pull them off one at a time and process them, trapd queues
>         them so that he doesn't lose any.  He won't start processing
>         them again until there's a break in the incoming flow.  In
>         that case you should see a bunch of trap queued messages but
>         no intervening processing in the trace.  I suspect that this
>         is really what's going on.  You get a big burst of traps, so
>         all trap processing slows while we queue them, and then once
>         the burst subsides, processing starts up again.  But now the
>         bottleneck is going to be in nvcorrd and nvserverd, who have
>         been idle for awhile, and now have a lot to do.  It's like a
>         snake swallowing an egg;  you see a big lump moving along
>         until it is totally digested.   You have to turn on the
>         nvcorrd trace (nvcdebug -d all) to see what nvcorrd's doing,
>         and one benefit of that is that you can see how long it takes
>         him to process just one trap, given the rulesets and event
>         windows you have going at the time.   Look for the eye-catcher
>         "Received a trap" and "Finished with the trap".  From the one
>         to the other is the transit time through nvcorrd.   Not much
>         you can do if you don't like it, other than to reduce the
>         load.
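That transit-time measurement can be scripted. A sketch under assumptions: the trace line layout below (timestamp first, then the eye-catcher) is invented for illustration, so adjust the matching to whatever your nvcorrd trace actually emits:

```shell
# Fake nvcorrd trace containing the two eye-catchers; real trace lines
# carry more fields, but only the timestamp and eye-catcher matter here.
cat > /tmp/nvcorrd.trace <<'EOF'
10:15:01 Received a trap
10:15:04 Finished with the trap
10:15:10 Received a trap
10:15:11 Finished with the trap
EOF

# Print per-trap transit time through nvcorrd: "Received a trap" to
# "Finished with the trap", in seconds.
awk '
  function secs(t,  p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
  /Received a trap/        { start = secs($1) }
  /Finished with the trap/ { print "transit:", secs($1) - start, "s" }
' /tmp/nvcorrd.trace | tee /tmp/transit.out
```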
>
>         (3) Obviously if you want to assess what the real incoming
>         trap rate is, you need an outside analysis tool, such as an
>         iptrace for port 162.  Then you can run ipreport against the
>         data and see.  Those are AIX commands, by the way -- there are
>         similar tools on Solaris and Linux, but I haven't used them
>         much.
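Once a capture exists, counting packets per timestamp gives the real arrival rate. The iptrace/ipreport invocations below are sketched from memory (check the man pages before relying on the flags), and the one-line-per-packet listing is invented; real ipreport output is far more verbose:

```shell
# Capture and format on AIX (run as root; flags are a best-effort sketch):
#   iptrace -a -P udp -p 162 /tmp/traps.bin    # start the capture
#   kill <iptrace-pid>                         # stop it after the burst
#   ipreport /tmp/traps.bin > /tmp/traps.txt   # human-readable listing
#
# Invented one-line-per-packet listing, standing in for ipreport output:
cat > /tmp/traps.txt <<'EOF'
09:00:01 UDP 10.1.1.5 -> nms 162
09:00:01 UDP 10.1.1.5 -> nms 162
09:00:02 UDP 10.1.1.9 -> nms 162
EOF

# Count packets per second of arrival -- the true incoming trap rate.
cut -d' ' -f1 /tmp/traps.txt | sort | uniq -c | tee /tmp/rate.out
```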
>
>         (4)  If you cannot reduce the incoming rates to keep
>         processing from being overloaded then you might consider
>         installing an MLM and using it as a trap filter, tossing out
>         duplicates and only passing on to trapd what you really want
>         to see.
>
>         HTH
>
>
>
>         James Shanks
>         Level 3 Support  for Tivoli NetView for UNIX and Windows
>         Tivoli Software / IBM Software Group
>
>
>         *"Van Order, Drew \(US - Hermitage\)" <dvanorder@deloitte.com>*
>         Sent by: owner-nv-l@lists.us.ibm.com
>
>         09/17/2004 10:12 AM
>         Please respond to nv-l
>         To: <nv-l@lists.us.ibm.com>
>         Subject: RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC_ITS.rs
>
>         James,
>          
>         We finally had missed heartbeats to track. We can see the
>         heartbeat trap in trapd.log, but no corresponding entry in
>         nvserverd. This appears to confirm the holdup is on the NV
>         side, and again, we had an increase in Cisco traps (one every
>         5 seconds for about 2 hours prior to missing the first
>         heartbeat), but nothing near NV's limit. Trapd.log shows it is
>         starting to fall behind as well during this period--as an
>         example, the missed heartbeat TEC event for 6 PM last night
>         did not show in trapd until 6:48 PM. The 7 PM heartbeat shows
>         in trapd at 7:21 PM and is in nvserverd at 7:45, so it had
>         almost caught up by then.
>          
>         So the TEC adapter never stopped, but we've got to figure out
>         why trapd and the processes in between seem to stumble under
>         load, but not a heavy one. We know Cisco devices can send some
>         traps at rates faster than one per second. Is it possible
>         devices are machine gunning traps even though NV shows one
>         every 5 seconds or so? That's the only thing I can think of
>         that could set trapd behind based on what we are seeing.
>          
>         Thanks everyone--Drew
>          
>          
>         -----Original Message-----
>         *From:* owner-nv-l@lists.us.ibm.com
>         [mailto:owner-nv-l@lists.us.ibm.com] *On Behalf Of *James Shanks
>         *Sent:* Thursday, September 16, 2004 3:40 PM
>         *To:* nv-l@lists.us.ibm.com
>         *Subject:* RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC_ITS.rs
>
>
>         So what's different?  Is your wpostemsg to @EventServer like
>         your tecint.conf file?  
>         We are back to this being a TEC issue and not a NetView one.
>          So unless you want to open a problem to TEC support, you'll
>         have to do some more detective work yourself.
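Part of that detective work can be done from the command line. A sketch, not from the thread: the class TEC_Test, source NV6K, and the hostname slot are placeholders, so substitute whatever your rule base accepts; the point is only that wpostemsg's target should match the server named in tecint.conf:

```shell
# Hypothetical test event, bypassing nvserverd's internal adapter
# (placeholders: TEC_Test class, NV6K source, hostname slot value):
# wpostemsg -r HARMLESS -m "adapter bypass test" hostname=nms1 TEC_Test NV6K
#
# Compare the target with what the internal adapter is configured to use:
# grep -i ServerLocation /usr/OV/conf/tecint.conf
```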
>
>         If both the wpostemsg and the tecint.conf have @EventServer,
>         then I don't know what to tell you.  If not, then reconfigure
>         your tecint.conf using serversetup to use the non-TME method
>         (which requires that a different daemon be started than when
>         you use the TME method).   For non-TME forwarding,
>         /usr/OV/bin/nvserverd is started.  For TME forwarding, it is
>         /usr/OV/bin/spmsur, who then starts /usr/OV/bin/tme_nvserverd.
>          To switch from one to the other requires that you go through
>         serversetup, which will reconfigure this automatically, or
>         that you manually alter the /usr/OV/conf/ovsuf file to start
>         the correct daemons.  But note that when you go through
>         serversetup, your special customization to the nvserverd
>         entries is lost.  
>          
>
>         The fact that events are going to the cache means that
>         nvserverd got the event, formatted it, did his tec_put_event()
>         and all went fine, but then TEC library code, in trying to
>         send to the TEC server, found that it could not, that it had
>         lost connection to the TEC server, for some reason known only
>         to those internal routines.  And without a diag (as in
>         "diagnosis") file configured in here so that the internal TEC
>         library code will trace itself, no one can tell you what it's
>         doing or why.  And you have to get that diag file, called
>         ".ed_diag_config" from TEC Support and they are the ones who
>         have to look at the traces.  No one on the NetView side can
>         assist at this point.
>
>         James Shanks
>         Level 3 Support  for Tivoli NetView for UNIX and Windows
>         Tivoli Software / IBM Software Group
>
>         *"Edwards, JT - ESM" <JEdwards3@wm.com>*
>         Sent by: owner-nv-l@lists.us.ibm.com
>
>         09/16/2004 04:00 PM
>         Please respond to nv-l
>         To: "'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
>         Subject: RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC _ITS.rs
>
>         Yes it does.
>         -----Original Message-----
>         *From:* owner-nv-l@lists.us.ibm.com
>         [mailto:owner-nv-l@lists.us.ibm.com] *On Behalf Of *James Shanks
>         *Sent:* Thursday, September 16, 2004 2:32 PM
>         *To:* nv-l@lists.us.ibm.com
>         *Subject:* RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC _ITS.rs
>
>
>         Wpostemsg does not go through the internal adapter.  Does that
>         get to the TEC server?
>
>         James Shanks
>         Level 3 Support  for Tivoli NetView for UNIX and Windows
>         Tivoli Software / IBM Software Group
>         *"Edwards, JT - ESM" <JEdwards3@wm.com>*
>         Sent by: owner-nv-l@lists.us.ibm.com
>
>         09/16/2004 03:17 PM
>         Please respond to nv-l
>         To: "'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
>         Subject: RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC _ITS.rs
>
>
>         Well, at this point we are now getting events caching.  From
>         there, what can we do?
>
>         A wpostemsg does not clear the cache.
>
>
>         -----Original Message-----
>         *From:* owner-nv-l@lists.us.ibm.com
>         [mailto:owner-nv-l@lists.us.ibm.com] *On Behalf Of *James Shanks
>         *Sent:* Wednesday, September 15, 2004 10:16 PM
>         *To:* nv-l@lists.us.ibm.com
>         *Subject:* RE: [nv-l] nvtecia still hanging or falling behind
>         processing TEC _ITS.rs
>
>         No. The errno 827 indicates that there is a problem
>         initializing the JVM -- Java Virtual Machine. In almost every
>         case I have seen this indicates that the nvserverd daemon does
>         not have the correct library path for Java or the
>         ZCE_CLASSPATH variable is not set. Since it is only set in
>         /etc/netnmrc, if you ovstop all the daemons and restart them
>         with just ovstart, you will lose it. So Mike is right. The
>         usual fix is to ovstop nvsecd and then restart with
>         /etc/netnmrc (/etc/init.d/netnmrc on Solaris or Linux). This
>         issue has been fixed in the upcoming FixPack 2 (FP02) by
>         updating the NVenvironment script so that if you run that
>         before you do ovstart, it will source the correct environment
>         for you, and then the daemons will inherit it when you do the
>         ovstart.
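That recovery sequence, sketched as commands (AIX paths as given in the thread; the NVenvironment path is an assumption, so check where your install puts it):

```shell
# Stop all NetView daemons (ovstop of nvsecd takes the rest down), then
# restart via the boot script so ZCE_CLASSPATH is set in the environment:
# ovstop nvsecd
# /etc/netnmrc              # /etc/init.d/netnmrc on Solaris or Linux
#
# With FixPack 2, sourcing NVenvironment first lets plain ovstart work:
# . /usr/OV/bin/NVenvironment   # assumed path
# ovstart
```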
>
>         But I still don't know why you are not getting an
>         nvserverd.log which shows the same tec_create_handle failure
>         that you see in the formatted nettl. We do get that here.
>
>         James Shanks
>         Level 3 Support for Tivoli NetView for UNIX and Windows
>         Tivoli Software / IBM Software Group
>
>         This message (including any attachments) contains confidential
>         information intended for a specific individual and purpose,
>         and is protected by law. If you are not the intended
>         recipient, you should delete this message. Any disclosure,
>         copying, or distribution of this message, or the taking of any
>         action based on it, is strictly prohibited.
>

-- 
Tivoli Certified Consultant & Instructor
Skills 1st Limited, 2 Cedar Chase, Taplow, Bucks, SL6 0EU, UK
Tel: +44 (0)1628 782565
Copyright (c) 2004 Jane Curry <jane.curry@skills-1st.co.uk>.  All rights
reserved.







Archive operated by Skills 1st Ltd
