
Re: [nv-l] Slow Interface Down events (was Netview Traps - Time to post)

To: Peter_Chow@TD.COM
Subject: Re: [nv-l] Slow Interface Down events (was Netview Traps - Time to post)
From: "Leslie Clark" <lclark@us.ibm.com>
Date: Fri, 17 May 2002 14:56:49 -0400
Cc: nv-l@lists.tivoli.com
No. Well, normally, even if it were managing all of those devices, it would
probably still complete the cycle in 5-10 minutes, because a large number of
those nodes would normally be up. Events should come back immediately when
the ping reply arrives or the timeout occurs; netmon does not wait until it
has completed the cycle to report.

I would conclude, therefore, that you actually are monitoring all of
those nodes, but since none of them are reachable, netmon has to wait
through all of the timeouts and retries for all of those 'down' nodes.
My suspicion is that you have more than one map, and while those
nodes are unmanaged in the map you are looking at, they are
managed in some other map, so netmon still must do its
duty. Pick one of the unmanaged nodes and display the object
info (Tools). There will be two fields: Maps Exist, and Maps Managed.
If you only have one map, it will say Maps Exist 1, Maps Managed 0.
If the Maps Managed value is greater than 0, then netmon has to poll.
If this is out of sync, do 'ovmapcount -a' to clear it up.
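To put rough numbers on the point about timeouts and retries, the sketch below estimates how long a list of unreachable nodes could keep netmon busy. It assumes every attempt waits the full timeout, that retries reuse the same timeout, and that netmon has a fixed number of pings in flight at once; that last figure is a hypothetical parameter, not a documented netmon setting. The 4.0-second timeout and 3 retries are the values Peter reports elsewhere in this thread.

```shell
#!/bin/sh
# Rough estimate of how long unreachable nodes can keep netmon busy.
# ASSUMPTIONS (illustrative, not from NetView documentation):
#   - every attempt to a down node waits the full timeout;
#   - retries reuse the same timeout (no backoff);
#   - netmon has a fixed number of pings in flight at once.
estimate_backlog() {
  # $1 = unreachable nodes, $2 = timeout (s), $3 = retries,
  # $4 = pings in flight at once (hypothetical parameter)
  awk -v n="$1" -v t="$2" -v r="$3" -v c="$4" 'BEGIN {
    per_node = t * (r + 1)    # seconds one down node ties up a slot
    total = n * per_node / c  # seconds to get through all of them
    printf "%.0f s per node, %.1f min for %d nodes\n", per_node, total / 60, n
  }'
}

# Example: 1000 dead addresses, a 4.0 s timeout and 3 retries,
# and a hypothetical 10 pings in flight.
estimate_backlog 1000 4.0 3 10
```

With those inputs it prints "16 s per node, 26.7 min for 1000 nodes", which shows how quickly dead addresses alone can stretch a polling cycle.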

That's one thing. But if, as you say, the backlog clears up in a matter
of minutes, then how do you explain the 2 hours to get an
Interface Down event when you unplug the device?

Since you are in a lab environment, you might consider using a lab
database. You can make a tar file of the existing database
(tar -cvf nvdb.bak /usr/OV/databases/openview)
and regenerate the database with only the lab environment.
Try your test again. Of course you want to know that it will work in
a large environment as well, and you won't want to take my word for it.
But the test environment you have there is also nullifying your test.
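If you try the lab-database approach, a sketch like the following keeps the backup and restore steps repeatable. It assumes the database is the /usr/OV/databases/openview directory from the tar command above, that the NetView daemons have been stopped first (for example with ovstop) so nothing changes mid-copy, and that archive paths are absolute. It does not cover regenerating the lab database itself.

```shell
#!/bin/sh
# Back up the NetView database before a lab test and put it back after.
# ASSUMPTIONS: the database directory is /usr/OV/databases/openview;
# the NetView daemons are already stopped; archive paths are absolute
# because both functions cd into DB_PARENT first.
DB_PARENT=${DB_PARENT:-/usr/OV/databases}
DB_NAME=${DB_NAME:-openview}

backup_db() {   # $1 = absolute path of the archive to create
  ( cd "$DB_PARENT" && tar -cf "$1" "$DB_NAME" )
}

restore_db() {  # $1 = absolute path of an archive made by backup_db
  # Refuse to delete anything if the archive is missing.
  [ -f "$1" ] || { echo "no archive: $1" >&2; return 1; }
  # Remove the lab database, then unpack the saved one in its place.
  ( cd "$DB_PARENT" && rm -rf "$DB_NAME" && tar -xf "$1" )
}
```

For example, run `backup_db /tmp/nvdb.bak` before the test and `restore_db /tmp/nvdb.bak` afterwards; /tmp/nvdb.bak is only an illustrative path.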

I'm assuming you are on AIX here....

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
(248) 552-4968 Voicemail, Fax, Pager

Peter_Chow@TD.COM
05/17/2002 02:16 PM
To: Leslie Clark/Southfield/IBM@IBMUS
cc:
Subject: Re: [nv-l] Netview Traps - Time to post

The ping list shows that netmon is thousands (10,000-20,000) behind, but
it clears relatively quickly (within minutes) back down to zero.

Our managed lab network is 4 routers and about 20 interfaces.  The rest of
the network is unmanaged and unreachable as the lab is isolated.
Why is netmon trying to ping the unmanaged devices?  If it pings thousands
of unreachable devices/interfaces, will this contribute to the problem?
I assume it must process all of these failed pings and then perform a
status update and generate a trap!?

Regards, Peter.

"Leslie Clark" <lclark@us.ibm.com>
05/17/02 09:09 AM
To: Peter_Chow@TD.COM
cc:
Subject: Re: [nv-l] Netview Traps - Time to post

Peter, the discussion on the listserver has gotten pretty far afield from
your problem with status change events taking hours to get posted. Have
you made progress? I hope you will report your findings on the listserver.

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit

__________________


This is starting to sound like a general performance problem.
What version, what platform, how many objects? Seedfile with
oids in it? Are you having other performance problems with the box?
Is your ovwdb cache size set high enough for the number of objects?

I guess you have to assume that netmon really is that far behind, and
it is not just this router. That can be caused by slow name resolution,
among other things, so some tuning is in order. Try using this tool to
check and see how many nodes it is behind (if you are on unix).


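#!/bin/ksh
# Report how far behind netmon is in status polling (UNIX only).
#set -x
TRACE=/usr/OV/log/netmon.trace

# Empty the trace file so it contains only the output requested below.
: > "$TRACE"
# Ask the running netmon to dump status information to the trace file
# (see the netmon man page for the various -a options).
netmon -a 12
# Give netmon a few seconds to write it.
sleep 6
# -s: the file exists AND is non-empty. After the truncation above the
# file exists even if netmon wrote nothing, so -f alone cannot detect that.
if [ -s "$TRACE" ]; then
  # Count lines with a '-' followed later by a ':', one per queued entry.
  count=$(grep -c '[-].*[:]' "$TRACE")
  echo "Netmon is $count behind in status pinging"
else
  echo "Netmon is too busy to report now. Try later."
fi
exit 0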


See the man page for netmon for the various -a options.
It is not unusual for it to report that it is a thousand or so behind,
and then catch up quickly. If netmon has just started, it could be much
further behind.
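Because a single reading can be misleading while netmon is catching up, it can help to take several readings a minute or so apart and watch the trend. This variant of the script above is a sketch under the same assumptions: that `netmon -a 12` writes its status information to netmon.trace and that lines matching '[-].*[:]' are the queued entries. SAMPLES and INTERVAL are hypothetical defaults.

```shell
#!/bin/sh
# Sample netmon's status-polling backlog several times to see whether
# it is catching up or falling further behind.
# ASSUMPTIONS: "netmon -a 12" writes to netmon.trace as in the script
# above; lines matching '[-].*[:]' are queued entries. SAMPLES and
# INTERVAL are hypothetical defaults.
TRACE=${TRACE:-/usr/OV/log/netmon.trace}
SAMPLES=${SAMPLES:-5}
INTERVAL=${INTERVAL:-60}

count_backlog() {   # $1 = trace file; prints the number of matching lines
  grep -c '[-].*[:]' "$1"
}

sample_backlog() {
  i=1
  while [ "$i" -le "$SAMPLES" ]; do
    : > "$TRACE"          # start each sample from an empty trace
    netmon -a 12          # ask netmon to dump its status information
    sleep 6               # give it time to write
    echo "$(date '+%H:%M:%S') behind by $(count_backlog "$TRACE")"
    i=$((i + 1))
    if [ "$i" -le "$SAMPLES" ]; then sleep "$INTERVAL"; fi
  done
}
```

A count that falls toward zero between samples matches the "behind, then catches up quickly" pattern; a count that stays high or grows suggests netmon genuinely cannot keep up.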

I would turn on tracing in netmon at startup to see what it is doing.
Once you are familiar with what is in there and how far behind it
really is, if you don't figure it out, you should consider calling
Support.


Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit

Peter_Chow@TD.COM
05/15/2002 03:34 PM
To: nv-l@lists.tivoli.com
cc:
Subject: [nv-l] Netview Traps - Time to post

Demand poll indicates that a ping is being used.
When I log on to the router and check the status of the interface in
question, it does indicate that the interface is down.

The only thing peculiar about our lab setup is that there is a very large
number of unmanaged devices. Could this be causing the delay?

What is the difference between polling with SNMP as opposed to ping?  Where
is this configured?

Regards, Peter.



------------------------------

Date: Mon, 13 May 2002 10:55:55 -0400
To: nv-l@lists.tivoli.com
From: Peter_Chow@TD.COM
Subject: Netview Traps - Time to post
Message-ID: <OF18B017A7.C9A112B7-ON85256BB8.0051DF29@dms.ops.tdbank.ca>

We're performing some testing with Netview traps in a lab environment.
One of the tests was to pull out the interface cable on a router and see
how long it would take to receive the interface down trap on netview.
We expected to receive the trap within minutes but instead received the
trap almost two and a half hours later!!!

What is going on here?  How can we minimize this 'turnaround time' to
within minutes?

------------------------------

Date: Mon, 13 May 2002 18:31:18 -0400
To: nv-l@lists.tivoli.com
From: "Leslie Clark" <lclark@us.ibm.com>
Subject: Re: [nv-l] Netview Traps - Time to post
Message-ID: <OF21AA9E5A.72FB0C2E-ON85256BB8.007B8B26@raleigh.ibm.com>

By any chance are you polling that device via SNMP as opposed to ping?
You can tell when you do a demandpoll. If it is SNMP, the status of each
interface is displayed in terms of ifAdmin and ifOper status.

Is it possible that the device itself believes that the interface is up and
reports it as up, for that long?
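One way to check what the device itself reports, independently of NetView, is to read ifAdminStatus and ifOperStatus for the interface directly. This sketch assumes the Net-SNMP `snmpget` command is installed (a different tool from NetView's own SNMP utilities), that the agent answers SNMPv1 with the community you pass in, and that you already know the interface's ifIndex; the router name, community and index are placeholders.

```shell
#!/bin/sh
# Read ifAdminStatus and ifOperStatus for one interface straight from
# the device. ASSUMPTIONS: Net-SNMP's snmpget is installed, the agent
# answers SNMPv1 with the given community, and the ifIndex is known.
IF_ADMIN_OID=1.3.6.1.2.1.2.2.1.7   # IF-MIB ifAdminStatus
IF_OPER_OID=1.3.6.1.2.1.2.2.1.8    # IF-MIB ifOperStatus

decode_status() {   # turn an IF-MIB (RFC 2863) status number into its name
  case "$1" in
    1) echo up ;;
    2) echo down ;;
    3) echo testing ;;
    4) echo unknown ;;
    5) echo dormant ;;
    6) echo notPresent ;;
    7) echo lowerLayerDown ;;
    *) echo "other($1)" ;;
  esac
}

check_interface() { # $1 = router, $2 = community, $3 = ifIndex
  # -Oqve: print only the value, with enumerations as plain numbers
  admin=$(snmpget -v1 -c "$2" -Oqve "$1" "$IF_ADMIN_OID.$3") || return 1
  oper=$(snmpget -v1 -c "$2" -Oqve "$1" "$IF_OPER_OID.$3") || return 1
  echo "ifAdminStatus.$3 = $(decode_status "$admin")"
  echo "ifOperStatus.$3 = $(decode_status "$oper")"
}
```

For example, `check_interface lab-router1 public 3` (hypothetical values) should normally show ifOperStatus as down once the cable is out; if it still says up, the device itself is reporting the interface as up.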

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit


Peter_Chow@TD.COM
05/13/2002 03:55 PM
To: Leslie Clark/Southfield/IBM@IBMUS
cc:
Subject: Re: [nv-l] Netview Traps - Time to post

I agree that this is not normal, and thank God that it's not happening in our
production environment.

If I ping the interface object after I pull the cable, the status will
change to down and a trap is received right away.
If I do nothing, then no status update or trap is received for a long
period (2.5 hrs).

SNMP polling info is: timeout 4.0; retry 3; polling 3m.

The IP interface is represented and managed.

Any suggestions on what may be wrong?

"Leslie Clark" <lclark@us.ibm.com>
05/13/02 12:35 PM
To: Peter_Chow@TD.COM
cc:
Subject: Re: [nv-l] Netview Traps - Time to post

Well, that's not good. Fortunately it is not normal, either!

When you pull the cable, can you still ping the address
of that interface from the Netview box? And what is the
polling interval set to? That is in Options...SNMP configuration
and defaults to 5 minutes. Is the IP address of the interface
represented on the map, and is it managed? You should see
at least an Interface Down event in trapd.log for that interface
within one polling cycle.
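As a back-of-the-envelope check of "within one polling cycle": if netmon is keeping up, the worst case is roughly one polling interval plus the time spent exhausting the timeout and retries. The sketch assumes the cable is pulled just after a successful poll and that every retry waits the same full timeout; if your version backs off between retries, the retry term is larger. The inputs are the 3-minute interval, 4.0-second timeout and 3 retries Peter reported in this thread.

```shell
#!/bin/sh
# Worst-case time to an Interface Down event with one status poll per
# interval. ASSUMPTIONS: netmon is keeping up with its ping list, the
# cable is pulled just after a successful poll, and every retry waits
# the same full timeout (no backoff).
worst_case_detect() {   # $1 = poll interval (s), $2 = timeout (s), $3 = retries
  awk -v p="$1" -v t="$2" -v r="$3" 'BEGIN {
    total = p + t * (r + 1)   # wait for the next poll, then exhaust retries
    printf "%.0f s (%.1f min)\n", total, total / 60
  }'
}

# Polling 3 min (180 s), timeout 4.0 s, retry 3.
worst_case_detect 180 4.0 3
```

With those values it prints "196 s (3.3 min)", a few minutes at worst and nowhere near the 2.5 hours observed, which points toward a polling backlog rather than the polling settings themselves.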

I would have to say there is a lot more to this story than
what you have told us so far. Tell us more...

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit

---------------------------------------------------------------------
To unsubscribe, e-mail: nv-l-unsubscribe@lists.tivoli.com
For additional commands, e-mail: nv-l-help@lists.tivoli.com

*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)

Archive operated by Skills 1st Ltd

See also: The NetView Web