RE: [nv-l] 45 minute difference in server response

To:	nv-l@lists.us.ibm.com
Subject:	RE: [nv-l] 45 minute difference in server response
From:	"Oliver Bruchhaeuser" <oliver.bruchhaeuser@de.ibm.com>
Date:	Tue, 9 Dec 2003 11:30:26 +0100
Delivery-date:	Tue, 09 Dec 2003 10:38:54 +0000
Envelope-to:	nv-l-archive@lists.skills-1st.co.uk
Reply-to:	nv-l@lists.us.ibm.com
Sender:	owner-nv-l@lists.us.ibm.com

Scott,

this doesn't sound good.
The script (Leslie is the author) counts the number of interfaces for which
netmon is behind in ping/snmp polling.
I would carefully check your polling intervals again.


I modified Leslie's script "slightly" so that it also shows how far
netmon is behind:
---------------------
#!/bin/ksh
#
# pingstatus.sh
#
# A script to check whether netmon can keep up with the polling
# frequency scheduled. Can be called from the Reports menu.
# Output: a message to stdout
#
#set -x

cat /dev/null > /usr/OV/log/netmon.trace
/usr/OV/bin/netmon -a 12
sleep 3

if [ -f /usr/OV/log/netmon.trace ]
then
  ListID=`grep -- "---------- pingList" /usr/OV/log/netmon.trace | cut -d'[' -f2 | cut -d']' -f1`
  NetmonBehind=`grep $ListID$ /usr/OV/log/netmon.trace | head -1 | cut -d: -f1 | grep - | sed "s/ //g;s/-//g"`
  if [ -n "$NetmonBehind" ]
  then
    AffectedInterfaces=`egrep "[-].*[:].*$ListID$" /usr/OV/log/netmon.trace | wc -l | sed "s/ //g"`
    echo "Netmon is $NetmonBehind seconds behind in ping polling."
    echo "There are $AffectedInterfaces interfaces affected."
  else
    echo "Netmon isn't behind in ping polling."
  fi
else
  echo "Netmon is too busy to report now. Try later."
fi

exit
---------------------

Of course you can also do this manually:

cat /dev/null > /usr/OV/log/netmon.trace # empty netmon.trace
/usr/OV/bin/netmon -a 12 # dump the pingList to netmon.trace

Open /usr/OV/log/netmon.trace with your favorite editor.
Search for the lines like:
>>>>>
---------- pingList [0x302ad428] ----------
** xxx elements on the IF list **
  15: 10.100.200.2 (tivnvswitch1) list = 0x302ad428
  20: 10.100.200.3 (tivnvpcping) list = 0x302ad428
....
<<<<<
The number before the colon in the first line of the pingList (in this case 15)
is the number of seconds netmon will wait until it fires the next ping.
If this number is negative, netmon is behind.
The number of lines starting with a negative number is the count of
affected interfaces.
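
The manual check described above can be sketched as a small shell function.
This is only an illustration, not part of the original script: the function
name summarize_pinglist is made up here, and it assumes that every pingList
entry looks like the sample lines shown above (a number, a colon, and later
"list = " on the same line).

```shell
# Hypothetical helper: summarize_pinglist <trace-file>
# Assumes entry lines of the form "  15: 10.100.200.2 (name) list = 0x..."
summarize_pinglist() {
  awk '
    # Entry lines: optional blanks, optional minus sign, digits, a colon,
    # and the "list = " marker later on the same line.
    /^ *-?[0-9]+:.*list = / {
      split($0, parts, ":")
      secs = parts[1] + 0              # seconds until the next ping
      if (!seen) { first = secs; seen = 1 }
      if (secs < 0) behind++           # a negative value means overdue
    }
    END {
      if (!seen) { print "no pingList entries found"; exit 1 }
      if (first < 0)
        printf "behind by %d seconds, %d interfaces affected\n", -first, behind
      else
        printf "not behind, next ping in %d seconds\n", first
    }
  ' "$1"
}
```

After dumping the list with netmon -a 12 as described, it would be called as
summarize_pinglist /usr/OV/log/netmon.trace.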

Kind regards

Oliver Bruchhaeuser
Tivoli NetView EMEA L2 Support



"Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
Sent by: owner-nv-l@lists.us.ibm.com
08.12.2003 17:39
Please respond to nv-l

To:       "'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
cc:
Subject:  RE: [nv-l] 45 minute difference in server response



What number would be considered too big?

When I modify that script to run netmon -a 12 I get a number of 21526. That
seems really high.

Scott Bursik
PepsiCo Business Solutions Group
Enterprise Systems Management
scott.bursik@pbsg.com
(972) 963-1400

-----Original Message-----
From: Oliver Bruchhaeuser [mailto:oliver.bruchhaeuser@de.ibm.com]
Sent: Monday, December 08, 2003 3:45 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] 45 minute difference in server response


Scott,

the script only checks if netmon is behind in snmp polling.
I would also check from time to time if netmon is behind in ping polling
(netmon -a 12).

Kind regards

Oliver Bruchhaeuser
Tivoli NetView EMEA L2 Support



"Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
Sent by: owner-nv-l@lists.us.ibm.com
04.12.2003 16:03
Please respond to nv-l

To:       "Nv-L (nv-l@lists.us.ibm.com)" <nv-l@lists.us.ibm.com>
cc:
Subject:  [nv-l] 45 minute difference in server response



NetView 7.1.3 AIX 4.3.3

I have been experiencing issues not detecting a node down in a timely
manner. I have a production and a development machine so I discovered the
node on both machines. They are both at the same level of code and the same
level for the OS. Both NV machines "saw" the same thing when the machine
went down, but there was a difference of about 45 minutes when I looked at
the trapd.logs on both machines. The major difference in these nodes is the
size of the network that they have in the DB. The development machine has
just a few networks discovered so it gets around a lot faster. I am at a
loss. The polling for the nodes is set the same in xnmsnmpconf on both
servers. Whenever I check the status of netmon with the following
script, netmon isn't behind:


============================================================================
#!/bin/ksh
#set -x
cat /dev/null > /usr/OV/log/netmon.trace

/usr/OV/bin/netmon -a 16

sleep 3

if [ -f /usr/OV/log/netmon.trace ]; then
  echo "Netmon is " `grep "[-].*[:]" /usr/OV/log/netmon.trace | wc -l ` "behind in snmp polling";

else
  echo "Netmon is too busy to report now. Try later."

fi

exit
============================================================================

Any thoughts what I should look for?

Development Server

Wed Dec 03 19:13:45 2003 cscf03.pepsi.com       N Interface Intel(R) down.
Wed Dec 03 19:13:45 2003 cscf03.pepsi.com       N Node marginal.
Wed Dec 03 19:14:33 2003 cscf03.pepsi.com       N Interface Intel(R) down.
Wed Dec 03 19:14:34 2003 cscf03.pepsi.com       N Interface Broadcom down.
Wed Dec 03 19:14:59 2003 cscf03.pepsi.com       N Interface Intel(R) down.
Wed Dec 03 19:14:59 2003 cscf03.pepsi.com       N Node Down.
Wed Dec 03 20:21:45 2003 cscf03.pepsi.com       N Interface Intel(R) up.
Wed Dec 03 20:21:45 2003 cscf03.pepsi.com       N Node marginal.
Wed Dec 03 20:22:33 2003 cscf03.pepsi.com       N Interface Intel(R) up.
Wed Dec 03 20:22:34 2003 cscf03.pepsi.com       N Interface Broadcom up.
Wed Dec 03 20:23:00 2003 cscf03.pepsi.com       N Interface Intel(R) up.
Wed Dec 03 20:23:00 2003 cscf03.pepsi.com       N Node Up.


Production Server

Wed Dec 03 19:57:39 2003 cscf03.pepsi.com          N Interface Intel(R) down.
Wed Dec 03 19:57:39 2003 cscf03.pepsi.com          N Node marginal.
Wed Dec 03 20:05:51 2003 cscf03.pepsi.com          N Interface Intel(R) down.
Wed Dec 03 20:09:50 2003 cscf03.pepsi.com          N Interface Broadcom down.
Wed Dec 03 20:17:20 2003 cscf03.pepsi.com          N Interface Intel(R) down.
Wed Dec 03 20:17:20 2003 cscf03.pepsi.com          N Node Down.
Wed Dec 03 20:48:19 2003 cscf03.pepsi.com          N Interface Intel(R) up.
Wed Dec 03 20:48:19 2003 cscf03.pepsi.com          N Node marginal.
Wed Dec 03 20:54:05 2003 cscf03.pepsi.com          N Interface Intel(R) up.
Wed Dec 03 20:58:36 2003 cscf03.pepsi.com          N Interface Broadcom up.
Wed Dec 03 21:04:52 2003 cscf03.pepsi.com          N Interface Intel(R) up.
Wed Dec 03 21:04:52 2003 cscf03.pepsi.com          N Node Up.
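
To put a number on the gap between the two servers, the timestamps of matching
events in the two logs can be subtracted. The sketch below is only an
illustration: the function names to_seconds and lag_between are made up here,
it assumes both events fall on the same day, and it assumes a shell that
supports $(( )) arithmetic.

```shell
# to_seconds HH:MM:SS -> seconds since midnight (same-day timestamps only).
# awk does the conversion so that values like "08" are read as decimal.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# lag_between <earlier> <later>: prints the lag as minutes and seconds
lag_between() {
  diff=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
  echo "$(( diff / 60 ))m $(( diff % 60 ))s"
}

lag_between 19:13:45 19:57:39   # first "Interface down": prints 43m 54s
lag_between 19:14:59 20:17:20   # "Node Down": prints 62m 21s
```

With the timestamps above, the production server logged the first interface
down event about 44 minutes after the development server, and the Node Down
event about 62 minutes after it.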
