Scott,
this doesn't sound good.
The script (written by Leslie) counts the number of interfaces for which
netmon is behind in ping/SNMP polling.
I would carefully check your polling intervals again.
I modified Leslie's script "slightly" so that it also shows how many seconds
netmon is behind:
---------------------
#!/bin/ksh
#
# pingstatus.sh
#
# A script to check whether netmon can keep up with the scheduled
# ping polling frequency. Can be called from the Reports menu.
# Output: a message to stdout
#
#set -x
TRACE=/usr/OV/log/netmon.trace
cat /dev/null > $TRACE          # empty netmon.trace
/usr/OV/bin/netmon -a 12        # ask netmon to dump the pingList to netmon.trace
sleep 3                         # give netmon a moment to write the dump
# Address of the ping list, taken from the "---------- pingList [0x...] ----------" header
ListID=`grep -- "---------- pingList" $TRACE | head -1 | cut -d'[' -f2 | cut -d']' -f1`
if [ -n "$ListID" ]
then
    # Countdown of the first list entry, kept only if it is negative (netmon is behind)
    NetmonBehind=`grep "${ListID}\$" $TRACE | head -1 | cut -d: -f1 | grep - | sed "s/ //g;s/-//g"`
    if [ -n "$NetmonBehind" ]
    then
        # Every entry with a negative countdown is an interface netmon is late for
        AffectedInterfaces=`egrep "[-].*[:].*${ListID}\$" $TRACE | wc -l | sed "s/ //g"`
        echo "Netmon is $NetmonBehind seconds behind in ping polling."
        echo "There are $AffectedInterfaces interfaces affected."
    else
        echo "Netmon isn't behind in ping polling."
    fi
else
    # The truncated trace is still empty: netmon has not written the dump yet
    echo "Netmon is too busy to report now. Try later."
fi
exit
---------------------
Of course you can also do this manually:
cat /dev/null > /usr/OV/log/netmon.trace # empty netmon.trace
/usr/OV/bin/netmon -a 12 # dump the pingList to netmon.trace
Open /usr/OV/log/netmon.trace with your favorite editor.
Search for lines like these:
>>>>>
---------- pingList [0x302ad428] ----------
** xxx elements on the IF list **
15: 10.100.200.2 (tivnvswitch1) list = 0x302ad428
20: 10.100.200.3 (tivnvpcping) list = 0x302ad428
....
<<<<<
The number before the colon in the first line of the pingList (15 in this
case) is the number of seconds netmon will wait before it sends the next ping.
If this number is negative, netmon is behind.
The number of lines starting with a negative number is the count of
affected interfaces.
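The reading rules above can also be applied in a single awk pass over an existing netmon.trace dump. This is a minimal sketch, assuming the dump format shown in the excerpt (a header line carrying the list address in brackets, and element lines of the form "<seconds>: <address> (<name>) list = <list address>"); ping_lag is a hypothetical helper name, not a NetView command. It takes the lag from the first entry, as described above, and counts every entry with a negative countdown:

```shell
#!/bin/sh
# Sketch only: ping_lag is a hypothetical helper, not part of NetView.
# Assumes the pingList dump format shown above, e.g.
#   ---------- pingList [0x302ad428] ----------
#   15: 10.100.200.2 (tivnvswitch1) list = 0x302ad428
ping_lag() {
    awk '
        # Header line: remember the list address between the brackets.
        /^-+ pingList \[/ {
            id = $0
            sub(/.*\[/, "", id)
            sub(/\].*/, "", id)
            next
        }
        # Element lines: "<seconds>: <address> ... list = <id>"
        id != "" && $NF == id && $1 ~ /^-?[0-9]+:$/ {
            secs = $1
            sub(/:$/, "", secs)
            secs += 0
            if (!seen) { first = secs; seen = 1 }   # first entry of the list
            if (secs < 0) behind++                  # netmon is late for this one
        }
        END {
            if (!seen) { print "No pingList found in trace."; exit 1 }
            if (first < 0)
                printf "Netmon is %d seconds behind in ping polling.\n", -first
            else
                print "Netmon is not behind in ping polling."
            printf "There are %d interfaces affected.\n", behind + 0
        }
    ' "$1"
}
```

For example, after the two commands above: ping_lag /usr/OV/log/netmon.trace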
Kind regards
Oliver Bruchhaeuser
Tivoli NetView EMEA L2 Support
From: "Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
Sent by: owner-nv-l@lists.us.ibm.com
Date: 08.12.2003 17:39
Please respond to: nv-l
To: "'nv-l@lists.us.ibm.com'" <nv-l@lists.us.ibm.com>
cc:
Subject: RE: [nv-l] 45 minute difference in server response
What number would be considered too big?
When I modify that script to run netmon -a 12, I get 21526. That
seems really high.
Scott Bursik
PepsiCo Business Solutions Group
Enterprise Systems Management
scott.bursik@pbsg.com
(972) 963-1400
-----Original Message-----
From: Oliver Bruchhaeuser [mailto:oliver.bruchhaeuser@de.ibm.com]
Sent: Monday, December 08, 2003 3:45 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] 45 minute difference in server response
Scott,
the script only checks whether netmon is behind in SNMP polling.
I would also check from time to time whether netmon is behind in ping
polling (netmon -a 12).
Kind regards
Oliver Bruchhaeuser
Tivoli NetView EMEA L2 Support
From: "Bursik, Scott {PBSG}" <Scott.Bursik@pbsg.com>
Sent by: owner-nv-l@lists.us.ibm.com
Date: 04.12.2003 16:03
Please respond to: nv-l
To: "Nv-L (nv-l@lists.us.ibm.com)" <nv-l@lists.us.ibm.com>
cc:
Subject: [nv-l] 45 minute difference in server response
NetView 7.1.3 AIX 4.3.3
I have been having trouble detecting node-down events in a timely
manner. I have a production and a development machine, so I discovered the
node on both. They are at the same NetView code level and the same OS
level. Both NetView machines "saw" the same thing when the node went
down, but there was a difference of about 45 minutes between the
trapd.log entries on the two machines. The major difference between these
servers is the size of the network they have in the database: the development
machine has only a few networks discovered, so it gets around a lot faster.
I am at a loss. The polling for the node is set the same in xnmsnmpconf on
both servers. Whenever I check the status of netmon with the following
script, netmon isn't behind:
===============================================================================================
#!/bin/ksh
#set -x
cat /dev/null > /usr/OV/log/netmon.trace
/usr/OV/bin/netmon -a 16
sleep 3
# -s (non-empty) rather than -f: the file always exists after the truncation above
if [ -s /usr/OV/log/netmon.trace ]; then
    # count the entries with a negative countdown (a "-" before the ":")
    echo "Netmon is" `grep '[-].*[:]' /usr/OV/log/netmon.trace | wc -l` "behind in snmp polling"
else
    echo "Netmon is too busy to report now. Try later."
fi
exit
===============================================================================================
Any thoughts on what I should look for?
Development Server
Wed Dec 03 19:13:45 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 19:13:45 2003 cscf03.pepsi.com N Node marginal.
Wed Dec 03 19:14:33 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 19:14:34 2003 cscf03.pepsi.com N Interface Broadcom down.
Wed Dec 03 19:14:59 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 19:14:59 2003 cscf03.pepsi.com N Node Down.
Wed Dec 03 20:21:45 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 20:21:45 2003 cscf03.pepsi.com N Node marginal.
Wed Dec 03 20:22:33 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 20:22:34 2003 cscf03.pepsi.com N Interface Broadcom up.
Wed Dec 03 20:23:00 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 20:23:00 2003 cscf03.pepsi.com N Node Up.
Production Server
Wed Dec 03 19:57:39 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 19:57:39 2003 cscf03.pepsi.com N Node marginal.
Wed Dec 03 20:05:51 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 20:09:50 2003 cscf03.pepsi.com N Interface Broadcom down.
Wed Dec 03 20:17:20 2003 cscf03.pepsi.com N Interface Intel(R) down.
Wed Dec 03 20:17:20 2003 cscf03.pepsi.com N Node Down.
Wed Dec 03 20:48:19 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 20:48:19 2003 cscf03.pepsi.com N Node marginal.
Wed Dec 03 20:54:05 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 20:58:36 2003 cscf03.pepsi.com N Interface Broadcom up.
Wed Dec 03 21:04:52 2003 cscf03.pepsi.com N Interface Intel(R) up.
Wed Dec 03 21:04:52 2003 cscf03.pepsi.com N Node Up.
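The two trapd.log excerpts make it possible to measure the gap event by event. A minimal sketch, assuming both servers' clocks agree and the events fall on the same day; secs_between is a hypothetical helper, not a NetView tool. It subtracts the development timestamp from the production timestamp of the same event:

```shell
#!/bin/sh
# Sketch only: secs_between is a hypothetical helper, not part of NetView.
# Prints the seconds from $1 to $2, both same-day HH:MM:SS timestamps.
# awk does the arithmetic so that fields such as "08" are read as decimal.
secs_between() {
    echo "$1 $2" | awk '{
        split($1, a, ":")
        split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# Development vs. production times taken from the trapd.log excerpts above.
echo "First Intel(R) down: $(secs_between 19:13:45 19:57:39) s"   # 2634 s (43 min 54 s)
echo "Node Down:           $(secs_between 19:14:59 20:17:20) s"   # 3741 s (62 min 21 s)
echo "Node Up:             $(secs_between 20:23:00 21:04:52) s"   # 2512 s (41 min 52 s)
```

With these excerpts the gap is not constant: about 44 minutes for the first interface-down event, about 62 minutes for Node Down and about 42 minutes for Node Up.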