RE: [nv-l] netmon problem

To: nv-l@lists.tivoli.com
Subject: RE: [nv-l] netmon problem
From: "Oliver Bruchhaeuser" <oliver.bruchhaeuser@de.ibm.com>
Date: Mon, 29 Jul 2002 16:51:12 +0200
Jason,

another thing to check ... remembering those 5.1 problems ...

Do you have a file called
/usr/OV/bin/netnmrc
with the contents:
------------
ulimit -c unlimited
ulimit -f unlimited
ulimit -m unlimited
ulimit -t unlimited
ulimit -d unlimited
ulimit -s unlimited
------------
?

If not ... create it ...
stop all daemons with
ovstop nvsecd
and restart them with
/etc/netnmrc

(see NetView Release Notes 5.1, page 10,
NetView Release Notes 5.1.1, page 11,
NetView Release Notes 5.1.2, page 15,
NetView Release Notes 5.1.3, page 17)
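
As a rough illustration of the steps above, the following sketch writes the six ulimit lines to the file if it is missing. The function name write_ulimit_file is made up for this example, and the default path is simply the one quoted in the message; the sketch only checks that the file exists, not that its contents match. The stop and restart commands are left as comments so that nothing is run against the daemons by accident.

```sh
#!/bin/sh
# Sketch only: create the ulimit file described above if it does not exist.
# write_ulimit_file is a hypothetical helper name, not a NetView command.
# The default path is the one given in the message; pass another path to
# try the function out safely.

write_ulimit_file() {
    target=${1:-/usr/OV/bin/netnmrc}
    if [ -f "$target" ]; then
        # Existing file is left untouched; compare its contents by hand.
        echo "exists: $target"
        return 0
    fi
    # Same six settings, in the same order, as listed in the message.
    for flag in c f m t d s; do
        echo "ulimit -$flag unlimited"
    done > "$target" || return 1
    echo "created: $target"
}

# On the NetView server (as root), followed by the restart from the message:
#   write_ulimit_file
#   ovstop nvsecd
#   /etc/netnmrc
```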

Kind regards

Oliver Bruchhaeuser
Tivoli NetView EMEA L2 Support

IBM Deutschland GmbH - ITS Tivoli - Dept. 7977 - Hechtsheimer Str. 2 -
55131 Mainz - Germany
Phone: +49-6131-84-5108 - Fax: +49-6131-84-6585 - email:
bruchhae@de.ibm.com

Need help with Tivoli Software Products?
Ask Tivoli!
http://www.tivoli.com/asktivoli (login with your customer account)


James Shanks/Raleigh/IBM@IBMUS
29.07.2002 15:30

        To:       nv-l@lists.tivoli.com
        cc:
        Subject:  RE: [nv-l] netmon problem

I have been on vacation for the last ten days so I am not certain what has
transpired with your problem since this update.  I didn't see anything
newer.
As you are running 5.1.3, diagnosing this will be difficult because that
code base has been removed even from IBM's own machines.
I would suggest three courses of action.

(1) Does your trapd.log show messages that netmon or other daemons are
disconnecting from trapd?  If so, then I would try raising the application
queue size in trapd from the default of 2000 to something much larger, say
"-B 10000" or even "-B 20000".  You can set this in any of the usual ways:
by adding that option to the ovsuf file directly, or by editing the lrf
file and running ovdelobj/ovaddobj on trapd.lrf.  SMIT/serversetup will do
it for you too.

(2) Since you suspect your script is the culprit, change it.  Try sending
out only 100 traps at a time, waiting fifteen or twenty minutes, and then
sending the next hundred.  The idea is either to slow things down so that
the problem goes away or, by being more and more selective about what you
send, to find the trap which is causing netmon to go wonky.  Since the
problem only happens when you run your script, you are in control of
it.

(3) Browse your /usr/OV/PD/cores/netmon/core.report.  Find the heading
"Functions", which shows the function stack, and post it here.  Maybe it
will be familiar enough to someone to ring a bell about some problem they
had or point to some APAR that was fixed.  This last option is very "iffy",
since even IBM could not send you a fix for that problem now even if they
wanted to.  That's what being "out of support" means.
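
The throttling idea in (2) could be sketched roughly as follows. This is only an illustration: send_in_batches is a made-up helper, and the command that actually sends one trap (for example a small wrapper around snmptrap) is assumed to exist and is passed in as an argument, since the original flood-trap.ksh is not shown in this thread.

```sh
#!/bin/sh
# send_in_batches BATCH_SIZE PAUSE_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reads one item per line from stdin and runs
# "COMMAND ARGS ITEM" for each one, pausing PAUSE_SECONDS before each
# batch after the first, so there is no pause after the final batch.
send_in_batches() {
    batch_size=$1
    pause=$2
    shift 2
    count=0
    while IFS= read -r item; do
        [ -n "$item" ] || continue            # skip blank lines
        if [ "$count" -gt 0 ] && [ $((count % batch_size)) -eq 0 ]; then
            echo "sent $count so far; pausing ${pause}s" >&2
            sleep "$pause"
        fi
        # Redirect stdin so the command cannot swallow the remaining items.
        "$@" "$item" < /dev/null
        count=$((count + 1))
    done
    echo "sent $count in total" >&2
}

# Example: 100 traps at a time with a 15-minute (900 s) pause in between.
# send-one-trap.ksh and trap-list.txt are placeholders for your own files.
#   send_in_batches 100 900 ./send-one-trap.ksh < trap-list.txt
```

Shrinking the input list between runs, rather than changing the script, is one way to narrow down which trap triggers the failure.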

Good luck.

James Shanks
Level 3 Support  for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group
----- Forwarded by James Shanks/Raleigh/IBM on 07/29/2002 09:15 AM -----


"Allison, Jason (JALLISON)" <JALLISON@arinc.com>
07/19/2002 03:29 PM


        To:     "'nv-l'" <nv-l@lists.tivoli.com>
        cc:     "Allison, Jason (JALLISON)" <JALLISON@arinc.com>
        Subject:        RE: [nv-l] netmon problem



Leslie, thanks for the response.  There are at least three reasons I have
not called support:

1.  This is occurring on a development machine, not one in the production
environment.
2.  I am running NetView 5.1.3.
3.  I am pretty sure I caused these symptoms by running my 'flood-trap.ksh'
script, which parses our 300 billion line (sarcasm) trapd.conf file to send
out 'dummy traps' as quickly as possible.  It sends out ~1k using the
snmptrap command.

This is the final excerpt of my netmon.trace file.  I ran:
# sudo ovstart netmon
# sudo netmon -M -1
---------------------------------------------------------------
19:06:50 : nl_main.c[512] : ** waiting for 1 19:06:51 : nl_main.c[546] :
--
timed out **
19:06:51 : nl_pinger.c[234] : sending ping to 192.168.226.251 seqnum =
11225
ident = 11138 timeout = 3
19:06:51 : nl_pinger.c[234] : sending ping to 172.16.1.22 seqnum = 11226
ident = 11138 timeout = 2
19:06:51 : nl_main.c[512] : ** waiting for 1 19:06:51 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:51 : nl_pinger.c[916] : -> received ping from 172.16.1.22 (hp5m)
19:06:51 : nl_main.c[512] : ** waiting for 1 19:06:52 : nl_main.c[546] :
--
timed out **
19:06:52 : nl_pinger.c[234] : sending ping to 172.16.1.23 seqnum = 11227
ident = 11138 timeout = 2
19:06:52 : nl_main.c[512] : ** waiting for 1 19:06:52 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:52 : nl_pinger.c[916] : -> received ping from 172.16.1.23
(DialBackUp)
19:06:52 : nl_main.c[512] : ** waiting for 1 19:06:53 : nl_main.c[546] :
--
timed out **
19:06:53 : nl_pinger.c[234] : sending ping to 144.243.222.9 seqnum = 11228
ident = 11138 timeout = 2
19:06:53 : nl_main.c[512] : ** waiting for 1 19:06:53 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:53 : nl_pinger.c[916] : -> received ping from 144.243.222.9
(172.16.1.62)
19:06:54 : nl_pinger.c[413] :  expired ping to 192.168.226.251
(172.16.1.61)
seqnum = 11225 ident = 11138
19:06:54 : nl_pinger.c[234] : sending ping to 172.16.69.254 seqnum = 11229
ident = 11138 timeout = 3
19:06:54 : nl_pinger.c[234] : sending ping to 144.243.44.156 seqnum =
11230
ident = 11138 timeout = 2
19:06:54 : nl_pinger.c[234] : sending ping to 144.243.44.166 seqnum =
11231
ident = 11138 timeout = 2
19:06:54 : nl_pinger.c[234] : sending ping to 144.243.44.131 seqnum =
11232
ident = 11138 timeout = 2
19:06:54 : nl_main.c[512] : ** waiting for 1 19:06:54 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:54 : nl_pinger.c[916] : -> received ping from 144.243.44.156
(CORE_ROUTER1)
19:06:54 : nl_pinger.c[916] : -> received ping from 144.243.44.166
(CORE_ROUTER1)
19:06:54 : nl_pinger.c[916] : -> received ping from 144.243.44.131
(noc_ap)
19:06:54 : nl_main.c[512] : ** waiting for 1 19:06:55 : nl_main.c[546] :
--
timed out **
19:06:55 : nl_pinger.c[234] : sending ping to 192.168.65.9 seqnum = 11233
ident = 11138 timeout = 3
19:06:55 : nl_pinger.c[234] : sending ping to 144.243.44.155 seqnum =
11234
ident = 11138 timeout = 2
19:06:55 : nl_main.c[512] : ** waiting for 1 19:06:55 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:55 : nl_pinger.c[836] : -> unexpected ICMP message 3 from
172.16.1.17
19:06:55 : nl_pinger.c[1077] : ICMP message from: 172.16.1.17
19:06:55 : nl_main.c[512] : ** waiting for 1 19:06:55 : nl_main.c[552] :
--
received stuff 0x40 **
19:06:55 : nl_pinger.c[916] : -> received ping from 144.243.44.155
(CORE_ROUTER1)
19:06:55 : nl_main.c[512] : ** waiting for 1 19:06:56 : nl_main.c[546] :
--
timed out **
19:06:56 : nl_pinger.c[234] : sending ping to 172.16.1.53 seqnum = 11235
ident = 11138 timeout = 3
19:06:56 : nl_pinger.c[234] : sending ping to 192.168.65.8 seqnum = 11236
ident = 11138 timeout = 3
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for etb2_dms3
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
0
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.28 op = DAILY req =
Objid reqid = 11138
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
1
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for etb2_dms4
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
1
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.29 op = DAILY req =
Objid reqid = 11139
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
2
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CORE_ROUTER2
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
2
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.18 op = DAILY req =
Objid reqid = 11140
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
3
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CORE_ROUTER1
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
3
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.17 op = DAILY req =
Objid reqid = 11141
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
4
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for DialBackUp
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
4
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.23 op = DAILY req =
Objid reqid = 11142
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
5
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CPS_HUB1
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
5
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.24 op = DAILY req =
Objid reqid = 11143
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
6
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CPS_HUB2
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
6
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.25 op = DAILY req =
Objid reqid = 11144
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
7
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CORE_SWITCH2
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
7
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.36 op = DAILY req =
Objid reqid = 11145
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
8
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for CORE_SWITCH1
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
8
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.1.35 op = DAILY req =
Objid reqid = 11146
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
9
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for WAN_NIR4
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
9
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.15.70 op = DAILY req
=
Objid reqid = 11147
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
10
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for WAN_NIR3
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
10
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.15.69 op = DAILY req
=
Objid reqid = 11148
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
11
19:06:56 : nl_fixup.c[279] : fixupNmNodeSnmpConf() for WAN_NIR2
19:06:56 : nl_fixup.c[362] : ... interval = 30, timeout = 3, retries = 5
... community = 'public', setcommunity = 'public',proxyName = '<none>'
... proxyAddr = 0.0.0.0, checked proxy address = FALSE
... Node Down = 604800, Discovery Poll? = YES, Auto Adjust? = YES, Fixed
Interval = 900, Daily Config = 86400, Route Entries = 800
19:06:56 : nl_snmper.c[189] : xmitone_snmp(before send):size of snmpWait =
11
19:06:56 : nl_snmper.c[258] : sending SNMP to 172.16.15.68 op = DAILY req
=
Objid reqid = 11149
19:06:56 : nl_snmper.c[291] : xmitone_snmp(after send): size of snmpWait =
12
---------------------------------------------------------------

Then netmon dies.  I really don't know what I am looking at; I am far from
a netmon expert.
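
One way to get an overview of a trace like the one above is to count the kinds of entries it contains. The sketch below is an assumption-laden illustration, not a NetView tool: summarize_netmon_trace is a made-up name, and the patterns are taken only from the message texts visible in this excerpt. In the excerpt, the SNMP requests go out and snmpWait climbs to 12 with no responses logged before the trace ends; whether that is related to the crash is not something these counts can show.

```sh
#!/bin/sh
# summarize_netmon_trace [FILE...]
# Hypothetical helper: counts a few entry types seen in the excerpt above
# and reports the last timestamp. Reads stdin when no file is given.
summarize_netmon_trace() {
    awk '
        /sending ping to/    { pings_sent++ }
        /received ping from/ { pings_received++ }
        /expired ping to/    { pings_expired++ }
        /sending SNMP to/    { snmp_sent++ }
        # Trace entries start with HH:MM:SS followed by " :".
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9] :/ { last = $1 }
        END {
            printf "pings sent:     %d\n", pings_sent
            printf "pings received: %d\n", pings_received
            printf "pings expired:  %d\n", pings_expired
            printf "SNMP sent:      %d\n", snmp_sent
            printf "last timestamp: %s\n", last
        }
    ' "$@"
}

# Example (the path is wherever your netmon.trace actually lives):
#   summarize_netmon_trace netmon.trace
```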

Thanks for the email,

Jason Allison
Principal Engineer
ARINC Incorporated
Office:  (410) 266-2006
FAX:  (410) 573-3026


-----Original Message-----
From: Leslie Clark [mailto:lclark@us.ibm.com]
Sent: Friday, July 19, 2002 1:57 PM
To: nv-l@lists.tivoli.com
Subject: RE: [nv-l] netmon problem


Jason, you do not want to involve Support, but you would have solved it by
now if you had. This is not a support channel.

I will tell you one thing for free: you should be looking in the
netmon.trace file to see what netmon is doing when it hangs up and dies.
You can toggle tracing on with 'netmon -M -1' and off with 'netmon -M 0'.
If it never starts talking, then it is too busy to get the message. In that
case, turn it on at startup by setting it with daemon configuration.
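
A minimal sketch of the toggle described above: switch tracing on, wait, switch it off again, and show the end of the trace. The two netmon commands are the ones quoted in this message; the function name capture_netmon_trace, the NETMON and TRACE_FILE variables, and the default trace location /usr/OV/log/netmon.trace are assumptions made for the example, so adjust them to your installation.

```sh
#!/bin/sh
# Both settings can be overridden from the environment.
NETMON=${NETMON:-netmon}
TRACE_FILE=${TRACE_FILE:-/usr/OV/log/netmon.trace}   # assumed location

# capture_netmon_trace [SECONDS]
# Hypothetical helper: trace for SECONDS (default 60), then print the
# last 50 lines of the trace file.
capture_netmon_trace() {
    seconds=${1:-60}
    "$NETMON" -M -1 || return 1      # tracing on
    sleep "$seconds"
    "$NETMON" -M 0                   # tracing off
    tail -n 50 "$TRACE_FILE"
}

# Example: trace for two minutes while reproducing the problem.
#   capture_netmon_trace 120
```

If netmon dies while tracing is on, the second command will simply fail, and the tail of the trace file is then the interesting part.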

Reading the netmon.trace is a whole lot more fun than reading the core
report.

Cordially,

Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit


---------------------------------------------------------------------
To unsubscribe, e-mail: nv-l-unsubscribe@lists.tivoli.com
For additional commands, e-mail: nv-l-help@lists.tivoli.com

*NOTE*
This is not an Official Tivoli Support forum. If you need immediate
assistance from Tivoli please call the IBM Tivoli Software Group
help line at 1-800-TIVOLI8(848-6548)


Archive operated by Skills 1st Ltd