The 20 seconds is how long it takes netmon to get around to responding
to your request that it dump the report. The script counts the number of
records, such as the one below, that were scheduled for times that have
already passed. Note that near the top of netmon.trace the numbers are
negative and, hopefully, at the bottom there will be some with positive
numbers. A positive number tells you that this node is scheduled to be
pinged that many seconds in the future. Or some such time unit. It looks
like you have some nodes that have not been pinged in hours, so maybe
the units are not seconds.
-21138: 162.131.115.1 (tower1-feth-5-0.net.principal.com) ...
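To see both sides of that split at once, something like the following might help. It is only a sketch: it assumes every record starts with a signed number and a colon, as in the samples in this thread, and the name netmon_backlog is made up for illustration.

```shell
# Summarize a netmon dump: overdue vs. still-scheduled records.
# netmon_backlog is a hypothetical helper, not part of NetView.
# Assumption: each record begins with a signed number followed by a colon;
# any line that does not look like that is ignored.
netmon_backlog() {
    awk '
        /^[ \t]*-?[0-9]+:/ {
            off = $1
            sub(/:.*/, "", off)      # keep only the number before the colon
            off += 0                 # force numeric comparison
            if (off < 0) {
                overdue++
                if (off < worst) worst = off   # most overdue record so far
            } else {
                future++
            }
        }
        END {
            printf "overdue=%d future=%d worst=%d\n", overdue, future, worst
        }
    ' "$1"
}
```

Run it as `netmon_backlog /usr/OV/log/netmon.trace` right after a dump. If "worst" keeps growing from one dump to the next, netmon is not catching up.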
When you first start netmon up, it is always behind. It has a lot to do at
startup. Try letting it run for half an hour or so, and see if it catches
up. What it says at startup is not something I worry about. It is the
steady-state behavior that you want to tune for. I suspect that you do have
some performance issues to deal with, though, because it looks from this as
if it is never catching up. Which is why you are writing, I guess. What
does it say after it has been running for an hour? How often are you
sending it off to do the configuration poll (which defaults to once a day)?
How good is your name resolution? Oh - and how many names are in your
seedfile? All of them? I have seen that delay netmon startup by half an
hour unnecessarily.
Cordially,
Leslie A. Clark
IBM Global Services - Systems Mgmt & Networking
Detroit
"Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 06/01/2001
08:58:30 AM
Please respond to IBM NetView Discussion <nv-l@tkg.com>
Sent by: owner-nv-l@tkg.com
To: "'IBM NetView Discussion'" <nv-l@tkg.com>
cc:
Subject: RE: [NV-L] Confirmation of Netview pinging
Thanks Leslie!
Just to clarify, 1700 was not the number of interfaces, it was the number
of hubs/routers/switches/servers. I used the script you provided and had
to increase the sleep time to 20 seconds. Anything less just resulted in
the "Netmon is too busy" message. At 20 seconds, I typically get:
Netmon is 3427 behind in status pinging
I don't really understand what I'm looking at, though.
The box itself is currently located on one of the backbone switches. I
haven't taken any other action, because I wanted to understand what I was
looking at. In netmon.trace I see things such as:
1043: 162.131.203.57 () list = 0x202aa858
or
-21138: 162.131.115.1 (tower1-feth-5-0.net.principal.com) list = 0x202aa7b8
Can you explain these any more?
Thanks!
Craig
> -----Original Message-----
> From: Leslie Clark [mailto:lclark@us.ibm.com]
> Sent: Wednesday, May 30, 2001 8:04 PM
> To: IBM NetView Discussion
> Subject: Re: [NV-L] Confirmation of Netview pinging
>
>
> A couple of things.
>
> Remember your normal response is 40ms, not 1 sec. Yes, it will take a
> while to make the rounds if everything is down. But I hope your normal
> state is that everything is up.
>
> The number of outstanding pings is configurable. I think the current
> default is 16 (it was 10, years ago). It has been tested at up to 64.
> That means it can send off pings to 64 nodes at once, and as they
> respond, send out more. That number is the number of nodes it can be
> waiting on at one time (waiting an average of 40ms, you say). Set it in
> /usr/OV/lrf/netmon.lrf, adding the '-q' parameter. Use -q 32 to set the
> ping queue, and -Q 32 to set the snmp request queue. Experiment to see
> if you have the CPU and interface speed to back it up. I have never seen
> it overrun the adapter, but I have seen it use up all of the cpu.
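As a rough check on that description: if up to q nodes can be outstanding at once and each is answered in r seconds on average, one pass over n nodes takes about n * r / q seconds. The sketch below only does that arithmetic. It assumes nothing else (CPU, name resolution, SNMP polling) gets in the way, and the name cycle_seconds is made up for illustration.

```shell
# Back-of-envelope estimate only: seconds for one status-poll pass when up
# to $2 pings are outstanding and each node holds a slot for $3 seconds.
# cycle_seconds is a hypothetical helper, not part of NetView.
cycle_seconds() {
    awk -v nodes="$1" -v queue="$2" -v hold="$3" \
        'BEGIN { printf "%.2f\n", nodes * hold / queue }'
}

cycle_seconds 1700 16 0.040   # every node answers in about 40ms -> 4.25
cycle_seconds 1700 16 1       # every node holds a slot for 1 sec -> 106.25
```

The second line is the reason unreachable nodes matter so much: each one ties up a queue slot for the whole timeout instead of for 40ms.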
>
> 1700 interfaces is not a lot. You should be able to handle that in 5
> minutes easily on just about any box, using the default timeout/retry
> of 2 and 3. Some caveats: If you have a lot of unpingable interfaces in
> your map, clear them up. They clog up the ping queue (or else increase
> the ping queue). Acknowledged ones count, too, since they are still
> pinged. Make sure your name resolution method is really fast. Slow name
> resolution slows everything down more than you would expect. If you are
> having problems with false alarms, make note of them and tune them
> individually to accommodate normal variations in the network, rather
> than increase the timeout across the board. Make sure your box is
> centrally located in the network, with the most reliable connection
> available, and make sure that connection is running at full-duplex if
> the connection supports it.
>
> Here's a little script to help you monitor how well netmon is keeping
> up with the status polling. See how fast it catches up when
> it gets behind.
>
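> #!/bin/ksh
> #
> # pingstatus.sh
> #
> # A script to check whether netmon can keep up with the polling
> # frequency scheduled. Can be called from the Reports menu.
> # Output: a message to stdout
> # Note: not reliable if netmon tracing is going on!
> #
> #set -x
> # Start from a clean trace; -f keeps rm quiet if the file is absent.
> rm -f /usr/OV/log/netmon.trace
> # Ask netmon to dump its report into netmon.trace.
> netmon -a 12
> # Give netmon time to write the report; a busy netmon may need longer.
> sleep 3
> if [ -f /usr/OV/log/netmon.trace ]; then
>     # Count the records scheduled for times already passed (negative
>     # numbers). The pattern is quoted so the shell cannot expand it.
>     behind=`grep -c '[-].*[:]' /usr/OV/log/netmon.trace`
>     echo "Netmon is $behind behind in status pinging"
> else
>     echo "Netmon is too busy to report now. Try later."
> fi
> exit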
>
>
> Cordially,
>
> Leslie A. Clark
> IBM Global Services - Systems Mgmt & Networking
> Detroit
>
>
> "Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 05/30/2001
> 04:39:22 PM
>
> Please respond to IBM NetView Discussion <nv-l@tkg.com>
>
> Sent by: owner-nv-l@tkg.com
>
>
> To: "NetView List (E-mail)" <nv-l@tkg.com>
> cc:
> Subject: [NV-L] Confirmation of Netview pinging
>
>
>
> Hi. We are running Netview 6.0.2 on AIX 4.3. We want to move to a more
> proactive approach to problem notifications. Our hope is to ping
> servers/hubs/switches/routers and generate events when they aren't
> reachable. This would make use of the Netview features to reduce the
> "noisy" pages, etc. In preparation for this, I was running some numbers
> and would like some input to see if I am flawed somewhere:
>
> Average response time for pings = 40ms (includes LAN and WAN)
> Total devices to ping = 1700 (and growing at about 30 per month)
> # outstanding pings = 10 (Is this true? Does it affect my numbers? If
> so, how?)
> Retries = 0
> Timeout = 1 sec
> One Netview machine.
>
> Netview could only ping 2 devices per second for a total of 120 per
> minute. 1700 / 120 = 14 minutes to complete one ping cycle.
>
> So this would mean that using this method, we would only find out about
> a down device after 14 minutes at best? I don't think anybody would
> accept this long of a window.
>
> Assuming the above is true, it appears that it is time for us to look
> into a different Netview architecture that could achieve our goals?
>
> I'm just looking for some insight into how Netview pings and if my
> numbers are even reasonable, etc. Thanks for any help you can provide.
>
> Craig
>
> P.S. I have searched the archives, but there appear to be many open
> questions on this topic. Also, no form of netmon -a ?, or any other
> flag, produced output in the netmon.trace file.
> _________________________________________________________________________
> NV-L List information and Archives: http://www.tkg.com/nv-l
>
>