Thanks Leslie!
Just to clarify, 1700 was not the number of interfaces, it was the number of
hubs/routers/switches/servers. I used the script you provided and had to
increase the sleep time to 20 seconds. Anything less just resulted in the
"Netmon is too busy" message. At 20 seconds, I typically get:
Netmon is 3427 behind in status pinging
I don't really understand what I'm looking at, though.
The box itself is currently located on one of the backbone switches. I haven't
taken any other action, because I wanted to understand what I was looking at.
In netmon.trace I see things such as:
1043: 162.131.203.57 () list = 0x202aa858
or
-21138: 162.131.115.1 (tower1-feth-5-0.net.principal.com) list = 0x202aa7b8
Can you explain these any more?
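
For what it's worth, going by the grep in the quoted pingstatus.sh script, the entries with a negative leading number are the ones it counts as "behind". Here is a minimal sketch that splits the trace by sign. It assumes every entry starts with the number and a colon, as in the two samples above; summarize_trace is just a made-up helper name.

```sh
#!/bin/sh
# Tally netmon.trace entries by the sign of the leading number.
# Assumption: entries look like "<number>: <ip> (<name>) list = <addr>".
summarize_trace() {
    awk '
        /^[ \t]*-[0-9]+:/ { neg++; next }   # leading number is negative
        /^[ \t]*[0-9]+:/  { pos++ }         # leading number is zero or positive
        END { printf "negative=%d non-negative=%d\n", neg, pos }
    ' "$1"
}
```

It would be called as: summarize_trace /usr/OV/log/netmon.trace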
Thanks!
Craig
> -----Original Message-----
> From: Leslie Clark [mailto:lclark@us.ibm.com]
> Sent: Wednesday, May 30, 2001 8:04 PM
> To: IBM NetView Discussion
> Subject: Re: [NV-L] Confirmation of Netview pinging
>
>
> A couple of things.
>
> Remember your normal response is 40ms, not 1 sec. Yes, it will take a while
> to make the rounds if everything is down. But I hope your normal state is
> that everything is up.
>
> The number of outstanding pings is configurable. I think the current default
> is 16 (it was 10, years ago). It has been tested at up to 64. That means it
> can send off pings to 64 nodes at once, and as they respond, send out more.
> That number is the number of nodes it can be waiting on at one time (waiting
> an average of 40ms, you say). Set it in /usr/OV/lrf/netmon.lrf, adding the
> '-q' parameter. Use -q 32 to set the ping queue, and -Q 32 to set the SNMP
> request queue. Experiment to see if you have the CPU and interface speed to
> back it up. I have never seen it overrun the adapter, but I have seen it use
> up all of the CPU.
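
To put rough numbers on the queue size, here is a back-of-the-envelope sketch. It assumes a simple model that is not taken from the netmon documentation: the queue stays full, and each ping holds a slot for a fixed wait (the round trip when the node answers, the timeout when it does not). Retries, CPU load, and name-resolution delays are ignored, and cycle_ms is a hypothetical helper name.

```sh
#!/bin/sh
# Estimate the time for one status-polling pass, in milliseconds.
# usage: cycle_ms <devices> <queue_size> <wait_ms_per_ping>
cycle_ms() {
    devices=$1 queue=$2 wait_ms=$3
    # Round up: a partly filled final batch still costs one full wait.
    batches=$(( (devices + queue - 1) / queue ))
    echo $(( batches * wait_ms ))
}

echo "all up,   -q 16: $(cycle_ms 1700 16 40) ms"
echo "all up,   -q 32: $(cycle_ms 1700 32 40) ms"
echo "all down, -q 16: $(cycle_ms 1700 16 1000) ms"
```

With the figures from this thread (1700 devices, 40 ms responses, a 1 second timeout with no retries), the model gives about 4.3 seconds per pass with everything up and a queue of 16, and about 107 seconds with everything down. Treat these as a lower bound, since the model leaves out everything except the wait itself.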
>
> 1700 interfaces is not a lot. You should be able to handle that in 5 minutes
> easily on just about any box, using the default timeout/retry of 2 and 3.
> Some caveats: If you have a lot of unpingable interfaces in your map, clear
> them up. They clog up the ping queue (or else increase the ping queue).
> Acknowledged ones count, too, since they are still pinged. Make sure your
> name resolution method is really fast. Slow name resolution slows everything
> down more than you would expect. If you are having problems with false
> alarms, make note of them and tune them individually to accommodate normal
> variations in the network, rather than increase the timeout across the
> board. Make sure your box is centrally located in the network, with the most
> reliable connection available, and make sure that connection is running at
> full-duplex if the connection supports it.
>
> Here's a little script to help you monitor how well netmon is keeping
> up with the status polling. See how fast it catches up when it gets behind.
>
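> #!/bin/ksh
> #
> # pingstatus.sh
> #
> # A script to check whether netmon can keep up with the polling
> # frequency scheduled. Can be called from the Reports menu.
> # Output: a message to stdout
> # Note: not reliable if netmon tracing is going on!
> #
> #set -x
> # Remove any old trace so that only the fresh dump is counted.
> rm -f /usr/OV/log/netmon.trace
> # Ask netmon for the dump that ends up in netmon.trace.
> netmon -a 12
> # Give netmon time to write the file; a busy netmon may need longer.
> sleep 3
> if [ -f /usr/OV/log/netmon.trace ]; then
>     # Count the trace lines that have a '-' followed later by a ':'
>     # (the entries with a negative leading number).
>     behind=`grep -c '[-].*[:]' /usr/OV/log/netmon.trace`
>     echo "Netmon is $behind behind in status pinging"
> else
>     echo "Netmon is too busy to report now. Try later."
> fi
> exit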
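
To see how fast netmon catches up, the check can be repeated and each result timestamped. This is a minimal sketch, not part of the original script: sample_backlog is a hypothetical helper, and it runs whatever command it is given, so it makes no assumptions about where pingstatus.sh is saved.

```sh
#!/bin/sh
# Run a status command several times and timestamp each result.
# usage: sample_backlog <count> <interval_seconds> <command> [args...]
sample_backlog() {
    count=$1 interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # Print the current time followed by the command's output.
        printf '%s  %s\n' "$(date '+%H:%M:%S')" "$("$@")"
        # No need to sleep after the final sample.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$(( i + 1 ))
    done
}
```

For example, "sample_backlog 10 60 ./pingstatus.sh" would take ten readings a minute apart. Keep the interval comfortably longer than the sleep inside the script itself, since each run deletes and regenerates netmon.trace.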
>
>
> Cordially,
>
> Leslie A. Clark
> IBM Global Services - Systems Mgmt & Networking
> Detroit
>
>
> "Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 05/30/2001
> 04:39:22 PM
>
> Please respond to IBM NetView Discussion <nv-l@tkg.com>
>
> Sent by: owner-nv-l@tkg.com
>
>
> To: "NetView List (E-mail)" <nv-l@tkg.com>
> cc:
> Subject: [NV-L] Confirmation of Netview pinging
>
>
>
> Hi. We are running Netview 6.0.2 on AIX 4.3. We want to move to a
> more proactive approach to problem notifications. Our hope is to ping
> servers/hubs/switches/routers and generate events when they aren't
> reachable. This would make use of the Netview features to reduce the
> "noisy" pages, etc. In preparation for this, I was running some numbers
> and would like some input to see if my reasoning is flawed somewhere:
>
> Average response time for pings = 40ms (includes LAN and WAN)
> Total devices to ping = 1700 (and growing at about 30 per month)
> # outstanding pings = 10 (Is this true? Does it affect my numbers? If so, how?)
> Retries = 0
> Timeout = 1 sec
> One Netview machine.
>
> Netview could only ping 2 devices per second for a total of 120 per minute.
> 1700 / 120 = 14 minutes to complete one ping cycle.
>
> So this would mean that using this method, we would only find out about a
> down device after 14 minutes at best? I don't think anybody would accept
> a window this long.
>
> Assuming the above is true, it appears that it is time for us to look into
> a different Netview architecture that could achieve our goals.
>
> I'm just looking for some insight into how Netview pings and if my numbers
> are even reasonable, etc. Thanks for any help you can provide.
>
> Craig
>
> P.S. I have searched the archives, but there appear to be many open
> questions on this topic. Also, no form of netmon -a ?, or any other flag,
> produced output in the netmon.trace file.
> _________________________________________________________________________
> NV-L List information and Archives: http://www.tkg.com/nv-l