Re: Confirmation of Netview pinging

To: nv-l@lists.tivoli.com
Subject: Re: Confirmation of Netview pinging
From: "Ray" <westphal@accessus.net>
Date: Tue, 5 Jun 2001 18:07:49 -0500
Craig,

Have you checked your Ethernet adapter statistics? Have you confirmed that
the F50's adapters and the switch(es) they connect to are set correctly?

I had lots of grief with a larger IBM H70 server caused by improper receive
buffer settings and mismatched switch settings. I never set the adapters or
the Cisco switch to Auto; I match the speed and duplex of the switch and
server. You can check the adapter receive buffers with the entstat -d enx
command (x is the adapter number). Check the No Receive Buffers line.
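
A rough sketch of that check in shell: the helper below pulls the counter off
the No Receive Buffers line so it can be compared between runs. The function
name no_rx_buffers is made up for illustration, and the "label, then a
number" layout is an assumption about the entstat output, so adjust the match
if your adapter prints it differently.

```shell
# Print the first number that follows "No Receive Buffers" in entstat-style
# text read from stdin. The line layout is an assumption, not taken from
# real entstat output.
no_rx_buffers() {
  awk '{
    label = "No Receive Buffers"
    i = index($0, label)
    if (i) {
      rest = substr($0, i + length(label))
      if (match(rest, /[0-9]+/)) {
        print substr(rest, RSTART, RLENGTH)
        exit
      }
    }
  }'
}

# Possible use on the NetView box (ent0 is a placeholder adapter name):
#   entstat -d ent0 | no_rx_buffers
```

A value that keeps climbing between runs would point toward the receive
buffer problem described above.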

There was a mention of arp cache settings on this forum back in March. It
may apply to you as well.

Ray.


----- Original Message -----
From: "Treptow, Craig" <Treptow.Craig@principal.com>
To: "'IBM NetView Discussion'" <nv-l@tkg.com>
Sent: Tuesday, June 05, 2001 1:23 PM
Subject: RE: [NV-L] Confirmation of Netview pinging


> Thank you again for all of the insight.
>
> Let me start with the straightforward stuff:
>
> Polling cycle is 5 minutes.
> ovobjprint -S = Number of objects defined in the database: 15347
> ovtopodump -l = NUMBER OF INTERFACES: 3107
> ovwdb cache = 10000
>
> The machine is a 4 CPU F50 with 1.5GB memory.  The vmstat shows:
>
> # vmstat 5 5
> kthr     memory             page              faults        cpu
> ----- ----------- ------------------------ ------------ -----------
>  r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
>  0  0 189313  1346   0   0   0   0    0   0 116 1641 433 26  4 69  2
>  1  2 189287  1404   0   0   0   0    0   0 464 4954 2060 26  4 71  0
>  1  2 189188  1507   0   0   0   0    0   0 451 6136 2230 27  5 69  0
>  2  2 189191  1497   0   0   0   0    0   0 461 5690 3466 26 27 46  1
>  2  2 189191  1485   0   0   0   0    0   0 458 5486 2530 26 11 62  0
>
> I did a ovtopofix -a last week.
>
> I see that the ovwdb cache should be a little higher, but is there
> anything else obvious?
>
> The latest pingstatus shows:
>
> Netmon is  3292 behind in status pinging
>
> I looked at the netmon.trace file and this seems quite reasonable.
>
> What I'm facing is that we have never cared about the pings before,
> because our notification was all based on traps that we received from the
> devices.  We want to move to pinging to be proactive.  So I'm trying
> to figure out what it is going to take for Netview to ping our devices and
> what the issues surrounding that technique are.  You've been helping me
> tremendously.
>
> Even if I assume the pinging is working fine, I'm not sure what I have to
> do to make use of it.
>
> Is it basically dealing with the "Node Down" traps and that's it?
>
> Thanks again.
>
> Craig
>
> > -----Original Message-----
> > From: Leslie Clark [mailto:lclark@us.ibm.com]
> > Sent: Monday, June 04, 2001 5:44 PM
> > To: IBM NetView Discussion
> > Subject: RE: [NV-L] Confirmation of Netview pinging
> >
> >
> > A couple of things to think about:
> >
> > Is that script reporting correctly? Look at the netmon.trace that it is
> > evaluating and see if it is reasonable. Does it look like there are that
> > many entries starting with negative numbers?
> >
> > You have not said what the scheduled polling cycle is at the moment.
> > The default is 5 minutes. You expressed concern that it would not
> > do better than 14 minutes. What is it set for now?
> >
> > In your original append you stated that netmon could only ping 2
> > interfaces/sec. I assumed that was some sort of calculation you did.
> > Now I am wondering if it is something you observed it doing. Is that
> > right? How did you observe that?
> >
> > You probably cannot get netmon to turn on tracing with
> > 'netmon -M -1' because it is too busy to answer, so you would have
> > to turn it on at startup using the configuration menus. Watching the
> > output of the trace might give you a clue as to what it is doing while it
> > is not getting around to pinging.
> >
> > When you do 'ovobjprint -S', how many objects are in the database?
> > When you do 'ovtopodump -l', how many real interfaces, how many nodes
> > does it say you have? Where does your 1700 fit? What is the cache
> > setting for ovwdb? Higher than the object count?
> >
> > How is the box performing otherwise? Does vmstat 5 show idle, or
> > busy? Has it ever performed well? What kind of box is this anyway,
> > cpu and memory?
> >
> > When was the last time you did an ovtopofix -a (or -A)? Since you are
> > using oids to exclude, you will have stub objects for every other thing
> > in the network. Netmon will check these, via snmp, at startup and
> > at daily config poll time, to see if they have turned into something that
> > it is allowed to discover. These things will show up in the total object
> > count but not in the ovtopodump counts. After an ovtopofix, they will be
> > gone but will start coming back as soon as you restart netmon, so I'm not
> > suggesting that you do this, only explaining what will happen if you do.
> > If the number of objects is a problem, the solution is to exclude as
> > much as possible by actual address range, for instance your DHCP
> > ranges.
> >
> > I'm about out of ideas. It sounds like your netmon is not keeping up,
> > but I am not positive that is the case. Do you think it is keeping up, or
> > not?
> >
> > Cordially,
> >
> > Leslie A. Clark
> > IBM Global Services - Systems Mgmt & Networking
> > Detroit
> >
> >
> > "Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 06/04/2001
> > 11:58:10 AM
> >
> > Please respond to IBM NetView Discussion <nv-l@tkg.com>
> >
> > Sent by:  owner-nv-l@tkg.com
> >
> >
> > To:   "'IBM NetView Discussion'" <nv-l@tkg.com>
> > cc:
> > Subject:  RE: [NV-L] Confirmation of Netview pinging
> >
> >
> >
> > Thanks for the extra info Leslie.  Regarding DNS performance, I run a
> > secondary DNS on the Netview machine for the reverse address space only.
> > These consistently respond in 3-4 ms, while the forward entries take
> > 10-20 ms in general.
> >
> > I let this run all weekend and tried the script again this morning.  I
> > didn't get any output from it until I increased the sleep to 60 seconds.
> > When I did this I got:
> >
> > Netmon is  4335 behind in status pinging
> >
> > I have configuration polling set to 1 day, with 11 OIDs to include and 6
> > OIDs to exclude in the seed file.
> >
> > Thanks.
> >
> > Craig
> >
> > > -----Original Message-----
> > > From: Leslie Clark [mailto:lclark@US.IBM.COM]
> > > Sent: Saturday, June 02, 2001 3:56 PM
> > > To: IBM NetView Discussion
> > > Subject: RE: [NV-L] Confirmation of Netview pinging
> > >
> > >
> > > The 20 seconds is how long it takes netmon to get around to responding
> > > to your request that he dump the report.  The script counts the number of
> > > records, such as the one below, that were scheduled for times that have
> > > already passed. Note that near the top of the netmon.trace they have
> > > negatives, and hopefully, at the bottom, there will be some with positive
> > > numbers. The positive numbers tell you that this node is scheduled to be
> > > pinged that many seconds in the future. Or some such time unit. That looks
> > > like you have some nodes that have not been pinged in hours, so maybe
> > > the units are not seconds.
> > >
> > > -21138: 162.131.115.1 (tower1-feth-5-0.net.principal.com) ...
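
To see that split directly, something like the following may help. It is only
a sketch: trace_summary is a hypothetical helper, and it assumes each
scheduling record starts with its offset followed by a colon, as in the sample
lines quoted in this thread.

```shell
# Count overdue (negative offset) and upcoming (positive offset) records in
# netmon.trace-style text read from stdin. The record layout is assumed from
# the sample lines in this thread.
trace_summary() {
  awk '
    /^[ \t]*-[0-9]+:/ { overdue++; next }   # scheduled time already passed
    /^[ \t]*[0-9]+:/  { upcoming++ }        # scheduled for the future
    END { printf "overdue=%d upcoming=%d\n", overdue, upcoming }
  '
}

# Possible use after netmon has dumped its report:
#   trace_summary < /usr/OV/log/netmon.trace
```

Run a few minutes apart, a shrinking overdue count would suggest netmon is
catching up; a steady or growing one would suggest it is not.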
> > >
> > > When you first start netmon up, it is always behind. It has a lot to do at
> > > startup. Try letting it run for half an hour or so, and see if it catches
> > > up. What it says at startup is not something I worry about. It is the
> > > steady-state behavior that you want to tune for. I suspect that you do have
> > > some performance issues to deal with, though, because it sort of looks from
> > > this as if it is never catching up. Which is why you are writing, I guess.
> > > What does it say after it has been running for an hour? How often are you
> > > sending it off to do the configuration poll (which defaults to once a day)?
> > > How good is your name resolution? Oh - and how many names are in your
> > > seedfile? All of them? I have seen that delay netmon startup by half an
> > > hour unnecessarily.
> > >
> > > Cordially,
> > >
> > > Leslie A. Clark
> > > IBM Global Services - Systems Mgmt & Networking
> > > Detroit
> > >
> > > "Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 06/01/2001
> > > 08:58:30 AM
> > >
> > > Please respond to IBM NetView Discussion <nv-l@tkg.com>
> > >
> > > Sent by:  owner-nv-l@tkg.com
> > >
> > >
> > > To:   "'IBM NetView Discussion'" <nv-l@tkg.com>
> > > cc:
> > > Subject:  RE: [NV-L] Confirmation of Netview pinging
> > >
> > >
> > >
> > > Thanks Leslie!
> > >
> > > Just to clarify, 1700 was not the number of interfaces, it was the number
> > > of hubs/routers/switches/servers.  I used the script you provided and had
> > > to increase the sleep time to 20 seconds.  Anything less just resulted in
> > > the "Netmon is too busy" message.  At 20 seconds, I typically get:
> > >
> > > Netmon is  3427 behind in status pinging
> > >
> > > I don't really understand what I'm looking at, though.
> > >
> > > The box itself is currently located on one of the backbone switches.  I
> > > haven't taken any other action, because I wanted to understand what I was
> > > looking at.  In netmon.trace I see things such as:
> > >
> > > 1043: 162.131.203.57 () list = 0x202aa858
> > > or
> > > -21138: 162.131.115.1 (tower1-feth-5-0.net.principal.com) list = 0x202aa7b8
> > >
> > > Can you explain these any more?
> > >
> > > Thanks!
> > >
> > > Craig
> > >
> > > > -----Original Message-----
> > > > From: Leslie Clark [mailto:lclark@us.ibm.com]
> > > > Sent: Wednesday, May 30, 2001 8:04 PM
> > > > To: IBM NetView Discussion
> > > > Subject: Re: [NV-L] Confirmation of Netview pinging
> > > >
> > > >
> > > > A couple of things.
> > > >
> > > > Remember your normal response is 40ms, not 1 sec. Yes, it will take a while
> > > > to make the rounds if everything is down. But I hope your normal state is
> > > > that everything is up.
> > > >
> > > > The number of outstanding pings is configurable. I think the current
> > > > default is 16 (it was 10, years ago). It has been tested at up to 64.
> > > > That means it can send off pings to 64 nodes at once, and as they respond,
> > > > send out more. That number is the number of nodes it can be waiting on at
> > > > one time (waiting an average of 40ms, you say). Set it in
> > > > /usr/OV/lrf/netmon.lrf, adding the '-q' parameter. Use -q 32 to set the
> > > > ping queue, and -Q 32 to set the snmp request queue. Experiment to see if
> > > > you have the CPU and interface speed to back it up. I have never seen it
> > > > overrun the adapter, but I have seen it use up all of the cpu.
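
As a back-of-the-envelope model of that queue (an idealization, not a
description of netmon internals): if the queue stays full, one pass over N
nodes takes roughly N x wait / q seconds, where wait is the average time each
ping slot is occupied. The helper name cycle_seconds is hypothetical, and the
model ignores CPU load, name resolution and SNMP polling.

```shell
# Estimate the seconds needed for one status-poll pass.
# $1 = number of nodes
# $2 = outstanding pings (the -q value)
# $3 = seconds each queue slot is busy (response time, or timeout if down)
cycle_seconds() {
  awk -v n="$1" -v q="$2" -v w="$3" 'BEGIN { printf "%.2f\n", n * w / q }'
}

# All 1700 nodes up and answering in 40 ms, queue of 16:
cycle_seconds 1700 16 0.040
# All 1700 nodes down, 1 s timeout with no retries, queue of 10:
cycle_seconds 1700 10 1
```

Under these assumptions the first call prints 4.25 and the second 170.00, so
even the all-down case fits inside a 5 minute polling cycle on paper. Real
throughput will be lower, which is why the experiment with -q is still needed.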
> > > >
> > > > 1700 interfaces is not a lot. You should be able to handle that in 5
> > > > minutes easily on just about any box, using the default timeout/retry
> > > > of 2 and 3. Some caveats: If you have a lot of unpingable interfaces in
> > > > your map, clear them up. They clog up the ping queue (or increase the
> > > > ping queue). Acknowledged ones count, too, since they are still
> > > > pinged. Make sure your name resolution method is really fast. That slows
> > > > everything down more than you would expect. If you are having problems
> > > > with false alarms, make note of them and tune them individually to
> > > > accommodate normal variations in the network, rather than increase the
> > > > timeout across the board. Make sure your box is centrally located in the
> > > > network, with the most reliable connection available, and make sure that
> > > > connection is running at full-duplex if the connection supports it.
> > > >
> > > > Here's a little script to help you monitor how well netmon is keeping
> > > > up with the status polling. See how fast it catches up when it gets behind.
> > > >
> > > > #!/bin/ksh
> > > > #
> > > > # pingstatus.sh
> > > > #
> > > > # A script to check whether netmon can keep up with the polling
> > > > # frequency scheduled. Can be called from the Reports menu.
> > > > # Output: a message to stdout
> > > > # Note: not reliable if netmon tracing is going on!
> > > > #
> > > > #set -x
> > > > # -f: do not complain if no trace file exists yet
> > > > rm -f /usr/OV/log/netmon.trace
> > > > netmon -a 12
> > > > sleep 3
> > > > if [ -f /usr/OV/log/netmon.trace ]; then
> > > >   # Count the records that contain a minus sign before a colon (overdue)
> > > >   echo "Netmon is " `grep '[-].*[:]' /usr/OV/log/netmon.trace | wc -l` \
> > > >       "behind in status pinging";
> > > > else
> > > >   echo "Netmon is too busy to report now. Try later."
> > > > fi
> > > > exit
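
Since a fixed sleep may be too short on a busy netmon, one variation is to
poll for the trace file up to a deadline instead. This is only a sketch:
wait_for_file is a hypothetical helper, the 60 second limit in the usage
comment is arbitrary, and a file that exists may still be partly written.

```shell
# Wait until a file exists, checking once per second.
# $1 = path, $2 = maximum number of seconds to wait.
# Returns 0 as soon as the file is there, 1 if the limit is reached first.
wait_for_file() {
  waited=0
  while [ ! -f "$1" ]; do
    [ "$waited" -ge "$2" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Possible use in place of the fixed sleep in pingstatus.sh:
#   rm -f /usr/OV/log/netmon.trace
#   netmon -a 12
#   if wait_for_file /usr/OV/log/netmon.trace 60; then
#     ... count the overdue records as before ...
#   fi
```

The script then waits only as long as netmon actually needs, rather than a
worst-case delay on every run.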
> > > >
> > > >
> > > > Cordially,
> > > >
> > > > Leslie A. Clark
> > > > IBM Global Services - Systems Mgmt & Networking
> > > > Detroit
> > > >
> > > >
> > > > "Treptow, Craig" <Treptow.Craig@principal.com>@tkg.com on 05/30/2001
> > > > 04:39:22 PM
> > > >
> > > > Please respond to IBM NetView Discussion <nv-l@tkg.com>
> > > >
> > > > Sent by:  owner-nv-l@tkg.com
> > > >
> > > >
> > > > To:   "NetView List (E-mail)" <nv-l@tkg.com>
> > > > cc:
> > > > Subject:  [NV-L] Confirmation of Netview pinging
> > > >
> > > >
> > > >
> > > > Hi.  We are running Netview 6.0.2 on AIX 4.3.  We want to move to a
> > > > more proactive approach to problem notifications.  Our hope is to ping
> > > > servers/hubs/switches/routers and generate events when they aren't
> > > > reachable.  This would make use of the Netview features to reduce the
> > > > "noisy" pages, etc.  In preparation for this, I was running some numbers
> > > > and would like some input to see if my reasoning is flawed somewhere:
> > > >
> > > > Average response time for pings = 40ms (includes LAN and WAN)
> > > > Total devices to ping 1700. (and growing at about 30 per month)
> > > > # outstanding pings = 10 (Is this true?  Does it affect my numbers?
> > > > If so, how?)
> > > > Retries = 0
> > > > Timeout = 1 sec
> > > > One Netview machine.
> > > >
> > > > Netview could only ping 2 devices per second for a total of 120 per minute.
> > > > 1700 / 120 = 14 minutes to complete one ping cycle.
> > > >
> > > > So this would mean that using this method, we would only find out about a
> > > > down device after 14 minutes at best?  I don't think anybody would accept
> > > > this long of a window.
> > > >
> > > > Assuming the above is true, it appears that it is time for us to look into
> > > > a different Netview architecture that could achieve our goals?
> > > >
> > > > I'm just looking for some insight into how Netview pings and if my numbers
> > > > are even reasonable, etc.  Thanks for any help you can provide.
> > > >
> > > > Craig
> > > >
> > > > P.S. I have searched the archives, but there appear to be many open
> > > > questions on this topic.  Also, no form of netmon -a ?, or any other flag,
> > > > produced output in the netmon.trace file.
> > > > _________________________________________________________________________
> > > > NV-L List information and Archives: http://www.tkg.com/nv-l



Archive operated by Skills 1st Ltd
