
Re: ovtopmd not starting

To: nv-l@lists.tivoli.com
Subject: Re: ovtopmd not starting
From: James_Shanks@TIVOLI.COM
Date: Mon, 21 Sep 1998 13:36:53 -0400
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView et alia <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView et alia <NV-L@UCSBVM.UCSB.EDU>
Killed ovspmd?   I've never seen that to be necessary.

  When you do ovstop do you do it specifically for nvsecd?  He will not go
down unless you specifically say "ovstop nvsecd".  Many people do ovstop
twice.  Once without any arguments and then once as ovstop nvsecd.  I have
never seen ovspmd hang if nvsecd has gone down.

 Once they are all down, you might also go to /usr/OV/sockets and clear out
that directory (rm *).  That's recommended in the NetView Diagnosis Guide
under Daemon problems.   If AIX doesn't clean up the sockets before you try
to restart the daemons, then they may not get a socket at all, or worse yet
they may get a handle on the old one.
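
A minimal sketch of that shutdown sequence follows. It is an illustration, not a supported procedure: the function name nv_orderly_stop and the RUN and SOCKET_DIR variables are made up for this example, and the script only prints the commands unless RUN is set to the empty string.

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# Run with RUN="" (empty) on the NetView server to execute for real.
RUN=${RUN-echo}
SOCKET_DIR=${SOCKET_DIR:-/usr/OV/sockets}

nv_orderly_stop() {
    $RUN ovstop            # first pass: no arguments
    $RUN ovstop nvsecd     # nvsecd only goes down when named explicitly
    # Clear leftover sockets only after all daemons are down.
    for f in "$SOCKET_DIR"/*; do
        [ -e "$f" ] && $RUN rm -f "$f"
    done
    return 0
}

nv_orderly_stop
```

Check the printed commands first, and confirm the daemons are really down before letting the rm step run against /usr/OV/sockets.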

Pulling the network cable does not clear trapd's socket.  You might try a
netstat -an and look to see what kind of backup there is on 162/udp, or on
any other send or receive queue.  High numbers there mean that things are
backed up.
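
As a rough aid for reading that output, the filter below pulls out the receive and send queue sizes for UDP sockets on port 162. It is a sketch that assumes the usual netstat -an column order (Proto, Recv-Q, Send-Q, Local Address, ...) and a local address ending in ".162" or ":162"; the function name trap_queue is made up for this example.

```shell
#!/bin/sh
# Reads `netstat -an` output on stdin and prints the queue sizes for
# UDP sockets whose local address ends in port 162 (SNMP traps).
# Assumes columns: Proto Recv-Q Send-Q Local-Address ...
trap_queue() {
    awk '$1 ~ /^udp/ && $4 ~ /[.:]162$/ {
        print $4, "recv-q=" $2, "send-q=" $3
    }'
}

# Usage: netstat -an | trap_queue
```

A recv-q value that stays large across repeated runs suggests trapd is not draining its input as fast as traps arrive.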

  And if your daemons have died, the best thing to do is stop the GUI.
There is no point in trying to keep the GUI up if everything it depends on
has been killed or hosed.  It will take just that much longer to try to
reconnect everything to the GUI and most of the time it will not be
successful anyway.  The best thing to do is an orderly shutdown of
everything and then an orderly restart, daemons first.

James Shanks
Tivoli (NetView for UNIX) L3 Support



Rob Rinear <robr@dirigo.com> on 09/21/98 11:37:51 AM

Please respond to robr@dirigo.com

To:   NV-L@UCSBVM.UCSB.EDU
cc:    (bcc: James Shanks)
Subject:  Re: ovtopmd not starting





Yes...I stopped them all, and even killed OVsPMD in an attempt to start
fresh.  I've also pulled the network cable and watched the Events
display and trapd.log to ensure trap processing was idle, but these
daemons will not restart.

I completely agree that the real fix is to stop the flood of traps.  I'm
trying to identify these traps and modify them to not log or display to
help keep Netview alive, until the network folks straighten out the
devices.

Until then, I'm still concerned that Netview's not bouncing back as it
should.  Any other suggestions would be appreciated.




James_Shanks@TIVOLI.COM wrote:
>
> Did you take the other daemons down with an ovstop or not?
>
> If ovtopmd disconnects and goes down because he cannot connect to trapd,
> he won't be able to re-connect if trapd is too busy to talk to him.  So it
> may be that trapd is still processing the hundreds of (apparently
> worthless) traps that are sitting on his input queue.  The only way to
> flush that queue is to take down trapd.    Then ovtopmd can connect to him
> and netmon can connect to both of them.
>
> The only real fix in your case is to stop those network agents from
> flooding the box.
>
> Personal opinion follows:
>
> .soapbox on
> It totally mystifies me why the defaults on some routers send identical
> traps to the trap receiver every so-many seconds.  They should send one
> trap and not another until or unless the trap condition changes; or at
> least they should send them several minutes apart.  But I see trapd logs
> from customers all the time where some box is sending the same trap every
> two or three seconds.  Multiply that by a couple dozen of these boxes and
> pretty soon the management station on which NetView resides is using most
> of its cpu to pull in traps, format them, and then throw them away.  But
> there is little NetView or any other trap receiver can do about that.
> Until you receive and decode the trap, you cannot tell what it is for.
> And once you have done that, there are always other processes which must
> inspect those traps to decide if they have work to do.  The only way out
> of the hole is to stop it at the source and not configure remote agents to
> send traps too frequently.
> .soapbox off
>
> James Shanks
> Tivoli (NetView for UNIX) L3 Support
>
> Rob Rinear <robr@DIRIGO.COM> on 09/18/98 04:06:28 PM
>
> Please respond to Discussion of IBM NetView and POLYCENTER Manager on
>       NetView et alia <NV-L@UCSBVM.UCSB.EDU>
>
> To:   NV-L@UCSBVM.UCSB.EDU
> cc:    (bcc: James Shanks)
> Subject:  ovtopmd not starting
>
> I'm running AIX 4.2 with NV5.0 and have serious problems with the daemons.
> I have some devices that will at times flood Netview with traps - far too
> many for it to handle, and some of the daemons will eventually stop -
> trapd, netmon, ovtopmd.  I understand this, per documentation in the Tivoli
> knowledge base, and have even attempted to increase the event queue, to no
> avail.
>
> My real problem is that, once this flurry is over, I cannot get ovtopmd to
> restart shy of a reboot. I get console messages:
> "Fatal Topology Error: Unable to connect to ovtopmd
> Reason: Cannot connect to server: sys 2: A file or directory in the path
> name does not exist."
> and
> "Fatal Topology Error: Unabale to connect to trapd
> Reason: Topology OK -- no error"
>
> Anyone out there seen such a problem or have any suggestions?
>
> Rob Rinear
> Dirigo Incorporated
> Systems and Network Management Solutions
> (513) 421-6500
> robr@dirigo.com
> http://www.dirigo.com
