nv-l

Re: [nv-l] ? about actionsvr reporting "incorrectly parented"

To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] ? about actionsvr reporting "incorrectly parented"
From: James Shanks <jshanks@us.ibm.com>
Date: Thu, 16 Mar 2006 09:24:02 -0500
Delivery-date: Thu, 16 Mar 2006 14:24:53 +0000
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
In-reply-to: <AD79F859134E49439B1BF655B50EB1DE029B04E2@pccsseaex01.pemcocorp.net>
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com
This is a fishing expedition, Glen.

 I would start by looking at /usr/OV/PD for cores and FFDC data, because it
appears from just this that actionsvr went away.  I would also be looking
at the logs for the various daemons involved and seeing what, if anything,
I could learn about what happened.  What you are trying to do is establish a
timeline of events - what happened, when it happened, and so on - so you can
track back to the first failure and the root cause.  And don't forget to
look at the pager.warm file.
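
As a rough sketch of that first look, something like the following lists the files under a directory that changed recently, newest first, which helps when lining events up against each other.  The function name recent_files and the two-day default are only illustrative, and nothing is assumed here about how files under /usr/OV/PD are named.

```shell
#!/bin/sh
# Sketch only: list files modified in the last N days under a directory,
# newest first. recent_files is a hypothetical helper name, not a NetView tool.
recent_files() {
    dir=$1
    days=${2:-2}          # assumed default window of 2 days
    # -mtime -N matches files modified less than N days ago;
    # ls -lt sorts the matches by modification time, newest first
    find "$dir" -type f -mtime "-$days" -exec ls -lt {} +
}
```

You could then run "recent_files /usr/OV/PD 2" and compare the timestamps with what the daemon logs show.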

Also, do an ovstatus before you ovstop and check the PIDs.  actionsvr
spawns a separate child to execute each action and it would be helpful to
know whether the incorrectly parented process is the main daemon or one of
the children. I suspect that it's a child.  You might also check to see if
that actionsvr is the parent of any other processes which might be hung.
Call Support whenever you think you need assistance.
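
A minimal sketch of that check, assuming "incorrectly parented" corresponds to a parent PID of 1 as in the ovstop output quoted in this thread.  Both function names are made up for illustration; they only filter standard "ps -ef" output with awk.

```shell
#!/bin/sh
# Sketch only: both helpers read `ps -ef` style output on stdin, with columns
# UID PID PPID C STIME TTY TIME CMD. The names are hypothetical.

# Print "PID PPID CMD" for processes matching a name whose parent is init.
orphans_from_ps() {
    name=$1
    awk -v n="$name" 'NR > 1 && $3 == 1 && index($8, n) { print $2, $3, $8 }'
}

# Print "PID CMD" for every direct child of the given PID.
children_from_ps() {
    pid=$1
    awk -v p="$pid" 'NR > 1 && $3 == p { print $2, $8 }'
}
```

For example, "ps -ef | orphans_from_ps actionsvr" shows any actionsvr process that has been re-parented to init, and "ps -ef | children_from_ps 23243" shows whether that process still has children of its own.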

As an aside, have you ever thought about having your pages issued from a
script which checks a global variable or for the existence of a file before
continuing?  For example, suppose you had this in your script, before
deciding to page.


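       if [ -f /usr/OV/tmp/maintenance ] ; then
            # maintenance flag is set: log it (the "paging" tag is just an example) and skip the page
            logger -t paging "maintenance flag present, page suppressed"
            exit 1
       fi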


It could even be in a short script executed as an nvcorrd in-line action if
you are doing this from a ruleset.

Then your maintenance guy could disable paging by issuing
      touch /usr/OV/tmp/maintenance
and enable it again with
      rm  /usr/OV/tmp/maintenance

Or does this give him too much power?  It's easier than giving him the
authority to ovstop nvpagerd or actionsvr.
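
Put together, the idea might look something like the wrapper below.  This is a sketch under assumptions: page_unless_maintenance is a made-up name, the FLAG variable is only there so the path can be overridden, and whatever command you actually use to send the page is passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Sketch only: run the real paging command unless the maintenance flag exists.
# FLAG and page_unless_maintenance are hypothetical names for illustration.
FLAG=${FLAG:-/usr/OV/tmp/maintenance}

page_unless_maintenance() {
    if [ -f "$FLAG" ]; then
        # flag file present: note it on stderr and do not page
        echo "paging suppressed: $FLAG exists" >&2
        return 1
    fi
    # no flag file: run the paging command given as arguments
    "$@"
}
```

The script that does the paging would call "page_unless_maintenance <your paging command>", so the touch and rm above become the only switch the maintenance person needs.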

   The trouble with turning off the modem is the retries.  nvpagerd is
going to try to send that page multiple times, and if he can't, he'll put
it back in the warm file and move on to the next one, and when it's the
last one, he'll repeat that process many times before giving up.  If the
modem gets turned back on before that time, then the page will be sent,
whether or not it's still needed.  Meanwhile, actionsvr is spawning
children for each new paging request.  They should just add the page to the
in-memory queue with the actPage command, but the question is, how many are
being spawned and how fast?  Depending on what you are paging for, this
could get ugly really fast, with nvpagerd and actionsvr chewing up memory
and cpu for no good reason.

HTH


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group


                                                                           
"Glen Warn" <Glen.Warn@pemcocorp.com>
Sent by: owner-nv-l@lists.us.ibm.com
03/15/2006 07:27 PM
Please respond to: nv-l@lists.us.ibm.com

To: <nv-l@lists.us.ibm.com>
cc:
Subject: [nv-l] ? about actionsvr reporting "incorrectly parented"
                                                                           
                                                                           




Hi,

I have NV 7.1.4 FP3 on RH Linux AS 2.1
I’ve recently started having some issues – mainly noticed because paging
events aren’t being paged out (though I believe the scope is probably much
bigger).  Long story short, my fix right now is to stop/start Netview.  When
I do the stop, I get this error msg:

WARNING: One or more processes are incorrectly parented.
UID        PID  PPID  C STIME TTY          TIME CMD
root     23243     1  0 Mar14 ?        00:00:00 /usr/OV/bin/actionsvr
Stopping daemons...
kill -9 23243 actionsvr

I’ve recently had some issues where off-hours work was being done and the
person doing it was powering off the modem attached to Netview to prevent
waking the on-call person, but I think the queuing of the events was possibly
overloading nvpagerd or some other process, because when the modem was
turned back on it wasn’t able to send any pages.  I have this script that
typically restores service – but it won’t work for a user if my X session
is active (because I get pop ups that require intervention to proceed.)

/usr/OV/bin/ovstop nvcorrd
/usr/OV/bin/ovstop nvserverd
/usr/OV/bin/ovstop nvpagerd
/usr/OV/bin/ovstart nvcorrd
/usr/OV/bin/ovstart nvserverd
/usr/OV/bin/ovstart nvpagerd
/usr/OV/bin/ovstart actionsvr
/usr/OV/bin/ovstart snmpCollect
/etc/init.d/netnmrc

Outside of that, I’m not aware of any other major things changing.  Does
anyone have any ideas what might be causing this or how to avoid it?

Thanks,
Glen



Archive operated by Skills 1st Ltd

See also: The NetView Web