RE: [nv-l] ? about actionsvr reporting "incorrectly parented"

To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
From: James Shanks <jshanks@us.ibm.com>
Date: Thu, 30 Mar 2006 12:28:34 -0500
Delivery-date: Thu, 30 Mar 2006 18:29:02 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
In-reply-to: <AD79F859134E49439B1BF655B50EB1DE02AEDA4A@pccsseaex01.pemcocorp.net>
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com
If you have a core of nvpagerd on Linux, please contact Support.  I just
fixed one last week under IY82787 and NetView Level 2 can give you a test
fix.  That may make everything else moot.

As for the relationship of the various processes, if you do just
      ps -ef | more
you'll get a title line at the top which will tell you that the first
number is that process's PID, and the second is the parent process PID (or
PPID as it is called).
So in your example
#ps -ef |grep acti
root     21451 21399  0 Mar29 ?        00:04:38 /usr/OV/bin/ovactiond
root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     25517 20862  0 08:57 pts/0    00:00:00 grep acti

The actionsvr with PID 21454  was parented by the actionsvr with PID 21452.
He in turn was parented by some other process with PID 21399, as was
ovactiond.  Since ovactiond and that actionsvr were started by the same
process, 21399, that's probably ovspmd.  Which would make this actionsvr
      root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
the main daemon and this one
      root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
his child.  Do you follow?
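
For anyone who would rather not read the columns by eye, here is a minimal sketch that filters ps -ef style output on the PPID column. The helper name children_of is made up for illustration and is not a NetView command. It assumes the usual ps -ef layout, where column 2 is the PID and column 3 is the PPID.

```shell
# children_of PPID: read `ps -ef` style lines on stdin and print the PID
# (column 2) of every process whose parent PID (column 3) matches.
# Hypothetical helper for illustration, not part of NetView.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}

# Typical use on a live system:
#   ps -ef | children_of 21452
```

Run against the listing above, asking for the children of 21452 should give 21454, and asking for the children of 21399 should give both ovactiond and the main actionsvr.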

When you page from a ruleset, the Pager node or icon is just a stand-in for
an Action node which calls /usr/OV/bin/actPager with the correct userid and
your pager message.  That in turn issues an nvpage command internally,
which will succeed if nvpagerd is active and fail if it is not.  But even
if it fails, that should not cause actionsvr to hang or be incorrectly
parented, so I am not really clear what is going on.  But perhaps if you
get the pager fix, the entire problem will simply vanish.

James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group

             "Glen Warn"                                                   
             orp.com>
             Sent by: owner-nv-l@lists.
             To: <nv-l@lists.us.ibm.com>
             Date: 03/30/2006 12:02
             Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"

Hi James,

Thanks for the useful information (as always!)
I had quite a lot of info (some quite old) in those dirs - so I first
started by purging everything.  Since doing so I've had one failure.  I
do now have a core under nvpagerd - but I don't know what to do with it!

Also, showing just how little I know about Linux/UNIX I have to ask for
more help on the actionsvr processes.  I do see 2 processes spawned for
actionsvr - but am not positive I understand the parent/child
relationship.  Below is a sample - Is the parent the 2nd column PID
21452 and the child 21454 with a 3rd column PID reference to 21452?
Armed with this information - how would I further determine which one of
the 2 is the ailing process?

#ps -ef |grep acti
root     21451 21399  0 Mar29 ?        00:04:38 /usr/OV/bin/ovactiond
root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     25517 20862  0 08:57 pts/0    00:00:00 grep acti

Apologies for being light on Linux skills - I just couldn't get the Windows
version to do everything we're accustomed to the AIX/Linux versions
doing, so we have stayed the course.


PS.  Love the idea on the script for paging.  Will experiment with that
and, if successful, roll it out to all my paging rulesets.

-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com]
On Behalf Of James Shanks
Sent: Thursday, March 16, 2006 6:24 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] ? about actionsvr reporting "incorrectly parented"

This is a fishing expedition, Glen.

 I would start by looking at /usr/OV/PD for cores and FFDC data, because it
appears from just this that actionsvr went away.  I would also be looking
at the logs for the various daemons involved and seeing what, if anything,
I could learn about what happened.  What you are trying to establish is a
time line of events - what happened, when it happened, and so on, so you can
track back to the first failure and the root cause.  And don't forget to
look at the pager.warm file.

Also, do an ovstatus before you ovstop and check the PIDs.  actionsvr
spawns a separate child to execute each action and it would be helpful to
know whether the incorrectly parented process is the main daemon or one of
the children. I suspect that it's a child.  You might also check to see
whether actionsvr is the parent of any other processes which might be hung.
Call Support whenever you think you need assistance.
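
One way to spot an incorrectly parented daemon before running ovstop is to look for a process whose PPID is 1 (init) rather than ovspmd. The sketch below does that against ps -ef style output. The helper name orphaned_daemons is hypothetical, and it assumes the Linux ps -ef layout seen in this thread, with the PPID in column 3 and the command path in column 8.

```shell
# orphaned_daemons NAME: read `ps -ef` style lines on stdin and print those
# whose PPID (column 3) is 1 and whose command (column 8) ends in /NAME.
# Hypothetical helper for illustration, not part of NetView.
orphaned_daemons() {
    awk -v name="$1" '$3 == 1 && $8 ~ ("/" name "$") { print }'
}

# Typical use on a live system:
#   ps -ef | orphaned_daemons actionsvr
```

Comparing the PID it reports with the actionsvr PID shown by ovstatus should tell you whether the stray process is the main daemon or one of its children.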

As an aside, have you ever thought about having your pages issued from a
script which checks a global variable or for the existence of a file before
continuing?  For example, suppose you had this in your script, before
deciding to page.

       if [ -f /usr/OV/tmp/maintenance ] ; then
            # pseudo-code: log a message somewhere
            exit 1
       fi

It could even be in a short script executed as an nvcorrd in-line action if
you are doing this from a ruleset.

Then your maintenance guy could disable paging by issuing
      touch /usr/OV/tmp/maintenance
and enable it again with
      rm  /usr/OV/tmp/maintenance

Or does this give him too much power?  It's easier than giving him the
authority to ovstop nvpagerd or actionsvr.
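
Here is a slightly fuller sketch of the same idea, written as a function so that it can be reused from several paging scripts. The function name paging_allowed and the MAINT_FLAG variable are made up for illustration, and the step that actually sends the page is deliberately left as a placeholder.

```shell
# Flag file from the example above; override MAINT_FLAG to test elsewhere.
MAINT_FLAG="${MAINT_FLAG:-/usr/OV/tmp/maintenance}"

# paging_allowed: return 0 when no maintenance flag file exists; otherwise
# write a note to stderr and return 1 so the caller can skip the page.
# Hypothetical helper for illustration, not part of NetView.
paging_allowed() {
    if [ -f "$MAINT_FLAG" ]; then
        echo "paging suppressed: $MAINT_FLAG exists" >&2
        return 1
    fi
    return 0
}

# Typical use in a paging script:
#   paging_allowed || exit 1
#   ... issue the page here ...
```

Writing the note to stderr is only one choice; a site might prefer to append to its own log file instead.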

   The trouble with turning off the modem is the retries.  nvpagerd is
going to try to send that page multiple times, and if he can't, he'll put
it back in the warm file and move on to the next one, and when it's the
last one, he'll repeat that process many times before giving up.  If the
modem gets turned back on before that time, then the page will be sent,
whether it's no longer needed or not.  Meanwhile, actionsvr is spawning
children for each new paging request.  They should just add the page to the
in-memory queue with the actPage command, but the question is, how many are
being spawned and how fast?  Depending on what you are paging for, this
could get ugly really fast, with nvpagerd and actionsvr chewing up memory
and cpu for no good reason.


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group

             "Glen Warn"


             To: <nv-l@lists.us.ibm.com>
             Date: 03/15/2006 07:27
             Subject: [nv-l] ? about actionsvr reporting "incorrectly parented"

I have NV 7.1.4 FP3 on RH Linux AS 2.1.
I've recently started having some issues - mainly noticed because paging
events aren't being paged out (though I believe scope is probably much
bigger).  Long story short, my fix right now is to stop/start NetView.
When I do the stop, I get this error msg:

WARNING: One or more processes are incorrectly parented.
root     23243     1  0 Mar14 ?        00:00:00 /usr/OV/bin/actionsvr
Stopping daemons...
kill -9 23243 actionsvr

I've recently had some issues where off hours work was being done and the
person in was powering the modem attached to NetView off to prevent paging
the oncall person, but I think the queuing of the events was possibly
overloading nvpagerd or some other process, because when the modem was
turned back on it wasn't able to send any pages.  I have this script that
typically restores service - but it won't work for a user if my X session
is active (because I get pop ups that require intervention to proceed).

/usr/OV/bin/ovstop nvcorrd
/usr/OV/bin/ovstop nvserverd
/usr/OV/bin/ovstop nvpagerd
/usr/OV/bin/ovstart nvcorrd
/usr/OV/bin/ovstart nvserverd
/usr/OV/bin/ovstart nvpagerd
/usr/OV/bin/ovstart actionsvr
/usr/OV/bin/ovstart snmpCollect

Outside of that, I'm not aware of any other major things changing.  Does
anyone have any ideas what might be causing this or how to avoid it?

