
RE: [nv-l] ? about actionsvr reporting "incorrectly parented"

To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
From: James Shanks <jshanks@us.ibm.com>
Date: Thu, 30 Mar 2006 13:16:20 -0500
Delivery-date: Thu, 30 Mar 2006 19:16:51 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
In-reply-to: <OF1D24006F.F7D9153D-ON85257141.0060C36C-85257141.0060E82A@ca.ibm.com>
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com
The result is no different.   But how you get there is.

When you build your ruleset with the Pager node, it forces you to define a
NetView userid in the security database for the person to page.  In that
record you store the PIN number and pager provider.  When the ruleset
fires, it calls actPager  with the NetView user id and message.  actPager
extracts the proper PIN and pager provider from the security record and
builds the nvpage command,  /usr/OV/bin/nvpage  <PIN>@<provider>  <message>
and issues that internally.
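
As a rough illustration of that last step, the sketch below assembles the same command line from a PIN and a provider.  The function name build_nvpage_cmd and the sample values are hypothetical; only the /usr/OV/bin/nvpage <PIN>@<provider> <message> shape comes from the description above.  The function prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: print the nvpage command line that would be issued.
# build_nvpage_cmd and the sample values below are hypothetical.
build_nvpage_cmd() {
    pin=$1
    provider=$2
    message=$3
    # Shape taken from the description: nvpage <PIN>@<provider> <message>
    printf '/usr/OV/bin/nvpage %s@%s "%s"\n' "$pin" "$provider" "$message"
}

# Illustrative values, not real pager details
build_nvpage_cmd 5551234 exampleprovider "Router down"
```

Printing the command first is a cheap way to check the PIN and provider before anything reaches nvpagerd.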

actPager isn't documented because it was designed only to be used
internally by nvcorrd.  But there is nothing to stop you from using it in a
script or on the command line if you like.  You do have to build that user
record first, though.  There are two ways to do that:
(1) with the ruleset editor and a Pager node -- you don't even have to keep
the ruleset if you don't want to, or
(2)  nvsec_admin  --> Change | Add  --> Set Pager Information


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group


                                                                           
             Francois Le Hir                                               
             <flehir@ca.ibm.com>
             Sent by: owner-nv-l@lists.us.ibm.com
             Date: 03/30/2006 12:38 PM
             To: nv-l@lists.us.ibm.com
             Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
             Please respond to: nv-l@lists.us.ibm.com

James,

Can you please explain the difference between a call to
/usr/OV/bin/actPager (that I don't see documented) and the nvpage command ?

Thanks,
Salutations, / Regards,

Francois Le Hir
Network Projects & Consulting Services
IBM Global Services




             James Shanks
             <jshanks@us.ibm.com>
             Sent by: owner-nv-l@lists.us.ibm.com
             Date: 03/30/2006 12:28 PM
             To: nv-l@lists.us.ibm.com
             Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
             Please respond to: nv-l@lists.us.ibm.com

If you have a core of nvpagerd on Linux, please contact Support.  I just
fixed one last week under IY82787 and NetView Level 2 can give you a test
fix.  That may make everything else moot.

As for the relationship of the various processes, if you do just
      ps -ef | more
You'll get a title line at the top which will tell you that the first
number is that process' PID, and the second is the parent process PID (or
PPID as it is called).
So in your example
#ps -ef |grep acti
root     21451 21399  0 Mar29 ?        00:04:38 /usr/OV/bin/ovactiond -l/usr/OV/
root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     25517 20862  0 08:57 pts/0    00:00:00 grep acti

The actionsvr with PID 21454  was parented by the actionsvr with PID 21452.
He in turn was parented by some other process with PID 21399, as was
ovactiond.  Since ovactiond and that actionsvr were started by the same
process, 21399, that's probably ovspmd.  Which would make this actionsvr
      root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
the main daemon and this one
      root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
his child.  Do you follow?
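
The same reasoning can be sketched with awk: treat an actionsvr whose parent is another actionsvr as a child, and any other actionsvr as the main daemon.  The helper name classify_actionsvr is hypothetical, and the field positions assume the Linux ps -ef layout shown above, where STIME is a single word such as Mar29.

```shell
#!/bin/sh
# Sketch only: label each actionsvr line from `ps -ef` as main daemon or child.
# classify_actionsvr is a hypothetical helper; it reads ps -ef output on stdin.
classify_actionsvr() {
    awk '
        # Field 2 is the PID, field 3 the PPID, field 8 the command path
        $8 ~ /actionsvr$/ { pid[$2] = 1; ppid[$2] = $3 }
        END {
            for (p in pid) {
                if (ppid[p] in pid)
                    print p, "child of " ppid[p]
                else
                    print p, "main daemon (parent " ppid[p] ")"
            }
        }
    ' | sort -n
}

# Sample taken from the ps output quoted above
classify_actionsvr <<'EOF'
root     21451 21399  0 Mar29 ?        00:04:38 /usr/OV/bin/ovactiond -l/usr/OV/
root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     25517 20862  0 08:57 pts/0    00:00:00 grep acti
EOF
```

For the sample, this reports 21452 as the main daemon with parent 21399, and 21454 as a child of 21452.  On a live system the input would be  ps -ef | classify_actionsvr.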

When you page from a ruleset, the Pager node or icon is just a stand-in for
an Action node which calls /usr/OV/bin/actPager with the correct userid and
your pager message.  That in turn issues an nvpage command internally,
which will succeed if nvpagerd is active and fail if it is not.  But even
if it fails, that should not cause actionsvr to hang or be incorrectly
parented, so I am not really clear what is going on.  But perhaps if you
get the pager fix, the entire problem will simply vanish.


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group



             "Glen Warn"
             <Glen.Warn@pemcocorp.com>
             Sent by: owner-nv-l@lists.us.ibm.com
             Date: 03/30/2006 12:02 PM
             To: <nv-l@lists.us.ibm.com>
             Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
             Please respond to: nv-l@lists.us.ibm.com

Hi James,

Thanks for the useful information (as always!)
I had quite a lot of info (some quite old) in those dirs - so I first
started by purging everything.  Since doing so I've had one failure.  I
do now have a core under nvpagerd - but I don't know what to do with it!

Also, showing just how little I know about Linux/UNIX I have to ask for
more help on the actionsvr processes.  I do see 2 processes spawned for
actionsvr - but am not positive I understand the parent/child
relationship.  Below is a sample - Is the parent the 2nd column PID
21452 and the child 21454 with a 3rd column PID reference to 21452?
Armed with this information - how would I further determine which one of
the 2 is the ailing process?

#ps -ef |grep acti
root     21451 21399  0 Mar29 ?        00:04:38 /usr/OV/bin/ovactiond -l/usr/OV/
root     21452 21399  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
root     25517 20862  0 08:57 pts/0    00:00:00 grep acti

Apologies for being light on Linux skills - we just couldn't get the
Windows version to do everything we're accustomed to the AIX/Linux
versions doing, so we have stayed the course.

Thx,
Glen

PS.  Love the idea on the script for paging.  Will experiment with that
and, if successful, roll it out to all my paging rulesets.

-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com]
On Behalf Of James Shanks
Sent: Thursday, March 16, 2006 6:24 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] ? about actionsvr reporting "incorrectly parented"

This is a fishing expedition, Glen.

I would start by looking at /usr/OV/PD for cores and FFDC data, because it
appears from just this that actionsvr went away.  I would also be looking
at the logs for the various daemons involved and seeing what, if anything,
I could learn about what happened.  What you are trying to establish is a
timeline of events - what happened, when it happened, and so on - so you can
track back to the first failure and the root cause.  And don't forget to
look at the pager.warm file.

Also, do an ovstatus before you ovstop and check the PIDs.  actionsvr
spawns a separate child to execute each action, and it would be helpful to
know whether the incorrectly parented process is the main daemon or one of
the children.  I suspect that it's a child.  You might also check to see if
that actionsvr is the parent of any other processes which might be hung.
Call Support whenever you think you need assistance.
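
One way to spot such a process before running ovstop is sketched below.  It assumes that "incorrectly parented" shows up as a PPID of 1, as in the warning Glen reported, and that ps -ef uses the usual Linux column layout.  The helper name find_orphaned_actionsvr is hypothetical.

```shell
#!/bin/sh
# Sketch only: list actionsvr processes whose parent is init (PPID 1).
# find_orphaned_actionsvr is a hypothetical helper; it reads `ps -ef` output on stdin.
find_orphaned_actionsvr() {
    # Field 3 is the PPID, field 8 the command path, field 2 the PID
    awk '$3 == 1 && $8 ~ /actionsvr$/ { print $2 }'
}

# Live use would be:  ps -ef | find_orphaned_actionsvr
# Sample lines in the Linux ps -ef layout:
find_orphaned_actionsvr <<'EOF'
UID        PID  PPID  C STIME TTY          TIME CMD
root     23243     1  0 Mar14 ?        00:00:00 /usr/OV/bin/actionsvr
root     21454 21452  0 Mar29 ?        00:00:00 /usr/OV/bin/actionsvr
EOF
```

For the sample input this prints 23243.  Comparing that PID with the ovstatus output then shows whether the orphan is the main daemon or one of its children.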

As an aside, have you ever thought about having your pages issued from a
script which checks a global variable or for the existence of a file before
continuing?  For example, suppose you had this in your script, before
deciding to page.


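       if [ -f /usr/OV/tmp/maintenance ] ; then
            # Maintenance flag present: record the skipped page in syslog, then stop.
            # (logger and its tag are one possible choice, not part of the original.)
            logger -t paging "page suppressed: /usr/OV/tmp/maintenance exists"
            exit 1
       fi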


It could even be in a short script executed as an nvcorrd in-line action if
you are doing this from a ruleset.

Then your maintenance guy could disable paging by issuing
      touch /usr/OV/tmp/maintenance
and enable it again with
      rm  /usr/OV/tmp/maintenance

Or does this give him too much power?  It's easier than giving him the
authority to ovstop nvpagerd or actionsvr.
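
Put together, a wrapper along these lines might look like the sketch below.  The names page_unless_maintenance, MAINT_FLAG and PAGER_CMD are hypothetical, and the recipient is assumed to be in the <PIN>@<provider> form that nvpage takes as described elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: skip paging while a maintenance flag file exists.
# page_unless_maintenance, MAINT_FLAG and PAGER_CMD are hypothetical names.
MAINT_FLAG=${MAINT_FLAG:-/usr/OV/tmp/maintenance}
PAGER_CMD=${PAGER_CMD:-/usr/OV/bin/nvpage}

page_unless_maintenance() {
    recipient=$1    # assumed form: <PIN>@<provider>
    message=$2
    if [ -f "$MAINT_FLAG" ]; then
        # Flag present: report on stderr and do not page
        echo "paging suppressed: $MAINT_FLAG exists" >&2
        return 1
    fi
    "$PAGER_CMD" "$recipient" "$message"
}

# Example call (values are illustrative):
#   page_unless_maintenance 5551234@exampleprovider "Router down"
```

Setting PAGER_CMD=echo gives a dry run that shows what would be sent without touching nvpagerd.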

   The trouble with turning off the modem is the retries.  nvpagerd is
going to try to send that page multiple times, and if he can't, he'll put
it back in the warm file and move on to the next one, and when it's the
last one, he'll repeat that process many times before giving up.  If the
modem gets turned back on before that time, then the page will be sent,
whether it's still needed or not.  Meanwhile, actionsvr is spawning
children for each new paging request.  They should just add the page to the
in-memory queue with the actPager command, but the question is, how many are
being spawned and how fast?  Depending on what you are paging for, this
could get ugly really fast, with nvpagerd and actionsvr chewing up memory
and cpu for no good reason.

HTH


James Shanks
Level 3 Support  for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group




             "Glen Warn"

             <Glen.Warn@pemcocorp.com>
             Sent by: owner-nv-l@lists.us.ibm.com
             Date: 03/15/2006 07:27 PM
             To: <nv-l@lists.us.ibm.com>
             Subject: [nv-l] ? about actionsvr reporting "incorrectly parented"
             Please respond to: nv-l@lists.us.ibm.com

Hi,

I have NV 7.1.4 FP3 on RH Linux AS 2.1.
I've recently started having some issues - mainly noticed because paging
events aren't being paged out (though I believe the scope is probably much
bigger).  Long story short, my fix right now is to stop/start NetView.  When
I do the stop, I get this error msg:

WARNING: One or more processes are incorrectly parented.
UID        PID  PPID  C STIME TTY          TIME CMD
root     23243     1  0 Mar14 ?        00:00:00 /usr/OV/bin/actionsvr
Stopping daemons...
kill -9 23243 actionsvr

I've recently had some issues where off-hours work was being done and the
person in was powering off the modem attached to NetView to prevent waking
the on-call person, but I think the queuing of the events was possibly
overloading nvpagerd or some other process, because when the modem was
turned back on it wasn't able to send any pages.  I have this script that
typically restores service - but it won't work for a user if my X session
is active (because I get pop-ups that require intervention to proceed).

/usr/OV/bin/ovstop nvcorrd
/usr/OV/bin/ovstop nvserverd
/usr/OV/bin/ovstop nvpagerd
/usr/OV/bin/ovstart nvcorrd
/usr/OV/bin/ovstart nvserverd
/usr/OV/bin/ovstart nvpagerd
/usr/OV/bin/ovstart actionsvr
/usr/OV/bin/ovstart snmpCollect
/etc/init.d/netnmrc

Outside of that, I'm not aware of any other major things changing.  Does
anyone have any ideas what might be causing this or how to avoid it?

Thanks,
Glen











Archive operated by Skills 1st Ltd
