James,
Can you please explain the difference between a call to
/usr/OV/bin/actPager (that I don't see documented) and the nvpage command ?
Thanks,
Salutations, / Regards,
Francois Le Hir
Network Projects & Consulting Services
IBM Global Services
From: James Shanks <jshanks@us.ibm.com>
Sent by: owner-nv-l@lists.us.ibm.com
To: nv-l@lists.us.ibm.com
Date: 03/30/2006 12:28 PM
Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
Please respond to: nv-l@lists.us.ibm.com
If you have a core of nvpagerd on Linux, please contact Support. I just
fixed one last week under IY82787 and NetView Level 2 can give you a test
fix. That may make everything else moot.
As for the relationship of the various processes, if you do just
ps -ef | more
You'll get a title line at the top which will tell you that the first
number is that process' PID, and the second is the parent process PID (or
PPID as it is called).
So in your example
#ps -ef |grep acti
root 21451 21399 0 Mar29 ? 00:04:38 /usr/OV/bin/ovactiond -l/usr/OV/
root 21452 21399 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
root 21454 21452 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
root 25517 20862 0 08:57 pts/0 00:00:00 grep acti
The actionsvr with PID 21454 was parented by the actionsvr with PID 21452.
He in turn was parented by some other process with PID 21399, as was
ovactiond. Since ovactiond and that actionsvr were started by the same
process, 21399, that's probably ovspmd. Which would make this actionsvr
root 21452 21399 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
the main daemon and this one
root 21454 21452 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
his child. Do you follow?
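
As a rough sketch of the same reasoning in script form, the function below reads `ps -ef` output on stdin and labels each actionsvr process by checking whether its PPID (third column) is itself the PID of another actionsvr. The name classify_actionsvr is made up for illustration, and the label on the parent's PID is only descriptive; whether that parent is really ovspmd still has to be checked separately.

```shell
# Hypothetical helper: classify actionsvr processes from `ps -ef` output on stdin.
# Column 2 is the PID, column 3 is the PPID.
classify_actionsvr() {
    awk '
        # Remember PID -> PPID for every line whose command is .../actionsvr
        $0 ~ /\/actionsvr( |$)/ { ppid[$2] = $3 }
        END {
            for (p in ppid) {
                if (ppid[p] in ppid)
                    print p, "child of actionsvr", ppid[p]
                else
                    print p, "main daemon, parent", ppid[p]
            }
        }
    ' | sort -n
}
```

It would be run as `ps -ef | classify_actionsvr`; the `grep acti` line is ignored because its command does not contain `/actionsvr`.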
When you page from a ruleset, the Pager node or icon is just a stand-in for
an Action node which calls /usr/OV/bin/actPager with the correct userid and
your pager message. That in turn issues an nvpage command internally,
which will succeed if nvpagerd is active and fail if it is not. But even
if it fails, that should not cause actionsvr to hang or be incorrectly
parented, so I am not really clear what is going on. But perhaps if you
get the pager fix, the entire problem will simply vanish.
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
From: "Glen Warn" <Glen.Warn@pemcocorp.com>
Sent by: owner-nv-l@lists.us.ibm.com
To: <nv-l@lists.us.ibm.com>
Date: 03/30/2006 12:02 PM
Subject: RE: [nv-l] ? about actionsvr reporting "incorrectly parented"
Please respond to: nv-l@lists.us.ibm.com
Hi James,
Thanks for the useful information (as always!)
I had quite a lot of info (some quite old) in those dirs - so I first
started by purging everything. Since doing so I've had one failure. I
do now have a core under nvpagerd - but I don't know what to do with it!
Also, showing just how little I know about Linux/UNIX I have to ask for
more help on the actionsvr processes. I do see 2 processes spawned for
actionsvr - but am not positive I understand the parent/child
relationship. Below is a sample - Is the parent the 2nd column PID
21452 and the child 21454 with a 3rd column PID reference to 21452?
Armed with this information - how would I further determine which one of
the 2 is the ailing process?
#ps -ef |grep acti
root 21451 21399 0 Mar29 ? 00:04:38 /usr/OV/bin/ovactiond -l/usr/OV/
root 21452 21399 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
root 21454 21452 0 Mar29 ? 00:00:00 /usr/OV/bin/actionsvr
root 25517 20862 0 08:57 pts/0 00:00:00 grep acti
Apologies for being lite on LINUX skills - just couldn't get the Windows
version to do everything we're accustomed to the AIX/Linux versions
doing so have stayed the course.
Thx,
Glen
PS. Love the idea on the script for paging. Will experiment with that
and, if successful, roll it out to all my paging rulesets.
-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com]
On Behalf Of James Shanks
Sent: Thursday, March 16, 2006 6:24 AM
To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] ? about actionsvr reporting "incorrectly parented"
This is a fishing expedition, Glen.
I would start by looking at /usr/OV/PD for cores and FFDC data, because it
appears from just this that actionsvr went away. I would also be looking at
the logs for the various daemons involved and seeing what, if anything, I
could learn about what happened. What you are trying to establish is a
timeline of events - what happened, when it happened, and so on - so you can
track back to the first failure and the root cause. And don't forget to
look at the pager.warm file.
Also, do an ovstatus before you ovstop and check the PIDs. actionsvr
spawns a separate child to execute each action and it would be helpful to
know whether the incorrectly parented process is the main daemon or one of
the children. I suspect that it's a child. You might also check to see if
that actionsvr is the parent of any other processes which might be hung.
Call Support whenever you think you need assistance.
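
A minimal sketch of those two checks, assuming (as in the ovstop warning quoted later in this thread) that the incorrectly parented actionsvr shows up with a PPID of 1. Both function names are made up for illustration; they only filter `ps -ef` output read on stdin.

```shell
# Hypothetical helpers; both read `ps -ef` output on stdin.

# Print the PID of every actionsvr whose parent is PID 1
# (assumption: this is what "incorrectly parented" looks like here).
orphaned_actionsvr() {
    awk '$3 == 1 && $0 ~ /\/actionsvr( |$)/ { print $2 }'
}

# Print the PIDs of all processes whose PPID is the PID given as $1.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}
```

For example, `ps -ef | orphaned_actionsvr` lists the suspect PIDs, and `ps -ef | children_of 23243` shows whether that process has children of its own that might be hung.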
As an aside, have you ever thought about having your pages issued from a
script which checks a global variable or for the existence of a file before
continuing? For example, suppose you had this in your script, before
deciding to page.
if [ -f /usr/OV/tmp/maintenance ] ; then
    # Maintenance flag present: log it (syslog is just one option) and skip the page
    logger -t paging "maintenance flag set, page suppressed"
    exit 1
fi
It could even be in a short script executed as an nvcorrd in-line action if
you are doing this from a ruleset.
Then your maintenance guy could disable paging by issuing
touch /usr/OV/tmp/maintenance
and enable it again with
rm /usr/OV/tmp/maintenance
Or does this give him too much power? It's easier than giving him the
authority to ovstop nvpagerd or actionsvr.
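
Put together as a sketch, the flag-file check might be wrapped in a couple of shell functions like the ones below. The names paging_allowed, guarded_page, and send_page are invented for illustration; send_page in particular is a placeholder for whatever your script actually runs to issue the page, not a NetView command.

```shell
# Flag file whose presence disables paging; override MAINT_FLAG to test elsewhere.
MAINT_FLAG="${MAINT_FLAG:-/usr/OV/tmp/maintenance}"

# Return 0 if paging may proceed, 1 if the maintenance flag file exists.
paging_allowed() {
    if [ -f "$MAINT_FLAG" ]; then
        echo "paging suppressed: $MAINT_FLAG exists" >&2
        return 1
    fi
    return 0
}

# Issue the page only when no maintenance flag is set.
# send_page is a placeholder you would define yourself.
guarded_page() {
    paging_allowed || return 1
    send_page "$@"
}
```

With this in place, `touch /usr/OV/tmp/maintenance` makes guarded_page return 1 without calling send_page, and removing the file restores normal paging.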
The trouble with turning off the modem is the retries. nvpagerd is
going to try to send that page multiple times, and if he can't, he'll put
it back in the warm file and move on to the next one, and when it's the
last one, he'll repeat that process many times before giving up. If the
modem gets turned back on before that time, then the page will be sent,
whether it's still needed or not. Meanwhile, actionsvr is spawning
children for each new paging request. They should just add the page to the
in-memory queue with the actPager command, but the question is, how many are
being spawned and how fast? Depending on what you are paging for, this
could get ugly really fast, with nvpagerd and actionsvr chewing up memory
and cpu for no good reason.
HTH
James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group
From: "Glen Warn" <Glen.Warn@pemcocorp.com>
Sent by: owner-nv-l@lists.us.ibm.com
To: <nv-l@lists.us.ibm.com>
Date: 03/15/2006 07:27 PM
Subject: [nv-l] ? about actionsvr reporting "incorrectly parented"
Please respond to: nv-l@lists.us.ibm.com
Hi,
I have NV 7.1.4 FP3 on RH Linux AS 2.1.
I've recently started having some issues - mainly noticed because paging
events aren't being paged out (though I believe the scope is probably much
bigger). Long story short, my fix right now is to stop/start Netview. When
I do the stop, I get this error msg:
WARNING: One or more processes are incorrectly parented.
UID PID PPID C STIME TTY TIME CMD
root 23243 1 0 Mar14 ? 00:00:00 /usr/OV/bin/actionsvr
Stopping daemons...
kill -9 23243 actionsvr
I've recently had some issues where off-hours work was being done and the
person doing it was powering off the modem attached to Netview to prevent
waking the on-call person. I think the queuing of the events was possibly
overloading nvpagerd or some other process, because when the modem was
turned back on it wasn't able to send any pages. I have this script that
typically restores service, but it won't work for a user if my X session
is active (because I get pop-ups that require intervention to proceed).
/usr/OV/bin/ovstop nvcorrd
/usr/OV/bin/ovstop nvserverd
/usr/OV/bin/ovstop nvpagerd
/usr/OV/bin/ovstart nvcorrd
/usr/OV/bin/ovstart nvserverd
/usr/OV/bin/ovstart nvpagerd
/usr/OV/bin/ovstart actionsvr
/usr/OV/bin/ovstart snmpCollect
/etc/init.d/netnmrc
Outside of that, I'm not aware of any other major things changing. Does
anyone have any ideas what might be causing this or how to avoid it?
Thanks,
Glen