
RE: Nvcorrd error checking

To: nv-l@lists.tivoli.com
Subject: RE: Nvcorrd error checking
From: "James Shanks" <jshanks@us.ibm.com>
Date: Fri, 14 Dec 2001 11:38:48 -0500
I am going to assume that the rulesets you have in the 6.0.3 system are 
identical to those in the 7.1 system.  If not, all bets are off.

The nvserverd "not running" message is the result of a time out.  If 
nvevents, the event window, cannot talk to the nvserverd daemon, he issues 
that message.  If ovstatus shows the daemon up, and ps -ef does too, then 
he is stalled doing something else.  If you are forwarding to TEC, the 
thing to do might be to check and see if your TEC server has gone down, 
since nvserverd is the guy who forwards to TEC  when you use the internal 
adapter.  I would also check the /etc/Tivoli/tec/cache file and see if it 
is growing.  If it is,  then that means we cannot contact the TEC server 
for some reason, and nvserverd is having to try to reconnect with him on 
every event it gets, vastly slowing things down. 
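
One way to tell whether the cache file is growing is to sample its size twice and compare. This is only a minimal sketch in Python: the path is the one named above, while the function names and the sampling interval are hypothetical choices for illustration, not anything shipped with NetView or TEC.

```python
import os
import time


def file_size(path):
    """Return the size of path in bytes, or 0 if it does not exist."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def is_growing(path, interval_seconds=60.0, sleep=time.sleep):
    """Sample the file size twice, interval_seconds apart, and report growth.

    The sleep argument is injectable so the check can be exercised
    without actually waiting.
    """
    before = file_size(path)
    sleep(interval_seconds)
    after = file_size(path)
    return after > before
```

Calling `is_growing("/etc/Tivoli/tec/cache")` from a cron job would flag growth during the sample window. A single sample pair proves little on a quiet system, so treat a positive result as a prompt to go check the TEC server rather than as a diagnosis.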

James Shanks
Level 3 Support  for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group
 





Jorge Jiles <Jorge.Jiles@ualberta.ca>
Sent by: owner-nv-l@tkg.com
12/14/2001 10:50 AM
Please respond to IBM NetView Discussion

 
        To:     IBM NetView Discussion <nv-l@tkg.com>
        cc: 
        Subject:        RE: [NV-L] Nvcorrd error checking

 

I have seen the same problem on my system: NetView 7.1 on Solaris 8. What
is really weird is that at times I also get the message that nvserverd is
not running when, according to ovstatus and the logs, it is working OK. The
only way to get the events (and the scripts called by rulesets) going again
is to stop and start nvcorrd and nvserverd. More than once I have had to stop
all the daemons and restart them. If I find any explanation for this, I will
let you know. I don't think the rulesets are the problem, as the same ones
are running in a production environment (AIX, NetView 6.02) and they work
properly.

At 01:55 PM 12/14/2001 +1100, you wrote:
>Thanks for the tip James, I will look into it. 
>
>We are running 24X7 for some boxes, so I am thinking of some sort of
>heartbeat to check trapd, nvcorrd and nvactiond, e.g. (wsnmptrap to NV -->
>script to postemsg to TEC --> TEC touches a local file --> script checks the
>touch time of the file)
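
The last step of that chain, checking the touch time of the file, could look roughly like the following. This is a sketch under assumptions: the heartbeat file location and the allowed age are whatever you pick, and the function names here are made up for illustration. It tests only the end of the chain, so a stale file tells you something between trapd and TEC is stuck, not which piece.

```python
import os
import time


def heartbeat_age(path, now=None):
    """Return seconds since path was last modified, or None if it is missing."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - mtime


def heartbeat_is_stale(path, max_age_seconds, now=None):
    """Treat a missing file, or one older than max_age_seconds, as stale."""
    age = heartbeat_age(path, now)
    return age is None or age > max_age_seconds
```

If the test trap goes out every five minutes, a `max_age_seconds` of two or three trap intervals leaves room for one lost heartbeat before anyone is paged.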
>
>How do NetView experts out there check their daemons? Thanks!
>
>Regards,
>
>Jack
>
>-----Original Message-----
>From: James Shanks [mailto:jshanks@us.ibm.com] 
>Sent: Friday, 14 December 2001 2:21 p.m.
>To: IBM NetView Discussion
>Subject: Re: [NV-L] Nvcorrd error checking
>
>
>Jack -
>
>Try looking in the logs.  nvcorrd writes to an alog and a blog in
>/usr/OV/log.  Errors are always written there.  nvcorrd always starts in the
>alog, writes 1000 lines, switches to the blog, then writes another 1000, and
>switches back.  And when there is an action to be run he hands that off to
>actionsvr, who also has a pair of logs, nvaction.alog and blog, that
>work the same way.   If you still don't see anything, then you can turn on
>tracing, using the command "nvcdebug -d all".  There are man pages on all
>this stuff, as well as lengthy discussions in the Admin Guide about how it
>works.
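
Because the daemon flips between the two logs, the first question is which of the pair is current. A rough sketch in Python: pick whichever file was modified most recently and read its last few lines. The helper names are hypothetical, and any nvcorrd log file names you pass in (for example nvcorrd.alog and nvcorrd.blog, by analogy with nvaction.alog) are an assumption to verify against your own /usr/OV/log.

```python
import os
from collections import deque


def active_log(paths):
    """Return the most recently modified existing path, or None if none exist."""
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        return None
    return max(existing, key=os.path.getmtime)


def tail(path, count=20):
    """Return the last count lines of path as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

If the newest of the two logs has not been written to for a long time while traps keep arriving in trapd.log, that is a stronger hint of a stalled daemon than anything ovstatus reports.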
>
>If you read the NetView Diagnosis Guide, you may find more hints.  There
>you will learn that "Well-behaved" does not mean that the daemon is working.
>It is a static condition which reflects how it was built, not whether it is
>running correctly at this particular time.  A well-behaved daemon goes down
>when you do ovstop. One that is "non-well-behaved" stays up even after the
>others go away.
>
>James Shanks
>Level 3 Support  for Tivoli NetView for UNIX and NT
>Tivoli Software / IBM Software Group
>
>
>"Chan, Jack" <jack.chan@nz.unisys.com>
>Sent by: owner-nv-l@tkg.com
>12/13/01 06:45 PM
>Please respond to IBM NetView Discussion
>
>        To:     "'IBM NetView Discussion'" <nv-l@tkg.com>
>        cc:
>        Subject:        [NV-L] Nvcorrd error checking
>
>Hello List,
>
>I am having a problem with the nvcorrd daemon. The problem is as follows:
>
>I have an NV rule to execute a script upon receiving a trap.
>I checked trapd.log for the trap and it is there, but the script did not
>execute.
>
>ovstatus shows nvcorrd (and all the other daemons) as RUNNING and well
>behaved. Another symptom I see is that the control desktop is not updating
>(through Exceed and on the Linux console as well). After I ovstop and
>ovstart, the script executes again.
>
>I have a DM profile to check that the daemons are up, and scripts that run
>ovstatus |grep RUNNING and ovstatus |grep OVs_WELL_BEHAVED. But neither of
>these checks picks up that nvcorrd is not working as it is supposed to
>(because ovstatus still reports it as RUNNING and well behaved).
>
>How can I check that nvcorrd is REALLY running? Some sort of heartbeat using
>a ruleset, maybe?
>
>regards,
>
>Jack.
>
>
>_________________________________________________________________________
>NV-L List information and Archives: http://www.tkg.com/nv-l
>


Jorge A Jiles
Network Analyst
Computing & Network Services
University of Alberta
Edmonton, Alberta
Canada












Archive operated by Skills 1st Ltd
