To: | nv-l@lists.tivoli.com |
---|---|
Subject: | RE: Nvcorrd error checking |
From: | "James Shanks" <jshanks@us.ibm.com> |
Date: | Fri, 14 Dec 2001 11:38:48 -0500 |
I am going to assume that the rulesets you have in the 6.0.3 system are identical to those in the 7.1 system. If not, all bets are off.

The nvserverd "not running" message is the result of a timeout. If nvevents, the event window, cannot talk to the nvserverd daemon, he issues that message. If ovstatus shows the daemon up, and ps -ef does too, then he is stalled doing something else.

If you are forwarding to TEC, the thing to do might be to check whether your TEC server has gone down, since nvserverd is the guy who forwards to TEC when you use the internal adapter. I would also check the /etc/Tivoli/tec/cache file and see if it is growing. If it is, that means we cannot contact the TEC server for some reason, and nvserverd is having to try to reconnect with him on every event it gets, vastly slowing things down.

James Shanks
Level 3 Support for Tivoli NetView for UNIX and NT
Tivoli Software / IBM Software Group

Jorge Jiles <Jorge.Jiles@ualberta.ca>
Sent by: owner-nv-l@tkg.com
12/14/2001 10:50 AM
Please respond to IBM NetView Discussion
To: IBM NetView Discussion <nv-l@tkg.com>
cc:
Subject: RE: [NV-L] Nvcorrd error checking

I have seen the same problem in my system: NetView 7.1 on Solaris 8. What is really weird is that at times I also get the message that nvserverd is not running when, according to ovstatus and the logs, it is working OK.

The only way to get the events (and the scripts called by rulesets) going again is to stop and start nvcorrd and nvserverd. More than once I have had to stop all the daemons and restart them.

If I find any explanation for this, I will let you know. I don't think the rulesets are the problem, as the same ones are running in a production environment (AIX, NetView 6.02) and they work properly.

At 01:55 PM 12/14/2001 +1100, you wrote:
> Thanks for the tip James, I will look into it.
>
> We are running 24x7 for some boxes, so I am thinking of some sort of
> heartbeat to check trapd, nvcorrd and nvactiond, e.g. (wsnmptrap to NV -->
> script to postemsg to TEC --> TEC touches a local file --> script to check
> the touch time of the file).
>
> How do NetView experts out there check their daemons? Thanks!
>
> Regards,
>
> Jack
>
> -----Original Message-----
> From: James Shanks [mailto:jshanks@us.ibm.com]
> Sent: Friday, 14 December 2001 2:21 p.m.
> To: IBM NetView Discussion
> Subject: Re: [NV-L] Nvcorrd error checking
>
> Jack -
>
> Try looking in the logs. nvcorrd writes to an alog and a blog in
> /usr/OV/log. Errors are always written there. nvcorrd always starts in the
> alog, writes 1000 lines, switches to the blog, writes another 1000, and
> switches back. And when there is an action to be run he hands that off to
> actionsvr, who also has a pair of logs, nvaction.alog and blog, that
> work the same way. If you still don't see anything, you can turn on
> tracing, using the command "nvcdebug -d all". There are man pages on all
> this stuff, as well as lengthy discussions in the Admin Guide about how it
> works.
>
> If you read the NetView Diagnosis Guide, you may find more hints. There
> you will learn that "well-behaved" does not mean that the daemon is working.
> It is a static condition which reflects how it was built, not whether it is
> running correctly at this particular time. A well-behaved daemon goes down
> when you do ovstop. One that is "non-well-behaved" stays up even after the
> others go away.
>
> James Shanks
> Level 3 Support for Tivoli NetView for UNIX and NT
> Tivoli Software / IBM Software Group
>
> "Chan, Jack" <jack.chan@nz.unisys.com>
> Sent by: owner-nv-l@tkg.com
> 12/13/01 06:45 PM
> Please respond to IBM NetView Discussion
> To: "'IBM NetView Discussion'" <nv-l@tkg.com>
> cc:
> Subject: [NV-L] Nvcorrd error checking
>
> Hello List,
>
> I am having a problem with the nvcorrd daemon. The problem is as follows:
>
> I have a NV rule to execute a script upon receiving a trap.
> I checked trapd.log for the trap and it is there, but the script did not
> execute.
>
> ovstatus shows nvcorrd (and all the daemons) are RUNNING and well behaved.
> Another symptom I see is that the control desktop is not updating (through
> Exceed and the Linux console as well). After I ovstop and ovstart, the
> script executes again.
>
> I have a DM profile to check that the daemon is up, and scripts that do
> ovstatus |grep RUNNING and ovstatus |grep OVs_WELL_BEHAVED. But neither of
> these checking mechanisms picks up that nvcorrd is not working as it is
> supposed to (because ovstatus still thinks it is RUNNING and well behaved).
>
> How can I check that nvcorrd is REALLY running? Some sort of heartbeat using
> a ruleset maybe?
>
> regards,
>
> Jack.
>
> _________________________________________________________________________
> NV-L List information and Archives: http://www.tkg.com/nv-l

Jorge A Jiles
Network Analyst
Computing & Network Services
University of Alberta
Edmonton, Alberta
Canada
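Editorial sketch, not part of the original messages: two of the checks discussed in this thread could look roughly like the Python below. It samples a file's size twice to see whether it is growing (the TEC cache check James suggests) and tests whether a heartbeat file has gone stale (the "check touch time of file" step Jack proposes). The function names, the default interval, and the heartbeat file location are hypothetical; only the /etc/Tivoli/tec/cache path comes from the thread.

```python
import os
import time


def file_size(path):
    """Return the size of path in bytes, or 0 if the file does not exist."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def is_growing(path, interval=60.0, sleep=time.sleep):
    """Sample the file size twice, `interval` seconds apart.

    Returns True if the second sample is larger. `sleep` is injectable
    so the function can be exercised without actually waiting.
    """
    before = file_size(path)
    sleep(interval)
    return file_size(path) > before


def is_stale(path, max_age, now=None):
    """Return True if path is missing or its modification time is more
    than `max_age` seconds older than `now` (defaults to the current time).
    """
    if now is None:
        now = time.time()
    try:
        return now - os.path.getmtime(path) > max_age
    except FileNotFoundError:
        return True


# Example use (paths and thresholds are assumptions for illustration):
#   is_growing("/etc/Tivoli/tec/cache", interval=60)   # cache path from the thread
#   is_stale("/tmp/nv_heartbeat", max_age=600)         # hypothetical heartbeat file
```

A growing cache file would suggest, per James's note, that events are being queued because the TEC server cannot be reached. A stale heartbeat file would suggest that some link in the trap-to-action chain has stalled even though ovstatus still reports RUNNING. Neither check identifies which daemon is at fault.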
Archive operated by Skills 1st Ltd