Well, I am only guessing, but the NetView for UNIX daemons are
single-threaded, which means they can only do one thing at a time
(although they usually do it very fast). If name resolution, which is
done by the OS function "gethostbyaddr", gets hung, then nothing will
happen until the OS returns a response to that call. I have seen bad name
resolution take as much as 30 seconds to respond, and during that time,
nvcorrd (and all other daemons waiting on him) just stop. Since nvcdebug
is a synchronous operation, it may have timed out waiting for nvcorrd to
respond, and that would produce that message.
Just a guess.
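
To see whether reverse name resolution is slow enough to stall a single-threaded daemon, one option is to time the lookup from outside NetView. The following is only a rough sketch, not anything NetView ships: the function name timed_reverse_lookup, its parameters, and the 5-second default timeout are assumptions made for illustration. It runs the lookup in a helper thread so that the caller gets control back even if the resolver hangs.

```python
import socket
import threading
import time


def timed_reverse_lookup(ip, timeout=5.0, resolver=socket.gethostbyaddr):
    """Reverse-resolve ip, giving up after timeout seconds.

    Returns (hostname_or_None, elapsed_seconds, status).
    The name, parameters, and default timeout are illustrative assumptions.
    """
    result = {}

    def worker():
        # The blocking call happens here, off the caller's thread.
        try:
            result["hostname"] = resolver(ip)[0]
        except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
            result["error"] = str(exc)

    start = time.monotonic()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # wait at most timeout seconds
    elapsed = time.monotonic() - start

    if thread.is_alive():
        # The resolver has not answered yet; a single-threaded caller
        # would still be blocked at this point.
        return None, elapsed, "timeout"
    if "error" in result:
        return None, elapsed, "error: " + result["error"]
    return result["hostname"], elapsed, "ok"
```

Calling it once for each address of a managed node and printing the elapsed time should show whether any lookup comes close to the 30 seconds mentioned above.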
James Shanks
Team Leader, Level 3 Support
Tivoli NetView for UNIX and NT
"Westphal, Raymond" <RWestphal@erac.com>@tkg.com on 03/19/2001 01:35:17 PM
Please respond to IBM NetView Discussion <nv-l@tkg.com>
Sent by: owner-nv-l@tkg.com
To: "NV List (E-mail)" <nv-l@tkg.com>
cc:
Subject: [NV-L] nvcorrd ?
Hello All,
NV 6.0.2 for UNIX on IBM AIX 4.3.3 with maintenance level 4.
I've been testing an inconsistent ruleset that checks for node down of
critical devices. If the node is down for over 5 minutes, then page Ray. I
found a problem with name resolution. The forward lookup of the host name
returned 5 IP addresses. The reverse resolution was not working for all 5
IP addresses. The ruleset works correctly now.
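
A check along these lines can confirm that every address returned by the forward lookup also resolves in reverse. It is a minimal sketch, assuming IPv4 and the standard resolver on the NetView host; the function name check_reverse_resolution and its parameters are made up for illustration.

```python
import socket


def check_reverse_resolution(hostname,
                             forward=socket.gethostbyname_ex,
                             reverse=socket.gethostbyaddr):
    """Map each IPv4 address of hostname to its reverse name.

    An address whose reverse lookup fails is mapped to None.
    The function name and parameters are illustrative assumptions.
    """
    # gethostbyname_ex returns (canonical_name, alias_list, address_list).
    _name, _aliases, addresses = forward(hostname)
    report = {}
    for ip in addresses:
        try:
            # gethostbyaddr returns (hostname, alias_list, address_list).
            report[ip] = reverse(ip)[0]
        except OSError:  # covers socket.herror and socket.gaierror
            report[ip] = None
    return report
```

Any address that maps to None in the returned dictionary has no working reverse entry and is a candidate for the kind of problem described above.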
In the process of monitoring the ruleset, I ran the "nvcdebug -d all" command.
When I attempt to run the debug, I keep encountering an "unable to connect
to correlation daemon" error. I can correct the problem by stopping
actionsvr, then nvcorrd, and restarting all the daemons. The same error may
also occur when I simply stop actionsvr after making a change to a ruleset.
No core dumps show in the AIX error report. nvstat shows all daemons as
running.
Has anyone else had this experience with NV 6.0.2?
Thanks.
Ray Westphal
Enterprise Rent-A-Car
_________________________________________________________________________
NV-L List information and Archives: http://www.tkg.com/nv-l