RE: TIPN Inventory - NV Nodes Problem

To: nv-l@lists.tivoli.com
Subject: RE: TIPN Inventory - NV Nodes Problem
From: "Cowan, Chris" <Chris.Cowan@2ndwaveinc.com>
Date: Tue, 4 Sep 2001 15:39:25 -0500
Sorry, I didn't see this earlier.

Sun has always had problems with their DNS support. Anyone who used SunOS
4 or early versions of Solaris 2 knows what a pleasure it was to get their
systems to use DNS. Sun basically assumed for a long while that everyone
would want to use NIS.

I'm not sure whether it was a gethostbyname or gethostbyaddr call that
caused us problems. The actual exception (which we found using truss) was
triggered by /etc/.name_service_door. Solaris 7 and 8 (and maybe 2.6) have
a newer mechanism called a "door" for services. Doors can be identified by a
"D" (not a "d") in the first column of ls -l output. This door is managed
and used by nscd, which is how we got to the root cause: nscd was not running.
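For anyone who would rather check this programmatically than eyeball ls -l, here is a minimal C sketch. It stat()s a path and reports whether it is a door, using the S_ISDOOR macro that Solaris provides in <sys/stat.h>; on systems without doors that macro is absent and the check simply reports "not a door". The helper names are hypothetical, and treating "no door at /etc/.name_service_door" as "nscd is not serving requests" is an assumption based on the description above, not something confirmed by Sun documentation quoted in this thread.

```c
#include <sys/stat.h>

/* Path named in this thread; nscd attaches its door here. */
#define NSCD_DOOR_PATH "/etc/.name_service_door"

/* Returns 1 if `path` is a door, 0 if it exists but is not a door,
   and -1 if it cannot be stat()ed (missing, no permission, ...). */
int is_door(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

#ifdef S_ISDOOR
    /* Solaris only: the same file type that ls -l shows as 'D'. */
    return S_ISDOOR(st.st_mode) ? 1 : 0;
#else
    /* Platforms without doors never report one. */
    return 0;
#endif
}

/* Hypothetical convenience check: is a door attached at the nscd path?
   Assumption: if not, nscd is not running (or not serving requests). */
int nscd_door_present(void)
{
    return is_door(NSCD_DOOR_PATH) == 1;
}
```

A regular file left behind at that path (for example after nscd has stopped and detached its door) returns 0 rather than 1, which is why the sketch checks the file type instead of mere existence.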

Recently, we ran into a second problem where the nscd configuration file
(/etc/nscd.conf) was not set up correctly for caching host names. The truss
output was very similar.
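The message doesn't say which nscd.conf setting was wrong. As one illustration only, the C sketch below scans an nscd.conf-style file for an "enable-cache hosts yes|no" line (the "directive cachename value" layout that nscd.conf uses), ignoring # comments. The function name and return codes are hypothetical, and letting the last matching line win is a choice made for this sketch, not a claim about how nscd itself resolves duplicates. Other host-cache directives, such as the time-to-live settings, could just as easily have been the culprit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical checker for the hosts cache switch in an nscd.conf-style file.
   Returns  1 for "enable-cache hosts yes",
            0 for "enable-cache hosts no",
           -1 if no such directive is present (nscd's default then applies),
           -2 if the file cannot be opened.
   In this sketch, the last matching line wins. */
int hosts_cache_setting(const char *conf_path)
{
    FILE *fp = fopen(conf_path, "r");
    if (fp == NULL)
        return -2;

    char line[512];
    int result = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *hash = strchr(line, '#');
        if (hash != NULL)
            *hash = '\0';              /* strip trailing or whole-line comments */

        char directive[64], cache[64], value[64];
        /* %63s splits on any whitespace, so tabs and spaces both work. */
        if (sscanf(line, "%63s %63s %63s", directive, cache, value) == 3 &&
            strcmp(directive, "enable-cache") == 0 &&
            strcmp(cache, "hosts") == 0) {
            if (strcmp(value, "yes") == 0)
                result = 1;
            else if (strcmp(value, "no") == 0)
                result = 0;
        }
    }

    fclose(fp);
    return result;
}
```

Running it against /etc/nscd.conf on the affected box and getting 0 would point at host caching being disabled; getting -1 means the file is silent on the matter and the setting lies elsewhere.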

As for why nscd was turned off: that was driven by some security people
using the "turn off all unnecessary services" philosophy. As to whether it
has caused us other problems, perhaps it has. The null pointer return was
something we read about on the Sun Managers list, and may not be totally
accurate.


-----Original Message-----
From: James Shanks [mailto:SHANKS@us.tivoli.com]
Sent: Friday, July 20, 2001 9:52 AM
To: 'IBM NetView Discussion'
Subject: RE: [NV-L] TIPN Inventory - NV Nodes Problem


Chris -

I'm curious, and like the cat, that will probably get me killed someday :-)

But I just have to ask about this. I don't work on TIPN or know very much
about it, but my real ignorance is of the internals of Solaris.
On AIX, where TIPN was born, "gethostbyname" is an operating system call,
there is no daemon or service involved, and it never, ever returns a null
pointer so far as I know.  So what is this nscd service that you turned
off, and why did you do that?  I am curious because it seems to me that the
TIPN guys could just say, "Well, sorry, but we can't run effectively in
that environment, so what you are doing is not supported."

The reason I ask is because NetView proper, especially netmon and the
event processing daemons (trapd, nvcorrd, nvserverd, actionsvr), calls
"gethostbyname" all over the place. If that is failing, I'm surprised that
you aren't having serious NetView problems too. Or have you?

James Shanks
Team Leader, Level 3 Support
Tivoli NetView for UNIX and NT


---------------------- Forwarded by James Shanks/Raleigh/IBM on 07/20/2001
07:55 AM ---------------------------

"Cowan, Chris" <Chris.Cowan@2ndWaveinc.com>@tkg.com on 07/19/2001 05:44:13
PM

Please respond to IBM NetView Discussion <nv-l@tkg.com>

Sent by:  owner-nv-l@tkg.com


To:   "'IBM NetView Discussion'" <nv-l@tkg.com>
cc:
Subject:  RE: [NV-L] TIPN Inventory - NV Nodes Problem




Well, we discovered the root cause of the problem, using truss.

We had turned off the nscd service as part of our hardening procedure.
Starting nscd solved the problem. The key was noticing an open64() call
against /etc/.name_service_door just before a SIGSEGV was thrown.

The application code should be adjusted to handle a null pointer return
from gethostbyname under these circumstances on Solaris. (There is quite a
bit of discussion about this on the Sun Managers list.) It appears that the
TIPN application code doesn't account for this.
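A minimal C sketch of the kind of defensive handling meant here: check the gethostbyname() result for NULL before touching it, and also confirm it actually carries an IPv4 address. This is not TIPN code, and resolve_first_ipv4() is a hypothetical helper that only illustrates the pattern. gethostbyname() is specified to return a null pointer on failure, with the reason in h_errno, so the check is worth doing whether or not nscd is involved.

```c
#include <arpa/inet.h>   /* inet_ntop */
#include <netdb.h>       /* gethostbyname, struct hostent */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET, socklen_t */

/* Hypothetical helper (not TIPN or Tivoli code): resolve `name` and write
   its first IPv4 address into `out` as dotted-quad text.
   Returns 0 on success, -1 on any failure (out is left as an empty string). */
int resolve_first_ipv4(const char *name, char *out, size_t outlen)
{
    if (out == NULL || outlen == 0)
        return -1;
    memset(out, 0, outlen);   /* never leave stale data behind on failure */

    struct hostent *hp = gethostbyname(name);

    /* gethostbyname() reports failure by returning NULL (h_errno says why).
       Dereferencing hp without this check turns a name-service outage
       into a SIGSEGV. */
    if (hp == NULL)
        return -1;

    /* A non-NULL result can still carry no usable IPv4 address. */
    if (hp->h_addrtype != AF_INET || hp->h_addr_list == NULL ||
        hp->h_addr_list[0] == NULL)
        return -1;

    /* inet_ntop() returns NULL if `out` is too small. */
    if (inet_ntop(AF_INET, hp->h_addr_list[0], out, (socklen_t)outlen) == NULL) {
        memset(out, 0, outlen);
        return -1;
    }
    return 0;
}
```

Copying the address out immediately also matters, because the hostent returned by gethostbyname() lives in static storage that the next lookup may overwrite.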

We have forwarded this information to Tivoli support.

-----Original Message-----
From: Cowan, Chris [mailto:Chris.Cowan@2ndwaveinc.com]
Sent: Monday, July 16, 2001 10:13 AM
To: IBM NetView Discussion (E-mail)
Subject: [NV-L] TIPN Inventory - NV Nodes Problem



Since we've had a problem with TIPN Inventory that has been open with
support for 3 months, I thought I'd share it with the list and see if
anyone has run into it. In a nutshell, we noticed that the nv_nodes
table was not being loaded (nv_interfaces, nv_segments, and
nv_networks load just fine). Our platform is Solaris 2.7 with recent
kernel patches, and NV 6.0.1 or 6.0.2.

We had about 6 production setups of NetView, of which about 4 were
malfunctioning. The ones that were working were running TMF 3.6.2 with
patches. We believe we have isolated the problem in our testing to the
libtmf.so in TMF 3.6.4 (and also 3.7.x). When we perform this upgrade, we
get an "UNHANDLED EXCEPTION LOOKING UP FIELDS" message and the nv_nodes
table is not loaded. Tracing the RIM does not help, BTW: this exception
is thrown before the upload through the RIM is attempted.

Support says they are unable to reproduce this problem by applying the
TMF patch. So, most recently, we tried several things, including multiple
installation orders for the required TMF, Inv, NV, and TIPN components.
In all cases, we are consistently able to cause the failure with the
TMF 3.6.4 patch.

Has anyone else seen this problem, or does this ring a bell with you in
any way? If so, feel free to contact me.

_________________________________________________________________________
NV-L List information and Archives: http://www.tkg.com/nv-l