
Re: [nv-l] netfmt

To: nv-l@lists.us.ibm.com
Subject: Re: [nv-l] netfmt
From: James Shanks <jshanks@us.ibm.com>
Date: Mon, 10 May 2004 17:00:51 -0400
Delivery-date: Mon, 10 May 2004 22:09:01 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
In-reply-to: <1084219994.15957.51.camel@chibuku.ns.carilion.com>
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com

Mahesh,

All that's in your nettl log are messages from other processes: ipmap, ovw, and ovspmd.  There is nothing from nettl itself, and nothing to indicate that the nettl process had a problem.  See where it says "Software:"?  That's how you can tell which process wrote the message.
So the nettl log itself doesn't look promising, but you should let someone else from Support take a look at it for you.
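
If it helps, here is a rough sketch for tallying which processes wrote the records in a formatted log, keyed on that "Software" field. The function name count_by_software is made up for illustration, and it assumes the field layout shown in the formatted log excerpts in this thread.

```shell
# Count formatted nettl log records per writing process.
# Reads netfmt-formatted text on stdin and looks only at lines of the
# form " Software             : /path/to/program" (layout assumed from
# the excerpts in this thread).
count_by_software() {
  awk '$1 == "Software" && $2 == ":" { count[$3]++ }
       END { for (p in count) print count[p], p }' | sort -k2
}

# Possible usage, from /usr/OV/log:
#   /usr/OV/bin/netfmt -f nettl.LOG00 | count_by_software
```

Each output line is a record count followed by the program path, so a process that never logged anything simply will not appear.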

The ps output may tell us more.
This kind of output is normal.  It is what you should see:
root      2471     1  0 15:49 ?        00:00:00 /usr/OV/bin/ntl_reader 0 1 1 1 1
root      2472  2471  0 15:49 ?        00:00:00 netfmt -CF
root      8018  9132  0 15:53 pts/0    00:00:00 grep 2471


Notice how the parent process of the netfmt -CF (2471) is the ntl_reader process?


In the earlier cases, the parent process is 1.  That means the nettl process (the ntl_reader) which spawned them has itself gone away, and each netfmt, having no parent left in the system, has been inherited by the init process (1).  These are all orphans.
root     28067     1  0 15:45 ?        00:00:00 netfmt -CF
root     23113     1  0 15:48 ?        00:00:00 netfmt -CF
root     23748     1  0 15:48 ?        00:00:00 netfmt -CF
root     24536     1  0 15:48 ?        00:00:00 netfmt -CF
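
To pick out only the orphans, something along these lines may work. It is a sketch, not a NetView tool: orphaned_netfmt is a hypothetical helper, and it assumes a ps that accepts -eo pid,ppid,args (procps on Linux does).

```shell
# Print the PIDs of netfmt processes whose parent is init (PPID 1).
# Expects "PID PPID COMMAND [ARGS...]" rows on stdin, for example from
#   ps -eo pid,ppid,args
# The header row and the "grep netfmt" row are skipped because their
# second or third field does not match.
orphaned_netfmt() {
  awk '$2 == 1 && $3 ~ /(^|\/)netfmt$/ { print $1 }'
}

# Possible usage:
#   ps -eo pid,ppid,args | orphaned_netfmt
```

A netfmt whose parent is still a live ntl_reader is not printed, so the healthy one is left alone.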


This situation might indicate that the ntl_reader process is coring on your box.  Can you find any core files in the root (/) directory?  Or in /usr/OV?
I don't believe that ntl_reader is set up to use /usr/OV/PD/cores.
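
A quick way to check both places is sketched below. list_core_files is a hypothetical helper; it assumes core files are named core or core.<pid>, which is common on Linux but depends on system settings, and that find supports -maxdepth (GNU find does).

```shell
# List files whose names start with "core" directly inside each
# directory given as an argument (no recursion into subdirectories).
# Note: the pattern is loose, so unrelated files such as "core_notes"
# would also be listed.
list_core_files() {
  for dir in "$@"; do
    find "$dir" -maxdepth 1 -type f -name 'core*' 2>/dev/null
  done
}

# Possible usage:
#   list_core_files / /usr/OV
```

If anything turns up, running "file" on it should report which program produced the core.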

In any case, open a problem with Support and let them help you gather some data.  I have no idea what else to tell you.


James Shanks
Level 3 Support for Tivoli NetView for UNIX and Windows
Tivoli Software / IBM Software Group



Mahesh Tailor <mahesh.tailor@network.carilion.com>
Sent by: owner-nv-l@lists.us.ibm.com
05/10/2004 04:13 PM
Please respond to: nv-l

To: NetView User List <nv-l@lists.us.ibm.com>
cc:
Subject: Re: [nv-l] netfmt

Hi, James!

Here's the output of my ps -ef:

root@netview [/usr/OV/log] # ps -ef | grep netfmt
root     28067     1  0 15:45 ?        00:00:00 netfmt -CF
root     23113     1  0 15:48 ?        00:00:00 netfmt -CF
root     23748     1  0 15:48 ?        00:00:00 netfmt -CF
root     24536     1  0 15:48 ?        00:00:00 netfmt -CF
root      2472  2471  0 15:49 ?        00:00:00 netfmt -CF
root      8020  9132  0 15:53 pts/0    00:00:00 grep netfmt
root@netview [/usr/OV/log] # ps -ef | grep 2471
root      2471     1  0 15:49 ?        00:00:00 /usr/OV/bin/ntl_reader 0 1 1 1 1
root      2472  2471  0 15:49 ?        00:00:00 netfmt -CF
root      8018  9132  0 15:53 pts/0    00:00:00 grep 2471

And all of these have appeared since I had to restart my machine 50
minutes ago.

I ran nettl -stop and still had the netfmt processes belonging to PID 1
running, so I killed them and then restarted nettl.

Here are some of the nettl log messages:


************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 10:06:07.308834
 Process ID           : 9774               Subsystem        : SECURITY
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ovw
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OVwUserSecurity() error 4 on waitpid

************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 15:08:45.118009
 Process ID           : 1609               Subsystem        : OVW
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ipmap
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IPMap error in symbolMgr::flushSymbols - OVwCreateSymbols - (OVwError = 80): Object not found.

************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 15:08:45.118101
 Process ID           : 1609               Subsystem        : OVW
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ipmap
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed to create symbol: 172.23.6.25.  OVwError =80: Object not found.

************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 15:08:45.118763
 Process ID           : 1609               Subsystem        : OVW
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ipmap
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IPMap error in symbolMgr::flushSymbols - OVwCreateSymbols - (OVwError = 80): Object not found.

************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 15:08:45.118822
 Process ID           : 1609               Subsystem        : OVW
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ipmap
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Failed to create symbol: 10.10.10.10.  OVwError =80: Object not found.

************************************ NetView *******************************@#%
 Timestamp            : Mon May 10 2004 15:17:38.349803
 Process ID           : 1394               Subsystem        : OVS
 User ID ( UID )      : 0                  Log Class        : ERROR
 Device ID            : -1                 Path ID          : -1
 Connection ID        : -1                 Log Instance     : 0

 Software             : /usr/OV/bin/ovspmd
 Hostname             : netview.carilion.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Object manager kronos.carilion.com is not registered.  See ovaddobj(1m).


Kronos.carilion.com is 10.10.10.10 which is a Win2K cluster address and
is excluded as !kronos.carilion.com in netmon.seed.

If you see something obvious, could you please drop me a reply?  If not,
I will submit a PMR.

Thanks.

Mahesh



On Mon, 2004-05-10 at 15:41, James Shanks wrote:
> Well, I don't have a clue what is wrong, but on Linux, it is the nettl
> process itself which spawns the netfmt -CF.  But only one of those is
> spawned on my system and it stays active only so long as nettl is
> active.  When I do a  "/usr/OV/bin/nettl -stop"  both nettl and the
> netfmt go away.
>
> You should be able to chase ownership of the processes via ps -ef.  Who
> is the parent of each of these rogue netfmts?  Your current nettl or
> some other one, long gone?  What happens if you do nettl -stop?
> Once the main nettl goes away, you should be able to kill those netfmt
> processes with impunity, though that will not tell you why they are
> being created.  But you can stop and restart nettl any time you wish.
> Normally it is just started once and keeps running until stopped.   If
> you stop nettl and kill all the remaining netfmts, if any, and then
> restart nettl with nettl -start, try looking with "ps -ef  |grep
> netfmt".  How many do you see? Should be just one.  Try looking again
> every few minutes.
>
> Offhand I see nothing in your status that looks out of line.  Where
> would you look for a source of the problem?  Well, I'm not sure, since
> I've never seen anything like this before, but here's what I'd do:
> (1) /usr/OV/bin/nettl -stop
> (2) ps -ef | grep netfmt, and kill any you find
> (3) cd /usr/OV/log
> (4) ls nettl*    and see how many you have: just nettl.LOG00, or also
> nettl.LOG01
> (5) for each nettl.LOG0n you have, issue
>         /usr/OV/bin/netfmt -f  nettl.LOG0n  >  formatted.LOG0n
>         This creates ASCII files you can read.
> (6) Look in the formatted logs for interesting error messages
> (7) Call Support with what you find.
>
> James Shanks
> Level 3 Support for Tivoli NetView for UNIX and Windows
> Tivoli Software / IBM Software Group
>
>
> Mahesh Tailor <mahesh.tailor@network.carilion.com>
> Sent by: owner-nv-l@lists.us.ibm.com
> 05/10/2004 03:01 PM
> Please respond to: nv-l
>
> To: NetView User List <nv-l@lists.us.ibm.com>
> cc:
> Subject: [nv-l] netfmt
>
>
>
>
> Hi!
>
> Running NetView 7.1.3 fp 2 on RedHat Linux AS 2.1.
>
> I am having a problem with hundreds of netfmt -CF processes running
> and eventually disabling the system because of too many open files
> [system default open files has been set to 32K files].  How can I
> figure out what is causing all these processes to start?  Here's my
> nettl status output:
>
> Logging Information:
> Log Filename:                   /usr/OV/log/nettl.LOG0x
> User's ID:              0       Buffer Size:            8192
> Messages Dropped:       0       Messages Queued:        0
>
> Subsystem Name:   Log Class:
> NON_IP                                  ERROR  DISASTER
> DISTMAN                        WARNING  ERROR  DISASTER
> SECURITY                       WARNING  ERROR  DISASTER
> COLLECTION                     WARNING  ERROR  DISASTER
> SNMP                                    ERROR  DISASTER
> CMOT                                    ERROR  DISASTER
> OVE                                     ERROR  DISASTER
> OVC                                     ERROR  DISASTER
> OVW                                     ERROR  DISASTER
> OVD                                     ERROR  DISASTER
> OVS               INFORMATIVE           ERROR  DISASTER
> OVCAPI                                  ERROR  DISASTER
> OVEXTERNAL                              ERROR  DISASTER
> OVWAPI                                  ERROR  DISASTER
> TEST_ID_1                                      DISASTER
> TEST_ID_2                                      DISASTER
> FORMATTER                                      DISASTER
>
>
> Tracing Information:
>
> Trace Filename:
> No Subsystems Active
>
>
> In addition to NetView the server also has the following running:
>
> - MySQL DB
> - Apache w/PHP and Perl.
> - Some ksh scripts that perform /usr/OV/bin/nvUtil on various
>   smartsets once every 30 minutes.
>
> That is essentially it.
>
> Also, what does the netfmt -C option do?  It is not in the man page.
>
> Thanks.
>
> Mahesh
--
Mahesh Tailor
WAN/TSM/NetView Administrator
Carilion Health System
Information Services
37 Reserve Avenue
Roanoke, VA 24016
Phone: 540.224.3929
Fax: 540.224.3954