nv-l

Re: Determining length of time a node is down

To: nv-l@lists.tivoli.com
Subject: Re: Determining length of time a node is down
From: Karin Binder <karin.binder@NWA.COM>
Date: Thu, 15 Jul 1999 17:04:42 -0500
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
James,

Thanks for your reply and taking the time to investigate what the event's
sysUpTime referred to.

It seems we agree on the info needed for the calculation: timestamps in
seconds since the epoch, plus a human-readable form.  Where we differ is in
how to obtain them: by parsing trapd.log, or from the event itself during
ruleset processing.
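
As a rough illustration of the calculation both approaches share, the sketch
below takes two seconds-since-epoch timestamps (one for the down event, one
for the up event) and reports the outage length together with human-readable
times.  The function name and the sample values are made up for illustration;
nothing in it is NetView-specific.

```python
from datetime import datetime, timedelta, timezone


def describe_outage(down_epoch: int, up_epoch: int) -> str:
    """Return a one-line outage summary from two epoch timestamps (seconds)."""
    if up_epoch < down_epoch:
        raise ValueError("up event precedes down event")
    # Convert both timestamps to timezone-aware UTC datetimes for display.
    down = datetime.fromtimestamp(down_epoch, tz=timezone.utc)
    up = datetime.fromtimestamp(up_epoch, tz=timezone.utc)
    # The outage length is just the difference of the two machine values.
    duration = timedelta(seconds=up_epoch - down_epoch)
    return (
        f"down {down:%Y-%m-%d %H:%M:%S} UTC, "
        f"up {up:%Y-%m-%d %H:%M:%S} UTC, outage {duration}"
    )


# Hypothetical sample values, not taken from a real trapd.log.
print(describe_outage(932076282, 932079942))
```

Keeping the machine value for the arithmetic and converting only for display
avoids any time-zone or date-parsing issues in the subtraction itself.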

There are a couple of reasons I opted to try obtaining the data from the
event values ($NVT and within $NVATTR_4).  First, I found that I will not
see a 1:1 correlation between node down and node up traps.  For instance, a
node_up trap is issued immediately after an interface down trap on a device
with multiple interfaces of which some are still up (e.g. a router or a
multihomed UNIX box).  Another example is a node being removed from the
network.  So, instead of writing a script to handle the anomalies, I wanted
to let PassOnMatch take care of it for me.  Second, I'm already doing
ruleset processing on node up/down and interface up/down traps and
forwarding info to a TEC console.  The info I need to measure system
downtime is already there during this processing, so I thought I'd take
advantage of that, skip additional file opens and parsing, and not have to
write/maintain another script.  The problem is, once I've matched my events
via PassOnMatch, I can't get at $NVT or $NVATTR_4 for the event specified in
slot 2.  It seems that following the PassOnMatch node, the values all refer
to the slot 1 event.  Is there a way to access the values for the slot 2
event variables?  If there is, I'd like to use it.  If not, I'll be writing
a script to parse trapd.log.
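
For reference, a minimal sketch of what the fallback script might look like,
including the anomalies mentioned above (a node up with no preceding node
down, and a node that never comes back).  The line layout in parse_line is an
ASSUMPTION and should be checked against a real trapd.log; the function names,
host names and sample lines are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str, str]  # (epoch seconds, host, "down" or "up")


def parse_line(line: str) -> Optional[Event]:
    """Parse one log line under the ASSUMED layout
    '<epoch> <severity> <weekday> <month> <day> <time> <year> <host> <source> <message>'.
    Returns None for lines that do not fit or are not node up/down events."""
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    message = fields[9]
    if message.startswith("Node Down"):
        state = "down"
    elif message.startswith("Node Up"):
        state = "up"
    else:
        return None
    return int(fields[0]), fields[7], state


def outages(events: Iterable[Event]) -> Iterator[Tuple[str, int, int]]:
    """Yield (host, down_epoch, up_epoch) for each matched down/up pair."""
    pending: Dict[str, int] = {}
    for epoch, host, state in events:
        if state == "down":
            # Keep the earliest unmatched down if several arrive in a row.
            pending.setdefault(host, epoch)
        elif host in pending:
            yield host, pending.pop(host), epoch
        # A node up with no pending node down is silently ignored.


# Hypothetical sample lines in the assumed layout.
sample = [
    "932076282 3 Thu Jul 15 17:04:42 1999 routerA N Node Down.",
    "932076300 3 Thu Jul 15 17:05:00 1999 hostB N Node Up.",
    "932079942 3 Thu Jul 15 18:05:42 1999 routerA N Node Up.",
    "932080000 3 Thu Jul 15 18:06:40 1999 hostC N Node Down.",
]

for host, down, up in outages(filter(None, map(parse_line, sample))):
    print(host, "was down for", up - down, "seconds")
```

To run it against a real log, the sample list would be replaced by the lines
of the open trapd.log file.  Hosts still in the pending table at the end are
the ones that went down and never came back up.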

Thanks again for all of your help and have a nice vacation!

Karin

----Original Message-----
From: James Shanks <James_Shanks@TIVOLI.COM>
To: NV-L@UCSBVM.UCSB.EDU <NV-L@UCSBVM.UCSB.EDU>
Date: Wednesday, July 14, 1999 6:00 PM
Subject: Re: Determining length of time a node is down


>Karin -
>
>As I am on vacation starting tomorrow, I finally had a chance to look into the
>"sysUpTime" you mentioned in a ruleset.  I myself was ignorant of this value,
>and as far as I know, no one else has looked into using this before.  It's one
>of those things that seems to have gone unnoticed and unused since it was
>incorporated into the product.  I didn't know it was an option on
>pass/reset-on-match until you mentioned it.   But the code seems pretty
>unambiguous to me.  The value for "sysUpTime" is taken from the time ticks value
>in the incoming trap, which would make it an integer value expressed in
>hundredths of a second, measuring essentially the length of time since netmon,
>the author of Node Up and Node Down traps, was started.  Not very useful I would
>say, but if that's what you want, it is the same value as $NVT which you can
>access in an action or in-line action node in a ruleset.
>
>A better choice, I think, would be the timestamps in the fourth variable of
>those traps, $NVATTR_4.  The timestamp here is the same format as the one which
>you find in trapd's log --  I believe it is seconds since Jan. 1, 1970 (the
>proverbial birth of UNIX).  That should be the time stamp when netmon actually
>created the trap.  But be careful as the timestamp is not the only thing in
>var4.  There are also object ids  -- look in the descriptions of those traps and
>you will see what they contain.
>
>But actually, I agree with Gary.  netmon sends his trap to trapd internally over
>tcp -- the network is not involved  -- so generally speaking the value you see
>in the trapd log is close enough, save perhaps a second, to the actual time of
>the trap, perhaps less.  And trapd.log has the advantage of having a
>human-readable date-time stamp following the machine value, so it is easy to use
>that for a reality-check on your calculations.  Very little is gained, I think,
>in recording your own up/down log and processing that over trapd's.
>
>James Shanks
>Tivoli (NetView for UNIX) L3 Support
>
>
>
>Karin Binder <karin.binder@nwa.com> on 07/13/99 03:39:51 PM
>
>Please respond to karin.binder@nwa.com
>
>To:   NV-L@UCSBVM.UCSB.EDU
>cc:    (bcc: James Shanks/Tivoli Systems)
>Subject:  Re: Determining length of time a node is down
>
>
>
>
>
>Gary,
>
>Thanks for your reply.  You interpreted correctly - I am trying to do a
>post-mortem calculation on the length of time a device is down.
>
>I was intending to do the same type of calculation you described (matching
>events and subtracting the differences in the timestamp).  The difference
>is that I was trying to get at the event and trap attribute values during
>ruleset processing.  Based on the configuration of "Pass on Match", it was
>apparent that the event and trap attributes for the matching events were
>being used during ruleset processing.  Since the values were already known
>at that point, I thought it would be more efficient to access them at that
>time rather than performing additional system calls, opening files, etc. in
>the subsequent script.  In this instance, I wanted the timestamps (seconds
>since epoch and converted) from each of the traps.  I'd still like to know
>if there's a way to access those variables in ruleset processing.  If not,
>I'll kick off a script and parse trapd.log.
>
>I planned to query sysUpTime to determine if a reset had taken place.  I do
>realize that sysUpTime measures the uptime of the snmp agent, not
>necessarily the device.  This seems to be as close as I can get and still
>cover soft resets, unless you have other suggestions?  I'm still curious
>what the sysUpTime in the event attribute refers to.  I suspect it is the
>uptime of one of NetView's agents, but have no confirmation on that.
>
>Thanks for the reference on the timing info via ovtopodump -l. That
>explained question 3, and will be useful for other checks in the future.
>
>
>----------
>> From: Boyles, Gary P <gary.p.boyles@INTEL.COM>
>> To: NV-L@UCSBVM.UCSB.EDU
>> Subject: Re: Determining length of time a node is down
>> Date: Friday, July 09, 1999 4:11 PM
>>
>> Karin,
>> I believe the sysUpTime value is the MIB-II sysUpTime value.
>>
>> What exactly are you trying to do anyway?  If  you're trying
>> to calculate the number of seconds something is down (after the
>> fact) then an easy way is to match the node-down/node-up, or
>> interface-down/interface-up events residing in /usr/OV/log/trapd.log,
>> and use the values in the 1st column of the trapd.log file.  This
>> column denotes the timestamp (in seconds) when the event occurred.
>> Subtracting the end-event from the beginning-event will give you
>> the result (which is the time between the two events).
>>
>> If you're trying to do something else... please explain what
>> problem you're trying to solve.
>>
>> Hope this helps.
>>
>> BTW -- bits of timing information can be obtained thru ovtopodump -l,
>> and other info via ovobjprint -s.
>>
>> Regards,
>>
>> Gary Boyles
>>
>>
>> -----Original Message-----
>> From: Karin Binder [mailto:karin.binder@nwa.com]
>> Sent: Friday, July 09, 1999 12:33 PM
>> To: NV-L@UCSBVM.UCSB.EDU
>> Subject: Determining length of time a node is down
>>
>>
>> Hello all,
>>
>> I have a need to report the length of time a node is down.
>>
>> I have been trying to use ruleset processing to work with the IBM_NDWN_EV
>> and IBM_NUP_EV events.  I receive the events, and can match the events for
>> a given device.  So far, so good.  The difficulty I am having is in trying
>> to access the attribute information from both traps (once they have been
>> matched) so I can pass them to a script for further processing.  I was able
>> to do it by setting the correlation value from the node_down and then the
>> node_up, but it seemed kludgy. And since there's only one set of rolling
>> correlation fields, I'd rather not waste it on this.
>>
>> I've tried the documentation (manuals, online, man pages), but it didn't go
>> into much depth.  In fact, that and some testing led to further questions:
>>
>>
>> 1) Since the "pass on match" node can access event attributes from two
>> matched traps, is there any way I can access it too during ruleset
>> processing?  I'd like to obtain the $NVT and $NVATTR_4 from each of the
>> matched traps.
>>
>> 2) Can someone please clarify which agent's sysUpTime is reported in the
>> sysUpTime Event Attribute Value? (Ref. Admin Guide p. 5-35)  From my
>> testing and viewing the debug output in nvcorrd logs, it does not appear to
>> be the uptime of the device being reported on in the trap (the device at
>> $NVA).
>>
>> 3) When doing a demand poll of a node that is down, the output contains
>> "down since ....." and a date/time stamp.  Where is this information
>> stored, and is it accessible?
>>
>>
>> I'm curious if others are measuring this and how.  Open to any suggestions,
>> documentation references, etc.
>>
>> Thanks in advance,
>> Karin
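
A footnote on the sysUpTime / $NVT value described in James's reply above:
since it is a time ticks count in hundredths of a second, turning it into a
readable duration is a one-line conversion.  The sketch below assumes only
that unit; the function name is hypothetical.

```python
from datetime import timedelta


def timeticks_to_timedelta(ticks: int) -> timedelta:
    """Convert a time ticks count (hundredths of a second) to a timedelta."""
    if ticks < 0:
        raise ValueError("time ticks cannot be negative")
    # One tick is 1/100 s, i.e. 10 milliseconds.
    return timedelta(milliseconds=ticks * 10)


# 8,640,000 ticks is one day's worth of hundredths of a second.
print(timeticks_to_timedelta(8_640_000))
```

Note that the result is a length of time since the reporting process started,
not a wall-clock timestamp, so it cannot be subtracted from a
seconds-since-epoch value.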

Archive operated by Skills 1st Ltd