
Re: Determining length of time a node is down

To: nv-l@lists.tivoli.com
Subject: Re: Determining length of time a node is down
From: James Shanks <James_Shanks@TIVOLI.COM>
Date: Wed, 14 Jul 1999 18:56:27 -0400
Reply-to: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Sender: Discussion of IBM NetView and POLYCENTER Manager on NetView <NV-L@UCSBVM.UCSB.EDU>
Karin -

As I am on vacation starting tomorrow, I finally had a chance to look into the
"sysUpTime" you mentioned in a ruleset.  I myself was ignorant of this value,
and as far as I know, no one else has looked into using this before.  It's one
of those things that seems to have gone unnoticed and unused since it was
incorporated into the product.  I didn't know it was an option on
pass/reset-on-match until you mentioned it.   But the code seems pretty
unambiguous to me.  The value for "sysUpTime" is taken from the time ticks value
in the incoming trap, which would make it an integer value expressed in
hundredths of a second, measuring essentially the length of time since netmon,
the author of Node Up and Node Down traps, was started.  Not very useful I would
say, but if that's what you want, it is the same value as $NVT which you can
access in an action or in-line action node in a ruleset.
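To make the units concrete, here is a minimal Python sketch of turning a time ticks value (hundredths of a second) into a duration. It is only an illustration: the function names are hypothetical, and how the $NVT value reaches your script is left as an assumption.

```python
from datetime import timedelta


def ticks_to_timedelta(ticks: int) -> timedelta:
    """Convert a time ticks value (hundredths of a second) to a timedelta."""
    # One tick is 10 milliseconds; integer math avoids float rounding.
    return timedelta(milliseconds=ticks * 10)


def ticks_elapsed(earlier: int, later: int) -> timedelta:
    """Elapsed time between two readings taken from the same tick counter."""
    if later < earlier:
        # The counter went backwards: the sender was restarted or the
        # counter wrapped, so a plain subtraction would be meaningless.
        raise ValueError("tick counter went backwards between readings")
    return ticks_to_timedelta(later - earlier)


if __name__ == "__main__":
    print(ticks_to_timedelta(8640000))  # 8,640,000 ticks is one day
```

Because the counter restarts whenever the sender does, a difference between two tick values is only meaningful if both came from the same run of the sending process.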

A better choice, I think, would be the timestamps in the fourth variable of
those traps, $NVATTR_4.  The timestamp here is the same format as the one which
you find in trapd's log --  I believe it is seconds since Jan. 1, 1970 (the
proverbial birth of UNIX).  That should be the time stamp when netmon actually
created the trap.  But be careful as the timestamp is not the only thing in
var4.  There are also object ids  -- look in the descriptions of those traps and
you will see what they contain.

But actually, I agree with Gary.  netmon sends its trap to trapd internally over
TCP -- the network is not involved -- so generally speaking the value you see
in the trapd log is within a second or so, perhaps less, of the actual time of
the trap.  And trapd.log has the advantage of having a human-readable
date-time stamp following the machine value, so it is easy to use that as a
reality check on your calculations.  Very little is gained, I think, by
recording your own up/down log and processing that instead of trapd's.
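For anyone who goes the trapd.log route, the matching-and-subtracting that Gary describes could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested parser: the only thing taken from this thread is that the first column of each line is the timestamp in seconds. Deciding which node a line refers to, and whether it is a down or an up event, is left to the caller, and the function names and the (timestamp, node, kind) tuples are hypothetical.

```python
from datetime import datetime, timezone


def first_column_epoch(line: str) -> int:
    """Return the first whitespace-separated field of a log line as an int."""
    return int(line.split(None, 1)[0])


def outages(events):
    """Pair each 'down' event with the next 'up' event for the same node.

    events: iterable of (epoch_seconds, node, kind) in log order, where
    kind is 'down' or 'up' (classification is the caller's job).
    Returns a list of (node, down_at, up_at, seconds_down).
    """
    open_down = {}
    result = []
    for ts, node, kind in events:
        if kind == "down":
            # Keep the earliest unmatched down time if downs repeat.
            open_down.setdefault(node, ts)
        elif kind == "up" and node in open_down:
            down_at = open_down.pop(node)
            result.append((node, down_at, ts, ts - down_at))
    return result


def readable(epoch: int) -> str:
    """Format epoch seconds as UTC text for a sanity check."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")
```

Printing readable(down_at) next to the human-readable stamp in the log is a quick way to confirm that the right pair of lines was matched; note that the log's own stamp may be in local time rather than UTC.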

James Shanks
Tivoli (NetView for UNIX) L3 Support



Karin Binder <karin.binder@nwa.com> on 07/13/99 03:39:51 PM

Please respond to karin.binder@nwa.com

To:   NV-L@UCSBVM.UCSB.EDU
cc:    (bcc: James Shanks/Tivoli Systems)
Subject:  Re: Determining length of time a node is down





Gary,

Thanks for your reply.  You interpreted correctly - I am trying to do a
post-mortem calculation on the length of time a device is down.

I was intending to do the same type of calculation you described (matching
events and subtracting the differences in the timestamp).  The difference
is that I was trying to get at the event and trap attribute values during
ruleset processing.  Based on the configuration of "Pass on Match", it was
apparent that the event and trap attributes for the matching events were
being used during ruleset processing.  Since the values were already known
at that point, I thought it would be more efficient to access them at that
time rather than performing additional system calls, opening files, etc. in
the subsequent script.  In this instance, I wanted the timestamps (seconds
since epoch and converted) from each of the traps.  I'd still like to know
if there's a way to access those variables in ruleset processing.  If not,
I'll kick off a script and parse trapd.log.

I planned to query sysUpTime to determine if a reset had taken place.  I do
realize that sysUpTime measures the uptime of the snmp agent, not
necessarily the device.  This seems to be as close as I can get and still
cover soft resets, unless you have other suggestions?  I'm still curious
what the sysUpTime in the event attribute refers to.  I suspect it is the
uptime of one of NetView's agents, but have no confirmation on that.

Thanks for the reference on the timing info via ovtopodump -l. That
explained question 3, and will be useful for other checks in the future.


----------
> From: Boyles, Gary P <gary.p.boyles@INTEL.COM>
> To: NV-L@UCSBVM.UCSB.EDU
> Subject: Re: Determining length of time a node is down
> Date: Friday, July 09, 1999 4:11 PM
>
> Karin,
> I believe the sysUpTime value is the MIB-II sysUpTime value.
>
> What exactly are you trying to do anyway?  If  you're trying
> to calculate the number of seconds something is down (after the
> fact) then an easy way is to match the node-down/node-up, or
> interface-down/interface-up events residing in /usr/OV/log/trapd.log,
> and use the values in the 1st column of the trapd.log file.  This
> column denotes the timestamp (in seconds) when the event occurred.
> Subtracting the end-event from the beginning-event will give you
> the result (which is the time between the two events).
>
> If you're trying to do something else... please explain what
> problem you're trying to solve.
>
> Hope this helps.
>
> BTW -- bits of timing information can be obtained thru ovtopodump -l,
> and other info via ovobjprint -s.
>
> Regards,
>
> Gary Boyles
>
>
> -----Original Message-----
> From: Karin Binder [mailto:karin.binder@nwa.com]
> Sent: Friday, July 09, 1999 12:33 PM
> To: NV-L@UCSBVM.UCSB.EDU
> Subject: Determining length of time a node is down
>
>
> Hello all,
>
> I have a need to report the length of time a node is down.
>
> I have been trying to use ruleset processing to work with the IBM_NDWN_EV
> and IBM_NUP_EV events.  I receive the events, and can match the events for
> a given device.  So far, so good.  The difficulty I am having is in trying
> to access the attribute information from both traps (once they have been
> matched) so I can pass them to a script for further processing.  I was able
> to do it by setting the correlation value from the node_down and then the
> node_up, but it seemed kludgy. And since there's only one set of rolling
> correlation fields, I'd rather not waste it on this.
>
> I've tried the documentation (manuals, online, man pages), but it didn't go
> into much depth.  In fact, that and some testing led to further questions:
>
>
> 1) Since the "pass on match" node can access event attributes from two
> matched traps, is there any way I can access it too during ruleset
> processing?  I'd like to obtain the $NVT and $NVATTR_4 from each of the
> matched traps.
>
> 2) Can someone please clarify which agent's sysUpTime is reported in the
> sysUpTime Event Attribute Value? (Ref. Admin Guide p. 5-35)  From my
> testing and viewing the debug output in nvcorrd logs, it does not appear to
> be the uptime of the device being reported on in the trap (the device at
> $NVA).
>
> 3) When doing a demand poll of a node that is down, the output contains
> "down since ....." and a date/time stamp.  Where is this information
> stored, and is it accessible?
>
>
> I'm curious if others are measuring this and how.  Open to any suggestions,
> documentation references, etc.
>
> Thanks in advance,
> Karin
