nv-l

Re: Node Up/Down Traps

To: nv-l@lists.tivoli.com
Subject: Re: Node Up/Down Traps
From: Ray Schafer <schafer@tkg.com>
Date: Tue, 07 Mar 2000 09:08:19 -0600
W.M.de.Bruin@DNB.NL wrote:

> Hi all,
>
> Important servers are monitored for availability by Netview. Status Polling is
> set to take place every 5 minutes.
> If a server does not respond, a Node Down trap is generated, and automatic
> escalation takes place. (Paging, SMS-messages, etc....)
>
> This is all fine, except that once every week or two, some of these servers
> are rebooted at night, or over the weekend, and are then unavailable for
> approx. 30 minutes.
> At this time, we don't want any traps, because the events are then escalated
> to maintenance personnel.

I like Leslie's idea of having a cron job set the value for a field
(Maintenancewindow=YES) and then unset it later.  You will still get the node
down, but your ruleset can ignore it.

I had a similar problem and also used collections, but instead of setting a
field, I ran a script that changed the filter rules on the MLM that handled the
nodes.  The collections were named after the local TimeRange for convenience.
The script only needed to be run when the members of a collection changed.
The filter rule simply filters out traps from the nodes during the time they
are scheduled for maintenance.  Since status checking is off-loaded to the MLM
for these nodes, the node down trap sent by the MLM can be suppressed, and you
will not even see a node down netmon trap, so you don't have to do anything
special in a ruleset.
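
To illustrate the time-window logic only (this is not the MLM filter syntax,
and not the script I actually ran), here is a minimal Python sketch.  It
assumes a hypothetical naming scheme where each collection is called
"HHMM-HHMM" in local time; the function names and the dictionary of
collections are made up for the example.

```python
from datetime import time


def parse_window(name):
    """Parse a hypothetical collection name such as '0200-0230'."""
    start_s, end_s = name.split("-")
    start = time(int(start_s[:2]), int(start_s[2:]))
    end = time(int(end_s[:2]), int(end_s[2:]))
    return start, end


def in_window(now, start, end):
    """True if 'now' falls in [start, end), allowing a window across midnight."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 2345-0015.
    return now >= start or now < end


def should_suppress(node, now, collections):
    """collections maps a window name to the set of node names in it."""
    for name, members in collections.items():
        if node in members and in_window(now, *parse_window(name)):
            return True
    return False
```

With collections = {"0200-0230": {"srv1"}}, a trap from srv1 at 02:05 would be
dropped, while the same trap at 02:30 or later would pass through.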

One issue with using either method is that if the node does not come back up
after maintenance for some reason, and you have suppressed the node down trap,
you will not get notified of the problem.  So I have a cron job that simply
does a demand poll of all the nodes in the collection right after the
maintenance window.  This way, if they are still down, the node down trap
comes in after the window and normal procedures follow.
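
A rough sketch of what such a post-window job could look like, in Python.
It assumes the demand poll is done with the nmdemandpoll command and that the
node names are passed in by the caller; the function name, the default command
and the way the node list is obtained are assumptions for illustration, not a
copy of my cron job.

```python
import subprocess

# Assumption: the demand-poll command is on the PATH under this name.
POLL_COMMAND = ["nmdemandpoll"]


def poll_nodes(nodes, run=subprocess.call, command=POLL_COMMAND):
    """Run one demand poll per node.

    Returns the nodes for which the poll command itself could not be run
    or exited non-zero.  Whether a node is actually down is still decided
    by NetView, which raises the node down trap as usual.
    """
    failed = []
    for node in nodes:
        try:
            rc = run(list(command) + [node])
        except OSError:
            # The command is missing or not executable.
            rc = -1
        if rc != 0:
            failed.append(node)
    return failed
```

Cron would start this a few minutes after the window closes, for example
calling poll_nodes(["srv1", "srv2"]) with the current members of the
collection.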

--
Ray Schafer                   | schafer@tkg.com
The Kernel Group              | Distributed Systems Management
http://www.tkg.com

