We have the Event Configuration set to page us when a node goes down and
again when it comes back up. The problem is that when a node goes down for
only a few minutes (due to a missed poll, etc.) and then comes right back
up, we still get paged, and we don't want these 'false alarms'. Is it a
ping sweep that determines whether the node is down? (I think it's set to
run every 5 minutes.)
Can we increase the number of retries for the ping that determines whether
the node is down, to decrease the sensitivity? We want to be paged only if
the node is down longer than 10 minutes, for instance. I noticed the
ruleset editor has a built-in function for this (it is obviously a common
problem / complaint). However, how is the logic built to page when a node
has been down longer than 10 minutes? It seems that the logic does not
include a specific source, only a node down matched by a node up
correlation. What if, say, three nodes go down and two come back up? What
will be the source of the page?
What is a good way to be paged only when a node is down longer than a
given time (say, 10 minutes)?
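
To make the behaviour I'm after concrete, here is a rough Python sketch.
It is not product code and not how the ruleset editor works internally;
the class and method names (HoldDownCorrelator, node_down, node_up,
due_pages) are made up for illustration, and it assumes each event carries
the node name and a timestamp in seconds. Every node down starts a timer
keyed by that node's name, and only a node up from the same node cancels
it, so the source of the page is never ambiguous.

```python
HOLD_DOWN_SECONDS = 600  # 10 minutes; hypothetical threshold


class HoldDownCorrelator:
    """Hypothetical per-node hold-down: page only if a node stays down."""

    def __init__(self, hold_down=HOLD_DOWN_SECONDS):
        self.hold_down = hold_down
        self.down_since = {}  # node name -> time of first unmatched node down
        self.paged = set()    # nodes already paged for the current outage

    def node_down(self, node, now):
        # Keep the earliest down time if duplicate node-down events arrive.
        self.down_since.setdefault(node, now)

    def node_up(self, node, now):
        # A node up only clears the timer belonging to the same node.
        self.down_since.pop(node, None)
        was_paged = node in self.paged
        self.paged.discard(node)
        return was_paged  # True: a "down" page went out, so send an "up" page

    def due_pages(self, now):
        # Nodes that have been down at least hold_down seconds, not yet paged.
        due = sorted(n for n, t in self.down_since.items()
                     if now - t >= self.hold_down and n not in self.paged)
        self.paged.update(due)
        return due


# Three nodes go down, two come back within the hold-down window.
c = HoldDownCorrelator()
for node in ("a", "b", "c"):
    c.node_down(node, now=0)
c.node_up("a", now=120)
c.node_up("b", now=180)
pages = c.due_pages(now=600)
print(pages)  # ['c']
```

In the three-down, two-up case, only the node that is still down after 10
minutes produces a page, and the page names that node. A short outage
never pages at all, so there is no stray "up" page for it either.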