Hello List members ;)
NetView v7.1.4
Solaris 2.8
Tavve PreView v2.2
Well, I've stumped support, I've stumped my colleagues, and I've been
stumped on this one myself for about a month. So, here it is. I'm not
making it up; this is really happening, and I'd like some input from
anyone who has any ideas on what the heck is going on.
This problem has occurred twice now. It affects the snmpCollect daemon.
Basically, the first time it happened, my primary and backup NetView
servers both stopped recording SNMP data at 20:03 local time. One box
lives in the eastern time zone, the other in mountain time. Both boxes stopped
recording snmpCollect data at 20:03 LOCAL time (i.e. two hours apart).
The second occurrence happened the day before yesterday, affected only
one box, this time at 20:10 local time. No changes have been made in the
snmpCollect area other than the usual routine changes to data collection
over the last few weeks. We run snmpCollect with a 10 minute deferral
instead of the usual 60 minutes and we run with -n 50 (50 simultaneous
requests). All of this has been running for years.
Cycling snmpCollect does not fix the problem. Once the daemon is started
up and the deferral period elapses, no data is written to disk. I can
truss the process on Solaris and see that snmpCollect is doing something,
and I can run a snoop trace and see packets going out and coming in.
Everything from NetView's point of view looks just fine. There are just no
changes to the date/time stamps on the snmpCollect database files.
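For anyone who wants to watch for the same symptom, here is a minimal
Python sketch of the timestamp check described above. It is only an
illustration: the function name stale_files and the 10 minute threshold
are my own choices, and the directory you pass in is whatever holds the
snmpCollect data files on your system (I am not assuming a path here).

```python
import os
import time


def stale_files(directory, max_age_seconds, now=None):
    """Return sorted (name, age_seconds) pairs for regular files in
    `directory` whose modification time is older than `max_age_seconds`.

    `now` can be supplied for testing; it defaults to the current time.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        # Only look at regular files; skip subdirectories and the like.
        if entry.is_file():
            age = now - entry.stat().st_mtime
            if age > max_age_seconds:
                stale.append((entry.name, age))
    return sorted(stale)
```

Calling stale_files(data_dir, 600) shortly after a collection interval
should return an empty list when the daemon is writing normally; a list
containing every collection file would match what I am seeing.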
Mysteriously, 12 hours later, it starts working again. There is a cron
entry at 20:00 that truncates snmpCollect data (a utility part of Tavve
PreView). But I can find no indication it failed and I certainly can't
explain why it started working 12 hours later. I do know that the file
system with SNMP data was 84% full when the truncate feature ran on the
16th. Maybe space was an issue. (I had verbose SNMP tracing on until
that day.)
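If space does turn out to be a factor, something like the following
could log file system usage just before the 20:00 truncate job runs.
This is a rough sketch, not anything from NetView or Tavve: the function
names and the threshold are hypothetical, and the percentage is computed
the way df usually reports it (used divided by used plus available).

```python
import shutil


def percent_used(path):
    """Return the percentage of the file system holding `path` that is in
    use, computed as used / (used + available to unprivileged users)."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        # Defensive guard for unusual file systems that report no blocks.
        return 0.0
    return 100.0 * usage.used / denominator


def is_above_threshold(path, threshold_percent):
    """Return True if usage on the file system is at or above the threshold."""
    return percent_used(path) >= threshold_percent
```

For example, is_above_threshold(data_dir, 80) would have returned True
on the 16th, when the file system was at 84%.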
I am open to any speculation. I can't reproduce this.