At 11:05 AM 12-01-05 +0000, you wrote:
Joe,
Thanks for your reply.
I did turn on the trace and looked at the trace file. The only useful
message was
"hostname doesn't reply to xx(number) object PDU, but responds to sysUpTime.
Be sure timeouts are not set too small (SNMP interval 20.00s retry:3)."
I have checked the manual; it only gives the definition (and the defaults)
but no consequences (or examples).
Regards,
David
David,
As Paul and Jason have responded, a 20-second response time is too long and
you should be checking the nodes.
From the trace file message you can deduce what the effect is going to be
on your collections.
snmpCollect's strategy is that if a node does not respond during a complete
retry cycle, polling of that node is deferred for a relatively long time.
From what you said previously, your defer time is 60 minutes. So if a node
does not respond to 3 consecutive gets, it will not be polled for 60 minutes.
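However, the trace file message above tells you that the node responds
with at least one MIB var (sysUpTime) but not with the others. Because it
still answers, snmpCollect does not treat it as down, so it is not deferred
and is polled again every cycle. This defeats the "defer" strategy.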
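To make the cost concrete, here is a minimal sketch of the deferral rule as described above, assuming (hypothetically) that the decision is made per node from whether it still answers sysUpTime after the collected-var get times out. The names `poll_node`, `PollResult`, `TIMEOUT_S`, `RETRIES` and `DEFER_MIN` are illustrative only, not snmpCollect internals; the values are the ones quoted in this thread.

```python
from dataclasses import dataclass

# Values quoted in this thread; the names are hypothetical, not snmpCollect's.
TIMEOUT_S = 20.0  # "SNMP interval 20.00s" from the trace message
RETRIES = 3       # tries per get before giving up
DEFER_MIN = 60    # defer time David reported

@dataclass
class PollResult:
    deferred: bool          # skipped for DEFER_MIN minutes after this poll?
    seconds_blocked: float  # time one poller spent waiting on timeouts

def poll_node(answers_sysuptime: bool, answers_collected_vars: bool) -> PollResult:
    """Model one poll of one node under the deferral rule described above."""
    if answers_collected_vars:
        return PollResult(deferred=False, seconds_blocked=0.0)
    # The collected-var get times out on every try.
    blocked = TIMEOUT_S * RETRIES
    if answers_sysuptime:
        # Node looks alive, so it is not deferred: the cost recurs every cycle.
        return PollResult(deferred=False, seconds_blocked=blocked)
    # Node looks dead: deferred, so the cost is paid about once per DEFER_MIN.
    return PollResult(deferred=True, seconds_blocked=blocked)

def blocked_seconds_per_hour(result: PollResult, cycle_min: int = 15) -> float:
    """Rough poller time lost per hour to one such node."""
    period_min = DEFER_MIN if result.deferred else cycle_min
    return result.seconds_blocked * (60 / period_min)
```

Under these assumptions a dead node costs a poller about 60 s per hour, while a node that answers sysUpTime but not the collected vars costs about 240 s per hour, which is why partial responders hurt more than dead ones.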
You suggested in one of your replies that you believe it is a firewall
problem. I don't think so. If it were, the node would not be able to
respond with any MIB var. I don't think your firewall would filter on MIB vars.
If you issue an snmpget from the command line or MIB browser, can you
always get a (quick) response from these nodes with the specific MIB vars
and instances you are collecting? If not, you need to check the node, as
Paul suggested.
You have snmpCollect set to do 50 concurrent polls with up to 50 MIB vars
in each and a polling cycle of 15 minutes. You are polling 600 nodes and
200 of them have "lots" of interfaces. snmpCollect does not use SNMPv2
GetBulk requests, so this means more than one PDU per node for those 200.
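Without GetBulk, the number of get PDUs a node needs per cycle is roughly the number of variables collected from it divided by the 50-vars-per-PDU limit, rounded up. A sketch; `pdus_per_node` and the per-interface variable count are hypothetical inputs for illustration, since "lots" was not quantified.

```python
import math

MAX_VARS_PER_PDU = 50  # snmpCollect setting quoted in this thread

def pdus_per_node(interfaces: int, vars_per_interface: int, scalar_vars: int = 0) -> int:
    """Get PDUs needed to fetch every collected var from one node in one cycle."""
    total_vars = interfaces * vars_per_interface + scalar_vars
    # Each PDU carries at most MAX_VARS_PER_PDU varbinds, so round up.
    return math.ceil(total_vars / MAX_VARS_PER_PDU)
```

For example, a node with 48 interfaces and 4 collected vars per interface needs `pdus_per_node(48, 4)` = 4 PDUs, while one with 10 such interfaces fits in a single PDU.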
The best case of one PDU per node is 600 snmpget cycles that have to be
performed by 50 concurrent "threads" in a 15-minute poll period. But it is
more likely to be 800 (= 400 + 200 x 2), 1000 (= 400 + 200 x 3),
... up to 2400 (= 400 + 200 x 10), depending on what "lots" is.
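The arithmetic above can be checked, and turned into a per-get time budget, with a short sketch. It assumes the 50 pollers are kept fully busy for the whole 15-minute cycle; the function names are illustrative only.

```python
POLLERS = 50        # concurrent polls
CYCLE_S = 15 * 60   # 15-minute polling cycle, in seconds

def pdus_per_cycle(small_nodes: int, busy_nodes: int, pdus_per_busy_node: int) -> int:
    """Small nodes need one PDU each; busy nodes need several."""
    return small_nodes + busy_nodes * pdus_per_busy_node

def max_avg_seconds_per_pdu(total_pdus: int) -> float:
    """Average time each get may take if all gets must finish within one cycle."""
    return POLLERS * CYCLE_S / total_pdus
```

With 400 small nodes and 200 busy ones, 10 PDUs per busy node gives 2400 gets and an average budget of 18.75 s per get, already less than a single 20 s timeout, before any retries.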
Each non-response is going to block one of these for about 1 minute
(3 tries x 20 s). How many of the 600 nodes are not responding?
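One way to see why that count matters: 50 pollers over a 15-minute cycle give 750 poller-minutes, and each failed get eats about one of them. A minimal sketch, assuming every non-response costs the full 3 x 20 s; note that it counts failed gets, not nodes, since a partially responding busy node can fail several PDUs per cycle. The names are hypothetical.

```python
POLLERS = 50      # concurrent polls
CYCLE_MIN = 15    # polling cycle, in minutes
BLOCK_MIN = 1.0   # one non-response: 3 tries x 20 s

def poller_minutes_per_cycle() -> float:
    """Total poller time available in one cycle."""
    return POLLERS * CYCLE_MIN

def fraction_of_cycle_lost(failed_gets: int) -> float:
    """Share of the cycle's poller time spent waiting on timeouts."""
    return failed_gets * BLOCK_MIN / poller_minutes_per_cycle()
```

Under these assumptions, 75 failed gets per cycle waste 10% of the polling capacity, and 750 would consume all of it, leaving no time for the nodes that do answer.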
Joe Fernandez
Kardinia Software
jfernand@kardinia.com
www.kardinia.com