
RE: [nv-l] Stress Testing NV, looking for opinions

To: <nv-l@lists.us.ibm.com>
Subject: RE: [nv-l] Stress Testing NV, looking for opinions
From: "Van Order, Drew (US - Hermitage)" <dvanorder@deloitte.com>
Date: Thu, 3 Jun 2004 15:32:12 -0500
Delivery-date: Thu, 03 Jun 2004 22:04:57 +0100
Envelope-to: nv-l-archive@lists.skills-1st.co.uk
Importance: normal
Reply-to: nv-l@lists.us.ibm.com
Sender: owner-nv-l@lists.us.ibm.com
Thread-index: AcRJkPo8y3xhpRnaTnytfIU/MXxMlwAAtX1QAAC6XZAABE5EYA==
Thread-topic: [nv-l] Stress Testing NV, looking for opinions
Update--setting the script to send 300 traps every 3 minutes finally
broke something, and it wasn't NetView--it was TEC. And TEC's not really
broken; it's just queuing events so fast that it would ultimately hang
if the traps kept pouring in. The netview.rls TEC rule is hit so hard
that the tec_rule process is eating CPU. We've seen this before and
actively monitor for queued events. The slowdown started with 1,000
events and a 10-minute break, and went downhill from there. By the time
I finish typing this, TEC will be caught up.

NV never flinched. Pretty cool, and I gotta give TEC credit as well:
that is the most I have ever seen it handle, and the repository isn't
under AIX; its RIM host is SQL Server running on Windows 2003.
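
For comparison, the figures quoted in this thread can be reduced to events per second. This is only a rough sketch of the arithmetic; the helper name per_second is made up for illustration, and the numbers are the ones mentioned in the messages (300 traps every 3 minutes, 250 traps in 50 seconds within a 10 minute window, 300,000 events per hour).

```python
def per_second(count, window_seconds):
    """Average events per second for `count` events spread over a window."""
    return count / window_seconds

# Figures taken from the messages in this thread.
rates = {
    "300 traps every 3 minutes": per_second(300, 3 * 60),
    "250 traps in 50 seconds (burst rate)": per_second(250, 50),
    "250 traps per 10 minute window (average)": per_second(250, 10 * 60),
    "300,000 events per hour": per_second(300_000, 60 * 60),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f}/s")
```

Support's figure of 6-8 sustained traps/second can be read against these averages; note that a burst rate and an average over the whole cron window are very different numbers.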

-----Original Message-----
From: Van Order, Drew (US - Hermitage) 
Sent: Thursday, June 03, 2004 2:01 PM
To: 'nv-l@lists.us.ibm.com'
Subject: RE: [nv-l] Stress Testing NV, looking for opinions


Very true. We have a 6 CPU RS6000 w/4GB RAM, but it's also the TMR and
TEC. We manage 6000 NV objects and expect that to double. Support has
stated our hardware is more than sufficient for 12000.

Scott, looks like we have gone down similar paths in troubleshooting. We
too had a large number of _WAIT processes, and we also had queue backups
for UDP 162. This was during the first phase of troubleshooting, before
we traced nvcorrd. IBM then pointed out the Smartset Query, which (in
support's estimation) probably wasn't causing trouble because it
essentially queried NULL, but was nevertheless a wasted step. The
FIN_WAIT processes were Java console sessions, also deemed harmless.
Our first step was to see if sluggish DNS was the problem; then we moved
to capturing data. Support felt the issue might be in trapd, so we
prepped for hex tracing of that, but we have not had a slowdown since.

And I recently bumped the number of traps to 1,000. Still working. Next
I am going to drop the number of traps to 250, but set cron to run every
3 minutes. No word from IBM about the script, so here it is; works great.
This generates an IBM Authentication Trap, but you can make it do
anything you want by putting in the right hex information. It looks like
you could have multiple traps ($trap1, $trap2) defined, but I am not sure
how to edit the send line accordingly. I'm guessing a Perl person would
know the answer in about 5 seconds.
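
On the multiple-trap question, one way to do it is to keep the payloads in a list and cycle through them in the send loop. The sketch below is in Python rather than Perl and is not the IBM script, which is not reproduced in this archive. The names TRAP_HEX and send_traps are made up for illustration, and the hex strings are placeholders, not valid SNMP trap PDUs; each would have to be replaced with the full hex dump of a real trap.

```python
import itertools
import socket

# HYPOTHETICAL placeholders: these are NOT valid SNMP trap PDUs.
# Replace each value with the complete hex dump of a real trap.
TRAP_HEX = {
    "trap1": "00",
    "trap2": "01",
}

def send_traps(host, port, count, trap_hex=None):
    """Send `count` UDP datagrams to host:port, cycling through the payloads."""
    hex_by_name = TRAP_HEX if trap_hex is None else trap_hex
    payloads = [bytes.fromhex(h) for h in hex_by_name.values()]
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # trap1, trap2, trap1, ... until `count` datagrams have gone out.
        for payload in itertools.islice(itertools.cycle(payloads), count):
            sock.sendto(payload, (host, port))
            sent += 1
    return sent
```

A call such as send_traps("nvhost.example.com", 162, 250) would send 250 datagrams to UDP 162, alternating between the two payloads; the host name here is an assumption. The same loop extends to any number of trap definitions, which also gives the mix of traps suggested earlier in the thread.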


-----Original Message-----
From: owner-nv-l@lists.us.ibm.com [mailto:owner-nv-l@lists.us.ibm.com]
On Behalf Of Barr, Scott
Sent: Thursday, June 03, 2004 12:55 PM
To: nv-l@lists.us.ibm.com
Subject: RE: [nv-l] Stress Testing NV, looking for opinions


Well, certainly the size of the machine plays a role too. I am on Sunfire
V210s, which I would think is a lot of horsepower for NetView (2 x 1.0
GHz processors and 2 GB of memory).

> -----Original Message-----
> From: owner-nv-l@lists.us.ibm.com 
> [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Francois Le Hir
> Sent: Thursday, June 03, 2004 12:32 PM
> To: nv-l@lists.us.ibm.com
> Subject: RE: [nv-l] Stress Testing NV, looking for opinions
> 
> 
> 
> 
> 
> I believe NetView is able to handle a lot more than 5 to 8
> traps per second. A couple of years ago, when running NetView
> 6 (before RFI), I saw my NetView box handle peaks of over
> 300,000 events per hour (i.e. about 83 events per second).
> Of course, it all depends on the rulesets you are running,
> and such a volume of events is not normal (this was during
> a major outage).
> 
> Salutations, / Regards,
> 
> Francois Le Hir
> Network Projects & Consulting Services
> IBM Global Services
> Phone: (514) 964 2145
> 
> 
> From: "Van Order, Drew (US - Hermitage)" <dvanorder@deloitte.com>
> Sent by: owner-nv-l@lists.us.ibm.com
> Date: 06/03/2004 12:35 PM
> To: <nv-l@lists.us.ibm.com>
> cc:
> Subject: RE: [nv-l] Stress Testing NV, looking for opinions
> Please respond to nv-l
> 
> 
> 
> 
> I got the base script from IBM support and have no problem 
> sharing if someone from IBM weighs in with no objections. We 
> are running these traps through TEC_ITS.rls, so nvcorrd, etc. 
> should be getting exercised. I would like to put a mix of 
> traps in as well, but am not a developer so I'm making do right now.
> 
> Funny you mention Query Smartset node; we are pretty sure 
> this was the major source of our trouble. Ours happened to be 
> there for no good reason, so we removed it and cycled the 
> daemons. In addition, we did minor things like configure 
> trapd to save logs for a week, and implemented a weekly 
> ovmapcount/ovtopofix process. NV has been smooth ever since. 
> Until then, NV had been hanging at least once/week, and we 
> were thinking NV was choking on the number of traps, which we 
> now believe to be bunk based on testing. MLM was considered 
> to be the solution until we learned our addressing scheme was 
> not compatible. That's when we opened a support call--been at 
> this for about a month now.
> 
> We're also ready to up the number of traps to see where NV 
> falls over. When this started, we got information from 
> support that NV could handle sustained 6-8 traps/second. I've 
> got the email somewhere...  It appears that number is conservative.
>       -----Original Message-----
>       From: owner-nv-l@lists.us.ibm.com
>       [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Barr, Scott
>       Sent: Thursday, June 03, 2004 11:03 AM
>       To: nv-l@lists.us.ibm.com
>       Subject: RE: [nv-l] Stress Testing NV, looking for opinions
> 
>       One other thing - the use of smartsets and rulesets heavily
>       affects performance. It would be beneficial if your testing
>       included a variety of traps, not the same one over and over.
>       In addition, pushing them through rulesets if possible would
>       be a real good stress test, especially if you have rulesets
>       doing a "query smartset" node.
> 
>       Would you be willing to share your script that generates the
>       traps? I am interested in doing the same thing.
> 
>       From: owner-nv-l@lists.us.ibm.com
>       [mailto:owner-nv-l@lists.us.ibm.com] On Behalf Of Brett Coley
>       Sent: Thursday, June 03, 2004 10:44 AM
>       To: nv-l@lists.us.ibm.com
>       Subject: Re: [nv-l] Stress Testing NV, looking for opinions
> 
>       That sounds like a valid way to test, but I'm thinking
>       you may want to throw in some more randomness, maybe
>       some heavier peaks.   Sounds like the 250 in 50 secs are
>       dealt with ok in their 10 minute window, but what happens
>       with a burst of 1000 thrown into the mix?
> 
>       Regards,
>       Brett
>       bcoley@us.ibm.com
>       Tivoli Software/IBM
> 
> 
> 
> 
> 
> 
> 
> This message (including any attachments) contains 
> confidential information intended for a specific individual 
> and purpose, and is protected by law. If you are not the 
> intended recipient, you should delete this message. Any 
> disclosure, copying, or distribution of this message, or the 
> taking of any action based on it, is strictly prohibited.
> 
> 
> 
> 
> 





