[Oisf-users] Performance on multiple CPUs
Anoop Saldanha
poonaatsoc at gmail.com
Mon Aug 15 16:19:45 UTC 2011
On Mon, Aug 15, 2011 at 9:30 PM, Gene Albin <gene.albin at gmail.com> wrote:
> Anoop,
> Indeed. With 48 CPUs, in both runmodes and in each max-pending-packets
> category, I average the following across all of my runs:
>
> Runmode: Auto
>   MPP      Avg PPS   StDev
>   50       27160     590
>   500      29969     1629
>   5000     31267     356
>   50000    31608     358
>
> Runmode: AutoFP
>   MPP      Avg PPS   StDev
>   50       16924     106
>   500      56572     405
>   5000     86683     1577
>   50000    132936    5548
>
>
> Just reading over my email, I don't think I mentioned the variables
> I'm adjusting. There are three: runmode, detect thread ratio (DTR), and
> max pending packets (MPP). Each run I mention above is at a different DTR,
> from 0.1 to 1.0 and then 1.2, 1.5, 1.7, and 2.0. I was expecting to see
> something along the lines of Eric LeBlond's results on his blog post:
> http://home.regit.org/2011/02/more-about-suricata-multithread-performance/
> but changing the DTR doesn't seem to have given me the significant
> performance increase he reported (most likely due to other
> differences in our .yaml files, e.g. cpu_affinity).
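>
> For reference, the knobs I'm varying live in suricata.yaml, roughly like
> this (a rough sketch only -- exact key names differ a bit between Suricata
> versions, and the cpu list shown is just an illustration, not my actual
> affinity setup):
>
>   max-pending-packets: 50000       # 50 / 500 / 5000 / 50000 per run
>   threading:
>     set_cpu_affinity: yes          # pin thread groups to cores
>     cpu_affinity:
>       detect_cpu_set:
>         cpu: [ "all" ]             # which cores detect threads may use
>         mode: "exclusive"
>     detect_thread_ratio: 1.0       # swept from 0.1 to 2.0 across runs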
>
> Thank you for the clarification on the relationship between MPP and the
> cache. That does clear things up a bit. So you think I should be seeing
> better performance with 48 CPUs than I'm currently getting? Where do you
> think I can make the improvements? My first guess would be cpu_affinity,
> but that's just a guess.
>
> I don't mind filling in the table, however I don't think the attachment
> made it to my inbox. Would you mind resending?
>
Sorry, I forgot to attach it previously. Attached now.
> Thanks,
> Gene
>
>
> On Mon, Aug 15, 2011 at 12:20 AM, Anoop Saldanha <poonaatsoc at gmail.com>
> wrote:
>>
>> On Sun, Aug 14, 2011 at 11:43 AM, Gene Albin <gene.albin at gmail.com> wrote:
>> >
>> > Anoop,
>> > With max-pending-packets set to 50,000 and 48 CPUs I get performance
>> > around 135,000 packets/sec. With MPP at 50,000 and only 4 CPUs I get
>> > performance around 31,343 packets/sec. Both of these are with
>> > --runmode=autofp.
>> >
>> > Interestingly enough, when I run 4 CPUs in autofp mode I get 31,343
>> > pps, and when I run 48 CPUs in auto mode I also get 31,343 pps.
>> >
>>
>> You get this for all runs of auto + 48 CPUs?
>>
>> >
>> > I have to admit that I don't quite follow your explanation about the
>> > thread usage below. In layman's terms, how will this affect the
>> > performance of Suricata?
>>
>> What was probably meant is this: whenever a thread processes a
>> packet, whatever data the thread needs to process that packet gets
>> pulled into the CPU cache. That's one thread, one packet. Now allow
>> many more packets in flight: at that packet-processing rate you have
>> threads loading data for a great many packets into the cache, which
>> can lead to one thread's data evicting another thread's data from
>> the cache.
>>
>> Either way, I really wouldn't worry about cache behaviour when
>> increasing max-pending-packets. The gain in consumption/processing
>> rate from a larger max-pending-packets is too high to be cancelled
>> out by any cache performance degradation.
>>
>> None of this means you can't gain performance from good cache
>> usage. A lot of our performance improvements are based on writing
>> cache-friendly code (mostly around locality of reference). If you
>> write code that understands cache usage, the benefits are tenfold.
>>
>> >
>> > In my case I seem to be getting great performance increases, but I
>> > can't see what downside there might be with the cache.
>> >
>>
>> Yes, but with 48 cores we can extract even more performance out of
>> the engine than what you are currently seeing, and the cache may or
>> may not have anything to do with it. If there are cache performance
>> issues, they reduce the maximum performance obtainable on 48 cores,
>> and that reduced performance is what you are currently seeing as your
>> throughput. Even this lowered throughput, though, is far greater than
>> what you would have achieved with just 50 max-pending-packets.
>>
>> I hope that clears it up.
>>
>> I believe you are running some tests on Suricata. Whenever you run
>> Suricata with a particular config, can you fill in this table (I have
>> attached it)? When you are done filling it in, you can mail it back.
>>
>>
>> > Thanks,
>> > Gene
>> >
>> > On Sat, Aug 13, 2011 at 10:05 PM, Anoop Saldanha <poonaatsoc at gmail.com>
>> > wrote:
>> >>
>> >>
>> >> On Thu, Aug 11, 2011 at 12:24 AM, Gene Albin <gene.albin at gmail.com>
>> >> wrote:
>> >>>
>> >>> So I'm running in autofp mode and I increased max-pending-packets
>> >>> from 50 to 500, then 5000, then 50000. I saw a dramatic increase from
>> >>> 50 to 500 (17,000 packets/sec at 450 s to 57,000 pps at 140 s),
>> >>> not quite as dramatic from
>> >>> 500 to 5000 (to 85,000 pps at 90 s),
>> >>> and about the same from
>> >>> 5000 to 50000 (to 135,000 pps at 60 s).
>> >>> My question now is about the tradeoff mentioned in the config file,
>> >>> which says a higher value negatively impacts caching. How does it
>> >>> impact caching, and will I see this when running pcaps or in live mode?
>> >>> Thanks,
>> >>> Gene
>> >>
>> >> Probably polluting the cache / breaking cache coherency for the data
>> >> used by other packets. Either way, I wouldn't second-guess the effects of
>> >> cache usage when it comes to multiple threads possibly evicting data loaded
>> >> by some other thread. I would only be interested in locality of
>> >> reference with respect to the data used by one thread for whatever time
>> >> slice it has on the CPU.
>> >>
>> >> ** I see that you have tested with max-pending-packets set to 50,000.
>> >> Can you check how Suricata scales from 4 CPU cores to 32 CPU cores with
>> >> max-pending-packets at 50,000, and post the results here?
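>> >>
>> >> One way to control the core count per run is the cpu_affinity section
>> >> of suricata.yaml. A rough sketch only (key names vary across Suricata
>> >> versions, and the core ranges are just illustrative):
>> >>
>> >>   threading:
>> >>     set_cpu_affinity: yes
>> >>     cpu_affinity:
>> >>       detect_cpu_set:
>> >>         cpu: [ "0-3" ]        # 4-core run; e.g. "0-31" for a 32-core run
>> >>         mode: "exclusive"     # keep other thread types off these cores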
>> >>
>> >>>
>> >>> On Thu, Aug 4, 2011 at 1:07 PM, saldanha <poonaatsoc at gmail.com> wrote:
>> >>>>
>> >>>> On 08/03/2011 08:50 AM, Gene Albin wrote:
>> >>>>
>> >>>> So I just installed Suricata on one of our research computers with
>> >>>> lots of cores available. I'm looking to see what kind of performance boost
>> >>>> I get as I bump up the CPUs. After my first run I was surprised to see that
>> >>>> I didn't get much of a boost when going from 8 to 32 CPUs. I was running a
>> >>>> 6 GB pcap file with about 17k rules loaded. The first run on 8 cores took
>> >>>> 190 sec. The second run on 32 cores took 170 sec. It looks like something
>> >>>> other than CPU is the bottleneck.
>> >>>>
>> >>>> My first guess is disk I/O. Any recommendations on how I could
>> >>>> check/verify that guess?
>> >>>>
>> >>>> Gene
>> >>>>
>> >>>> --
>> >>>> Gene Albin
>> >>>> gene.albin at gmail.com
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> * forgot to reply to the list previously
>> >>>>
>> >>>> Hey Gene.
>> >>>>
>> >>>> Can you test by increasing max-pending-packets in the
>> >>>> suricata.yaml file to a higher value? You can try one run with a value of
>> >>>> 500 and then try higher values (2000+ suggested; the more the better, as
>> >>>> long as you don't hit swap).
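>> >>>>
>> >>>> For example, something like this in suricata.yaml (the value is just a
>> >>>> starting point to experiment with):
>> >>>>
>> >>>>   # maximum number of packets the engine keeps in flight at once
>> >>>>   max-pending-packets: 2000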
>> >>>>
>> >>>> Once you have set a higher max-pending-packets, you can try running
>> >>>> Suricata in the autofp runmode, which runs Suricata flow-pinned.
>> >>>> To do this, add the option "--runmode=autofp" to your suricata command
>> >>>> line:
>> >>>>
>> >>>> sudo suricata -c ./suricata.yaml -r your_pcap.pcap --runmode=autofp
>> >>>>
>> >>>> With max-pending-packets set to a higher value and with
>> >>>> --runmode=autofp, you can test how Suricata scales from 4 to 32 cores.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Gene Albin
>> >>> gene.albin at gmail.com
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Anoop Saldanha
>> >>
>> >
>> >
>> >
>> > --
>> > Gene Albin
>> > gene.albin at gmail.com
>> >
>>
>>
>>
>> --
>> Anoop Saldanha
>
>
>
> --
> Gene Albin
> gene.albin at gmail.com
>
>
--
Anoop Saldanha
-------------- next part --------------
** all tests with the same pcap
|---------------------+-----+--------+------------+--------+-----------------|
| max-pending-packets | cpu | mode | time(secs) | alerts | packets per sec |
|---------------------+-----+--------+------------+--------+-----------------|
| 50 | 4 | | | | |
| | 16 | auto | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|
| 50 | 4 | | | | |
| | 16 | autofp | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|
| 5000 | 4 | | | | |
| | 16 | auto | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|
| 5000 | 4 | | | | |
| | 16 | autofp | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|
| 50000 | 4 | | | | |
| | 16 | auto | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|
| 50000 | 4 | | | | |
| | 16 | autofp | | | |
| | 32 | | | | |
| | 48 | | | | |
|---------------------+-----+--------+------------+--------+-----------------|