[Oisf-users] Performance on multiple CPUs

Anoop Saldanha poonaatsoc at gmail.com
Mon Aug 15 16:19:45 UTC 2011


On Mon, Aug 15, 2011 at 9:30 PM, Gene Albin <gene.albin at gmail.com> wrote:
> Anoop,
>   Indeed.  With 48 CPUs, in both runmodes and at each max-pending-packets
> setting, I average the following across all of my runs:
>
> Runmode: Auto
> MPP     Avg PPS   StDev
> 50:     27160     590
> 500:    29969     1629
> 5000:   31267     356
> 50000:  31608     358
>
> Runmode: AutoFP
> MPP     Avg PPS   StDev
> 50:     16924     106
> 500:    56572     405
> 5000:   86683     1577
> 50000:  132936    5548
>
>
>   Just reading over my email, I don't think I mentioned the variables I'm
> adjusting.  There are three: runmode, detect thread ratio (DTR), and
> max-pending-packets (MPP).  Each run mentioned above is at a different DTR,
> from 0.1 to 1.0, then 1.2, 1.5, 1.7, and 2.0.  I was expecting to see
> something along the lines of Eric Leblond's results in his blog post:
> http://home.regit.org/2011/02/more-about-suricata-multithread-performance/
> but it doesn't look like changing the DTR gave me the significant
> performance increase he reported (most likely due to other differences in
> our .yaml files, e.g. cpu_affinity; the relevant config section is sketched
> below for reference).
>
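> For reference, the knobs in question live under the threading section of
> suricata.yaml.  A trimmed sketch of the 1.x-era layout looks roughly like
> this (the CPU list, mode, and ratio below are placeholders, not
> recommendations):
>
>     threading:
>       set_cpu_affinity: yes
>       cpu_affinity:
>         - detect_cpu_set:
>             cpu: [ "all" ]
>             mode: "exclusive"
>       detect_thread_ratio: 1.0
>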
>   Thank you for the clarification on the relationship between MPP and the
> cache.  That does clear things up a bit.  So you think I should be seeing
> better performance with 48 CPUs than I'm currently getting?  Where do you
> think I can make the improvements?  My first guess would be cpu_affinity,
> but that's just a guess.
>
>   I don't mind filling in the table; however, I don't think the attachment
> made it to my inbox.  Would you mind resending it?
>

Sorry, I forgot to attach it previously.  It's attached now.

> Thanks,
> Gene
>
>
> On Mon, Aug 15, 2011 at 12:20 AM, Anoop Saldanha <poonaatsoc at gmail.com>
> wrote:
>>
>> On Sun, Aug 14, 2011 at 11:43 AM, Gene Albin <gene.albin at gmail.com> wrote:
>> >
>> > Anoop,
>> >   With max-pending-packets set to 50,000 and 48 CPUs I get performance
>> > around 135,000 packets/sec.  With MPP at 50,000 and only 4 CPUs I get
>> > performance around 31,343 packets/sec.  Both of these are with
>> > --runmode=autofp enabled.
>> >
>> > Interestingly enough, when I run 4 CPUs in autofp mode I get 31,343
>> > pps, and when I run 48 CPUs in auto mode I also get 31,343 pps.
>> >
>>
>> You get this for all runs of auto + 48 CPUs?
>>
>> >
>> >   I have to admit that I don't quite follow your explanation of the
>> > thread usage below.  In layman's terms, how will this affect the
>> > performance of Suricata?
>>
>> They probably meant this: whenever a thread processes a packet, whatever
>> data the thread needs for that packet gets pulled into the CPU cache.
>> That is one thread and one packet.  Now allow many more packets in flight:
>> at that processing rate you have many threads loading data for many
>> packets into the cache at once, which can lead to one thread overwriting
>> the data another thread has just cached.
>>
>> Either way, I really wouldn't worry about cache behaviour when increasing
>> max-pending-packets.  The gain in consumption/processing rate from a
>> larger max-pending-packets is too big to be cancelled out by any cache
>> performance degradation.
>>
>> All this doesn't mean you can't gain performance through better cache
>> usage.  A lot of our performance improvements are based on writing
>> cache-friendly code (more on locality of reference).  If you write code
>> that understands cache usage, the benefit is tenfold.
>>
>> >
>> >   In my case I seem to be getting great performance increases, but I
>> > can't see what downside there might be with the cache.
>> >
>>
>> Yes, but with 48 cores we should be able to extract even more performance
>> out of the engine than you are currently seeing, and the cache may or may
>> not have anything to do with it.  If there are cache performance issues,
>> they reduce the maximum throughput obtainable on 48 cores, and that
>> reduced throughput is what you are seeing now; even so, it is far greater
>> than what you would otherwise have achieved with just 50 max pending
>> packets.
>>
>> I hope that clears it up.
>>
>> I believe you are running some tests on Suricata.  Whenever you run
>> Suricata in a particular config, can you fill in this table (attached)?
>> When you are done filling it in, you can mail it back.
>>
>>
>> > Thanks,
>> > Gene
>> >
>> > On Sat, Aug 13, 2011 at 10:05 PM, Anoop Saldanha <poonaatsoc at gmail.com>
>> > wrote:
>> >>
>> >>
>> >> On Thu, Aug 11, 2011 at 12:24 AM, Gene Albin <gene.albin at gmail.com>
>> >> wrote:
>> >>>
>> >>> So I'm running in autofp mode and I increased max-pending-packets
>> >>> from 50 to 500, then 5000, then 50000.  I saw a dramatic increase from
>> >>> 50 to 500 (17,000 packets/sec @ 450 s to 57,000 pps @ 140 s),
>> >>> not quite as dramatic from
>> >>> 500 to 5000 (to 85,000 pps @ 90 s),
>> >>> and about the same from
>> >>> 5000 to 50000 (to 135,000 pps @ 60 s).
>> >>> My question now is about the tradeoff mentioned in the config file,
>> >>> which says this can negatively impact caching.  How does it impact
>> >>> caching?  Will I see this when running pcaps, or only in live mode?
>> >>> Thanks,
>> >>> Gene
>> >>
>> >> Probably polluting the cache / breaking cache coherency for the data
>> >> used by other packets.  Either way, I wouldn't second-guess the effects
>> >> of cache usage when it comes to multiple threads possibly ruining data
>> >> loaded by some other thread.  I would only be interested in locality of
>> >> reference with respect to the data used by one thread for whatever time
>> >> slice it has on the CPU.
>> >>
>> >> ** I see that you have tested with max-pending-packets set to 50,000.
>> >> Can you check how Suricata scales from 4 CPU cores to 32 CPU cores with
>> >> max-pending-packets at 50,000, and post the results here?
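>> >>
>> >> (For reference, one way to confine a run to a given set of cores without
>> >> touching the affinity config is taskset, which is part of util-linux;
>> >> the core list below is only an example:
>> >>
>> >>     taskset -c 0-3 suricata -c ./suricata.yaml -r your_pcap.pcap --runmode=autofp
>> >>
>> >> This pins the Suricata process and all of its threads to CPUs 0-3.)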
>> >>
>> >>>
>> >>> On Thu, Aug 4, 2011 at 1:07 PM, saldanha <poonaatsoc at gmail.com> wrote:
>> >>>>
>> >>>> On 08/03/2011 08:50 AM, Gene Albin wrote:
>> >>>>
>> >>>> So I just installed Suricata on one of our research computers with
>> >>>> lots of cores available.  I'm looking to see what kind of performance
>> >>>> boost I get as I bump up the number of CPUs.  After my first run I was
>> >>>> surprised to see that I didn't get much of a boost when going from 8
>> >>>> to 32 CPUs.  I was running a 6 GB pcap file with about 17k rules
>> >>>> loaded.  The first run on 8 cores took 190 sec.  The second run on 32
>> >>>> cores took 170 sec.  It looks like something other than CPU is the
>> >>>> bottleneck.
>> >>>>
>> >>>> My first guess is disk I/O.  Any recommendations on how I could
>> >>>> check/verify that guess?
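>> >>>>
>> >>>> (For reference, one quick way to check is to watch the disk while the
>> >>>> pcap run is in progress; iostat is part of the sysstat package, and
>> >>>> the 5-second interval is arbitrary:
>> >>>>
>> >>>>     iostat -xm 5    # watch %util and await on the disk holding the pcap
>> >>>>     vmstat 5        # a high 'wa' column means the CPUs are waiting on I/O
>> >>>>
>> >>>> High utilization or iowait during the run would point at the disk.)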
>> >>>>
>> >>>> Gene
>> >>>>
>> >>>> --
>> >>>> Gene Albin
>> >>>> gene.albin at gmail.com
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> * I forgot to reply to the list previously
>> >>>>
>> >>>> Hey Gene.
>> >>>>
>> >>>> Can you test by increasing max-pending-packets in the suricata.yaml
>> >>>> file to a higher value?  You can try one run with a value of 500 and
>> >>>> then try higher values (2000+ suggested; the higher the better, as
>> >>>> long as you don't hit swap).
>> >>>>
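>> >>>> For reference, the corresponding line in suricata.yaml looks like
>> >>>> this (the value here is only an example):
>> >>>>
>> >>>>     # maximum number of packets the engine keeps in flight at once
>> >>>>     max-pending-packets: 5000
>> >>>>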
>> >>>> Once you have set a higher max-pending-packets, you can try running
>> >>>> Suricata in the autofp runmode, which runs Suricata flow-pinned.  To
>> >>>> do this, add the option --runmode=autofp to your Suricata command
>> >>>> line:
>> >>>>
>> >>>> sudo suricata -c ./suricata.yaml -r your_pcap.pcap --runmode=autofp
>> >>>>
>> >>>> With max-pending-packets set to a higher value and with
>> >>>> --runmode=autofp, you can test how Suricata scales from 4 to 32 cores.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Gene Albin
>> >>> gene.albin at gmail.com
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Anoop Saldanha
>> >>
>> >
>> >
>> >
>> > --
>> > Gene Albin
>> > gene.albin at gmail.com
>> >
>>
>>
>>
>> --
>> Anoop Saldanha
>
>
>
> --
> Gene Albin
> gene.albin at gmail.com
>
>



-- 
Anoop Saldanha
-------------- next part --------------
** all tests with the same pcap

|---------------------+-----+--------+------------+--------+-----------------|
| max-pending-packets | cpu | mode   | time(secs) | alerts | packets per sec |
|---------------------+-----+--------+------------+--------+-----------------|
|                  50 |   4 |        |            |        |                 |
|                     |  16 | auto   |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|
|                  50 |   4 |        |            |        |                 |
|                     |  16 | autofp |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|
|                5000 |   4 |        |            |        |                 |
|                     |  16 | auto   |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|
|                5000 |   4 |        |            |        |                 |
|                     |  16 | autofp |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|
|               50000 |   4 |        |            |        |                 |
|                     |  16 | auto   |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|
|               50000 |   4 |        |            |        |                 |
|                     |  16 | autofp |            |        |                 |
|                     |  32 |        |            |        |                 |
|                     |  48 |        |            |        |                 |
|---------------------+-----+--------+------------+--------+-----------------|

