[Oisf-users] Question about cpu-affinity

Michał Purzyński michalpurzynski1 at gmail.com
Wed Mar 7 03:03:30 UTC 2018


Minimizing cache misses was the biggest message of SepTun indeed. Glad people can hear the message, especially when companies like AMD focus 100% on execution units and internal bandwidth while completely missing the point here - those resources sit unused most of the time.

When we started I had IPC 0.4 - 0.7.

Through the process of understanding what’s going on and measuring cache misses we got to an IPC close to 3, compared to the ideal of 4 (Xeons are 4-wide). Not bad for a workload that has to experience cache misses by design (pats self and pevma on the back).
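
For readers who want to reproduce this kind of measurement, here is a minimal sketch of the arithmetic. IPC is instructions retired divided by core cycles, and dividing it by the issue width shows how far a core is from the ideal. The counter values below are made up for illustration (they are not the numbers from our boxes); on Linux they would typically come from something like `perf stat -e instructions,cycles`. The function names are hypothetical.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per core clock cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def width_utilization(ipc_value: float, width: int = 4) -> float:
    """Fraction of the ideal IPC for a core that is `width`-wide."""
    if width <= 0:
        raise ValueError("width must be positive")
    return ipc_value / width


# Hypothetical counter readings over the same 10 billion cycles.
before = ipc(instructions=5_500_000_000, cycles=10_000_000_000)
after = ipc(instructions=29_000_000_000, cycles=10_000_000_000)

for label, value in (("before", before), ("after", after)):
    print(f"{label}: IPC {value:.2f}, "
          f"{width_utilization(value) * 100:.1f}% of a 4-wide core")
```

A low IPC on its own does not prove the core is stalled on memory, so it is worth reading it next to the cache-miss counters.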

Interesting side effect - libpcap will experience two cache misses per packet: one when it calculates the timestamp from the headers, and then another for the data itself.

We recommend separating RSS from worker threads indeed. To each their own, sometimes it will work well, but I’d still separate - to minimize context switches and TLB thrashing.

BTW, Linux with the Meltdown patches and finally using PCID on Haswell might have an interesting effect here ;)

Grab a few cores, make them do RSS with DDIO, dedicate everything else to workers, have fun :)
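
A rough sketch of that split, assuming a single list of CPU ids and a hypothetical helper called `split_cores`. It only computes the two disjoint sets; it does not touch IRQ affinity or the Suricata configuration, and it ignores NUMA locality, which matters in practice (the RSS cores should sit on the same node as the NIC).

```python
def split_cores(all_cpus, rss_count):
    """Split CPU ids into (rss_cpus, worker_cpus) with no overlap.

    The lowest-numbered `rss_count` CPUs are reserved for NIC queue
    (RSS) interrupt handling; everything else goes to worker threads.
    Picking the lowest ids is an arbitrary choice for this sketch.
    """
    cpus = sorted(set(all_cpus))
    if not 0 < rss_count < len(cpus):
        raise ValueError("need at least one RSS core and one worker core")
    return cpus[:rss_count], cpus[rss_count:]


# Hypothetical 16-core box: 4 cores for RSS, 12 for workers.
rss_cpus, worker_cpus = split_cores(range(16), rss_count=4)
print("RSS cores:   ", rss_cpus)
print("worker cores:", worker_cpus)

# On Linux the worker set could then be applied to the current process
# with os.sched_setaffinity(0, worker_cpus), and the RSS set written to
# each NIC queue's /proc/irq/<n>/smp_affinity_list. Both steps are left
# out here because they need a real system and root privileges.
```

The point of keeping the sets disjoint is the one made above: the worker threads never get interrupted by packet-receive work landing on their core.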

> On Mar 6, 2018, at 3:01 PM, Cooper F. Nelson <cnelson at ucsd.edu> wrote:
> 
> "All programming is an exercise in caching."
>     -Terje Mathisen
> 
> Regarding this deployment, since I was on old Intel hardware that is not
> very IO-friendly either, I just copied that build to the new Piledriver
> system and switched from cluster_cpu to cluster_flow.  And separated the
> detect threads from the RSS queues.  No need for the offloading features
> this time (which TBH do impact detection for some sigs) with HyperScan,
> AVX and 56 detect threads!  The system is at around 12% load @peak, even
> with the on-demand CPU frequency governor. 
> 
> I agree that the new Intel FSB innovations like DDIO are at this point
> pretty much mandatory for 10 Gb HPC IDS deployments.  I'm already
> looking at doing a 40Gb build using a modern Intel system and the new
> 40G NICs, which officially support symmetric hashing.  
> 
> -Coop
> 
>> On 3/4/2018 11:31 PM, Michał Purzyński wrote:
>> The SepTun Mark II we're about to publish should actually behave better on
>> non-IO friendly architectures, like AMD.
>> 
>> Speaking personally, this is my private opinion:
>> 
>> I don't see any deeper thought process about IO optimization on the AMD
>> side, other than increasing the throughput of every interconnect. That's
>> nice, but those aren't even close to being saturated, as we're wasting
>> cycles waiting for cache misses :/
>> 
>> Intel approached this problem in a much more systematic way.
>> 
> 
> -- 
> Cooper Nelson
> Network Security Analyst
> UCSD ITS Security Team
> cnelson at ucsd.edu x41042
> 
> 


