[Oisf-users] Testers: please test our initial Hyperscan support

Mon Apr 4 02:33:26 UTC 2016

Hi Cooper,

Thanks for all the details. Comments are inline below:

> My intuition of why I'm seeing similar performance using the Hyperscan
> pattern matcher vs. the 'ac' matcher is because the SIMD pipeline is shared
> among hyper-threaded cores.  

I had a chat to a colleague with deeper architectural knowledge than mine, and he said the following (between dashes):

----
This statement that the SIMD pipeline is shared among hyper-threaded cores is not correct for modern Intel Architecture after Core 2. At the least, there is no difference between integer, floating-point and SIMD operations in this regard. There is a lot of detail in Chapter 2 of the Intel 64 and IA-32 Architectures Optimization Reference Manual:

    http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

However, you may be correct in essence. A matcher that spends a lot of time waiting on cache misses (or other higher-latency operations) may get more benefit from HT than one that uses the execution resources (whether integer, FP or SIMD) intensively, because its operations can be interleaved more effectively with those of the other hyper-thread.

There are improvements in the HT implementation of cores newer than Nehalem that can benefit Hyperscan running on hyper-threaded cores. Of course, these improvements may benefit other matchers as well, so we can't speculate about the relative improvement in your particular case.
----

The profile is very interesting. Can you share a similar profile from a version running with the AC matcher? I'm also curious about how you are measuring performance: is this an inline IPS deployment, or something different? Have you also measured a "no MPM rules" baseline?

In this profile, the fdr_exec and nfaExec functions account for most of the time spent in Hyperscan, and together they add up to about 15% of runtime. This looks like a lighter workload than the Web traffic traces we tested with here, but many variables could affect that: a different rule set, the BPF filter, the overhead of live network scanning compared with our PCAP trace scanning for testing, etc.

One concrete suggestion: you may see some improvement from using Hyperscan 4.1, which includes improvements to the literal matching path. It's available here:

    https://github.com/01org/hyperscan/releases/tag/v4.1.0

Feel free to contact me directly as well if there are details you'd rather keep off-list.

Best regards,
    Justin