[Oisf-users] Testers: please test our initial Hyperscan support
Cooper F. Nelson
cnelson at ucsd.edu
Tue Apr 5 19:05:28 UTC 2016
On 4/3/2016 7:33 PM, Viiret, Justin wrote:
> Hi Cooper,
>
> Thanks for all the details. Comments are inline below:
>
>> My intuition of why I'm seeing similar performance using the
>> Hyperscan pattern matcher vs. the 'ac' matcher is because the SIMD
>> pipeline is shared among hyper-threaded cores.
>
> I had a chat to a colleague with deeper architectural knowledge than
> mine, and he said the following (between dashes):
>
> ---- This statement about how the SIMD pipeline is shared among
> hyper-threaded cores is not correct for modern Intel Architecture
> post Core 2 – or at least, there is no difference between integer,
> floating point and SIMD operations in this regard. There is a lot of
> detail in Chapter 2 of the Intel 64 and IA-32 Architectures
> Optimization Reference Manual:
Thanks for the feedback/link; I see my error now. As your reference
mentions, I think this may have been the case on Intel architectures
prior to the i7.
> However, you may be correct in essence: a matcher that spends a lot
> of time waiting for cache misses (or other higher-latency operations)
> may get more benefit from HT than one that uses the execution
> resources (whether they are integer, FP or SIMD) intensively, as
> their operations can be more effectively interleaved.
This is why I often say Suricata's 'workers' runmode with RSS is the
poster child for hyperthreading. Under real-world workloads you get
roughly 2x the performance, because all the I/O and cache misses
involved in processing live packets leave plenty of stall cycles for
the second hardware thread to fill.
> The profile is very interesting -- can you share a similar profile
> from a version running with the AC matcher? I'm also curious about
> how you are measuring performance; is this an inline IPS deployment,
> or something different? Have you measured a "no MPM rules" baseline
> as well?
Well, that's the thing. It's hard to measure real performance on modern
super-scalar architectures, as performance depends on a lot of
variables: I/O, cache behavior, pipelines, out-of-order execution,
power management, hyper-threading, etc.
This is an IDS deployment, so I basically look at two things: I watch a
'top' window and try to make sure the 5-minute load average stays under
16 at peak (16 HT cores), and when I restart the Suricata process I
check the logs to make sure packet loss is under 1%.
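If you want to automate that second check, something like the rough
sketch below works; it pulls the most recent capture.kernel_packets and
capture.kernel_drops counters out of Suricata's stats.log and prints the
drop percentage. It assumes the usual "counter | TM name | value" layout
and aggregated counters; per-thread counters would need summing per dump
interval, so treat it as a starting point rather than anything polished.

/* Rough, untested sketch: report Suricata capture drop percentage from
 * stats.log. Assumes lines like "capture.kernel_drops | Total | 1234";
 * keeps the last (i.e. most recent) value seen for each counter. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "/var/log/suricata/stats.log";
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    char line[512];
    unsigned long long packets = 0, drops = 0;
    while (fgets(line, sizeof(line), f)) {
        unsigned long long val;
        char *bar = strrchr(line, '|');      /* value is the last column */
        if (!bar || sscanf(bar + 1, "%llu", &val) != 1)
            continue;
        if (strncmp(line, "capture.kernel_packets", 22) == 0)
            packets = val;
        else if (strncmp(line, "capture.kernel_drops", 20) == 0)
            drops = val;
    }
    fclose(f);

    if (packets == 0) { fprintf(stderr, "no capture counters found\n"); return 1; }
    printf("packets=%llu drops=%llu loss=%.3f%%\n",
           packets, drops, 100.0 * (double)drops / (double)packets);
    return 0;
}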
I ran the 'ac' algo last night during a period of lighter load, and the
profile shows that it uses much more CPU time.
>
> PerfTop: 61719 irqs/sec kernel:32.6% exact: 93.0% [4000Hz cycles:pp], (all, 16 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 49.02% suricata [.] SCACSearch
> 10.20% [kernel] [k] acpi_processor_ffh_cstate_enter
> 9.77% [kernel] [k] __bpf_prog_run
> 2.93% suricata [.] BoyerMoore
> 1.53% suricata [.] IPOnlyMatchPacket
> 1.51% suricata [.] SigMatchSignatures
> 1.32% [kernel] [k] __memcpy
> 1.27% suricata [.] StreamTcp
Even under that lighter load, the system load was still around 14, vs.
15-16 at peak using the hs algo.
One thing I have noticed is that the 'hs' matcher doesn't seem to result
in '0.0' idle times, which is a big win for Suricata; packets start
dropping once any core/thread hits 100% utilization. And as mentioned,
it uses less memory as well.
> The time in the fdr_exec and nfaExec functions constitute the
> majority of the time in Hyperscan in this profile, so they add up to
> ~ 15% of runtime -- this looks like a lighter workload than the Web
> traffic traces we tested with here, but there are a lot of variables
> that could affect that (different rule set, the BPF filter, overhead
> of live network scanning vs. our PCAP trace scanning for testing,
> etc).
We are running a tweaked config for our environment/hardware, especially
for web traffic.
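For anyone not familiar with what those fdr_exec/nfaExec entries are:
they're Hyperscan's internal literal-matcher and NFA engines, and
Suricata drives them through the library's block-mode scan API. The
sketch below is just a minimal illustration of that compile/scan cycle,
conceptually what the mpm layer does per inspected buffer; the patterns,
ids and flags are made up for the example, not anything Suricata
actually compiles.

/* Minimal block-mode Hyperscan example (illustrative only). */
#include <stdio.h>
#include <string.h>
#include <hs/hs.h>

/* Hyperscan reports the pattern id and the end offset of each match. */
static int on_match(unsigned int id, unsigned long long from,
                    unsigned long long to, unsigned int flags, void *ctx) {
    printf("pattern %u matched, ending at offset %llu\n", id, to);
    return 0;                         /* non-zero would stop the scan */
}

int main(void) {
    const char *const patterns[] = { "uid=0", "cmd.exe" };  /* made-up literals */
    const unsigned int flags[] = { HS_FLAG_CASELESS, HS_FLAG_CASELESS };
    const unsigned int ids[] = { 1, 2 };

    hs_database_t *db = NULL;
    hs_compile_error_t *err = NULL;
    if (hs_compile_multi(patterns, flags, ids, 2, HS_MODE_BLOCK,
                         NULL, &db, &err) != HS_SUCCESS) {
        fprintf(stderr, "compile failed: %s\n", err->message);
        hs_free_compile_error(err);
        return 1;
    }

    /* Scratch space is per-thread in a real multi-threaded deployment. */
    hs_scratch_t *scratch = NULL;
    if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
        hs_free_database(db);
        return 1;
    }

    const char data[] = "GET /cmd.exe HTTP/1.0";
    hs_scan(db, data, (unsigned int)strlen(data), 0, scratch, on_match, NULL);

    hs_free_scratch(scratch);
    hs_free_database(db);
    return 0;
}

Builds with something like 'cc example.c $(pkg-config --cflags --libs
libhs)'. The per-thread scratch allocation is the part that matters for
a workers-runmode setup, since every detect thread needs its own.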
> One concrete suggestion: you may see some improvement from using
> Hyperscan 4.1, which has some improvements to the literal matching
> path. It's available here:
>
> https://github.com/01org/hyperscan/releases/tag/v4.1.0
The 'make install' failed, but it looks like the libraries built, so I
just copied them over manually. Performance is a little better; perf top
output is copied below (a quick way to double-check which library
actually gets loaded is sketched after the profile). You can infer the
relative amount of IP traffic by looking at the percentage of CPU time
spent running the BPF filter.
>
> PerfTop: 63306 irqs/sec kernel:39.9% exact: 90.2% [4000Hz cycles:pp], (all, 16 CPUs)
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 12.63% [kernel] [k] __bpf_prog_run
> 12.32% [kernel] [k] acpi_processor_ffh_cstate_enter
> 7.53% libhs.so.4.1.0 [.] fdr_exec_x86_64_s1_w128
> 4.20% libhs.so.4.1.0 [.] nfaExecMcClellan16_B
> 3.04% libc-2.22.so [.] __memset_sse2
> 2.97% suricata [.] BoyerMoore
> 2.75% suricata [.] IPOnlyMatchPacket
> 2.39% suricata [.] SigMatchSignatures
> 2.05% suricata [.] StreamTcp
> 1.85% [kernel] [k] __memcpy
> 1.81% libhs.so.4.1.0 [.] fdr_exec_x86_64_s2_w128
> 1.43% gzip [.] longest_match
> 1.35% [ixgbe] [k] ixgbe_configure
> 1.26% libc-2.22.so [.] vfprintf
> 1.10% suricata [.] FlowManager
> 1.07% [kernel] [k] tpacket_rcv
> 1.06% libpthread-2.22.so [.] pthread_mutex_lock
> 1.02% [kernel] [k] __memset
> 0.88% suricata [.] AFPReadFromRing
> 0.71% suricata [.] FlowGetFlowFromHash
> 0.68% [kernel] [k] __netif_receive_skb_core
> 0.68% suricata [.] StreamTcpPacket
> 0.66% libhs.so.4.1.0 [.] roseBlockExec_i
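Since the 'make install' failed and I copied the libraries over by hand,
a quick sanity check (just a suggestion, not something from this run) is
to confirm which Hyperscan build actually gets loaded by calling
hs_version():

#include <stdio.h>
#include <hs/hs.h>

int main(void) {
    /* hs_version() returns the library's version/build string,
     * which should start with "4.1.0" here. */
    printf("linked Hyperscan: %s\n", hs_version());
    return 0;
}

If that prints 4.1.0 you know the hand-copied libraries are the ones in
use; running 'ldd' on the suricata binary will tell you the same thing.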
--
Cooper Nelson
Network Security Analyst
UCSD ITS Security Team
cnelson at ucsd.edu x41042