[Oisf-users] Testers: please test our initial Hyperscan support
Michał Purzyński
michalpurzynski1 at gmail.com
Wed Apr 6 07:59:09 UTC 2016
Are you saying you use as many workers as there are hyper-threads, rather than physical cores (minus some reserved for other Suricata threads)? In other words, each Suricata thread gets a hyper-thread?
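(To make the question concrete: a minimal sketch of the mapping I mean, pinning one worker per logical CPU via pthread affinity. worker_main and the flat CPU numbering are hypothetical illustrations, not Suricata's actual code.)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Hypothetical sketch: one worker thread per logical CPU
     * (hyper-thread), rather than one per physical core. */
    static void *worker_main(void *arg) {
        (void)arg;
        /* ... packet processing loop ... */
        return NULL;
    }

    static int spawn_workers(int n_logical_cpus) {
        for (int i = 0; i < n_logical_cpus; i++) {
            pthread_t tid;
            if (pthread_create(&tid, NULL, worker_main, NULL) != 0)
                return -1;
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(i, &set);  /* logical CPU i == one hyper-thread */
            pthread_setaffinity_np(tid, sizeof(set), &set);
        }
        return 0;
    }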
> On 05 Apr 2016, at 21:05, Cooper F. Nelson <cnelson at ucsd.edu> wrote:
>
>> On 4/3/2016 7:33 PM, Viiret, Justin wrote:
>> Hi Cooper,
>>
>> Thanks for all the details. Comments are inline below:
>>
>>> My intuition of why I'm seeing similar performance using the
>>> Hyperscan pattern matcher vs. the 'ac' matcher is because the SIMD
>>> pipeline is shared among hyper-threaded cores.
>>
>> I had a chat to a colleague with deeper architectural knowledge than
>> mine, and he said the following (between dashes):
>>
>> ---- This statement about how the SIMD pipeline is shared among
>> hyper-threaded cores is not correct for modern Intel Architecture
>> post Core 2 – or at least, there is no difference between integer,
>> floating point and SIMD operations in this regard. There is a lot of
>> detail in Chapter 2 of the Intel 64 and IA-32 Architectures
>> Optimization Reference Manual:
>
> Thanks for the feedback/link, I see my error now. As your reference
> mentioned, I think this may have been the case with Intel architectures
> prior to the i7.
>
>> However, you may be correct in essence: a matcher that spends a lot
>> of time waiting for cache misses (or other higher-latency operations)
>> may get more benefit from HT than one that uses the execution
>> resources (whether they are integer, FP or SIMD) intensively, as
>> their operations can be more effectively interleaved.
>
> This is why I often say the RSS implementation of suricata's 'workers'
> runmode is the poster-boy for hyperthreading. Under real-world
> workloads you get roughly 2x the performance, due to all the I/O and
> cache misses involved in processing live packets.
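> 
> (A toy illustration of that interleaving argument, not suricata code: a
> pointer-chasing loop that stalls on cache misses leaves execution slots
> a sibling hyper-thread can use, while a dense ALU loop does not.)
> 
>     #include <stddef.h>
> 
>     /* Latency-bound pointer chase: over a large, shuffled ring each
>      * load misses cache, the core mostly waits on memory, and a
>      * sibling hyper-thread can fill the idle execution slots. */
>     size_t chase(const size_t *next, size_t start, long iters) {
>         size_t i = start;
>         while (iters--)
>             i = next[i];        /* serialized, cache-miss-bound */
>         return i;
>     }
> 
>     /* Throughput-bound loop: keeps the integer units busy nearly
>      * every cycle, so a sibling hyper-thread gains much less. */
>     long spin(long iters) {
>         long acc = 1;
>         while (iters--)
>             acc = acc * 3 + 1;  /* dense ALU work, few stalls */
>         return acc;
>     }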
>
>> The profile is very interesting -- can you share a similar profile
>> from a version running with the AC matcher? I'm also curious about
>> how you are measuring performance; is this an inline IPS deployment,
>> or something different? Have you measured a "no MPM rules" baseline
>> as well?
>
> Well, that's the thing. It's hard to measure real performance on modern
> super-scalar architectures, as performance depends on lots of
> variables: I/O, cache lines, pipelines, out-of-order execution,
> power management, hyper-threading, etc.
>
> This is an IDS deployment, so I basically look at two things: I watch a
> 'top' window and try to make sure the 5-minute load average stays under
> 16 at peak (16 HT cores), and when I restart the suricata process I
> check the logs to ensure that packet loss is under 1%.
>
> I ran the 'ac' algorithm last night during a period of lighter load,
> and it used much more CPU time.
>
>>
>> PerfTop: 61719 irqs/sec kernel:32.6% exact: 93.0% [4000Hz cycles:pp], (all, 16 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 49.02% suricata [.] SCACSearch
>> 10.20% [kernel] [k] acpi_processor_ffh_cstate_enter
>> 9.77% [kernel] [k] __bpf_prog_run
>> 2.93% suricata [.] BoyerMoore
>> 1.53% suricata [.] IPOnlyMatchPacket
>> 1.51% suricata [.] SigMatchSignatures
>> 1.32% [kernel] [k] __memcpy
>> 1.27% suricata [.] StreamTcp
>
> The system load was still around 14, vs. 15-16 at peak using the 'hs'
> algorithm.
>
> One thing I have noticed is that the 'hs' matcher doesn't seem to drive
> any core to '0.0' idle time, which is a big win for suricata: it's when
> a core/thread hits 100% utilization that you start dropping packets.
> And as mentioned, it uses less memory as well.
>
>> The time in the fdr_exec and nfaExec functions constitute the
>> majority of the time in Hyperscan in this profile, so they add up to
>> ~ 15% of runtime -- this looks like a lighter workload than the Web
>> traffic traces we tested with here, but there are a lot of variables
>> that could affect that (different rule set, the BPF filter, overhead
>> of live network scanning vs. our PCAP trace scanning for testing,
>> etc).
>
> We are running a tweaked config for our environment/hardware, especially
> for web traffic.
>
>> One concrete suggestion: you may see some improvement from using
>> Hyperscan 4.1, which has some improvements to the literal matching
>> path. It's available here:
>>
>> https://github.com/01org/hyperscan/releases/tag/v4.1.0
>
> The 'make install' failed, but it looks like the libraries built, so I
> just copied them over manually. Performance is a little better; 'perf
> top' output is copied below. You can infer the relative amount of IP
> traffic by looking at the percentage of CPU time spent running the BPF
> filter.
>
>>
>> PerfTop: 63306 irqs/sec kernel:39.9% exact: 90.2% [4000Hz cycles:pp], (all, 16 CPUs)
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 12.63% [kernel] [k] __bpf_prog_run
>> 12.32% [kernel] [k] acpi_processor_ffh_cstate_enter
>> 7.53% libhs.so.4.1.0 [.] fdr_exec_x86_64_s1_w128
>> 4.20% libhs.so.4.1.0 [.] nfaExecMcClellan16_B
>> 3.04% libc-2.22.so [.] __memset_sse2
>> 2.97% suricata [.] BoyerMoore
>> 2.75% suricata [.] IPOnlyMatchPacket
>> 2.39% suricata [.] SigMatchSignatures
>> 2.05% suricata [.] StreamTcp
>> 1.85% [kernel] [k] __memcpy
>> 1.81% libhs.so.4.1.0 [.] fdr_exec_x86_64_s2_w128
>> 1.43% gzip [.] longest_match
>> 1.35% [ixgbe] [k] ixgbe_configure
>> 1.26% libc-2.22.so [.] vfprintf
>> 1.10% suricata [.] FlowManager
>> 1.07% [kernel] [k] tpacket_rcv
>> 1.06% libpthread-2.22.so [.] pthread_mutex_lock
>> 1.02% [kernel] [k] __memset
>> 0.88% suricata [.] AFPReadFromRing
>> 0.71% suricata [.] FlowGetFlowFromHash
>> 0.68% [kernel] [k] __netif_receive_skb_core
>> 0.68% suricata [.] StreamTcpPacket
>> 0.66% libhs.so.4.1.0 [.] roseBlockExec_i
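> 
> (For anyone mapping those libhs.so symbols back to the API: fdr_exec_*
> and nfaExecMcClellan* are Hyperscan's internal literal and NFA engines,
> reached through an ordinary hs_scan() call. A minimal block-mode scan
> looks roughly like this; the two patterns are made-up examples, not
> from our rule set.)
> 
>     #include <stdio.h>
>     #include <string.h>
>     #include <hs/hs.h>
> 
>     /* Called once per match; id is the pattern's compile-time index. */
>     static int on_match(unsigned int id, unsigned long long from,
>                         unsigned long long to, unsigned int flags,
>                         void *ctx) {
>         (void)from; (void)flags; (void)ctx;
>         printf("match for pattern %u ending at %llu\n", id, to);
>         return 0;  /* continue scanning */
>     }
> 
>     int main(void) {
>         /* Made-up example patterns, not from a real rule set. */
>         const char *patterns[] = { "GET /", "User-Agent:" };
>         unsigned int flags[] = { HS_FLAG_DOTALL, HS_FLAG_DOTALL };
>         unsigned int ids[] = { 0, 1 };
>         hs_database_t *db = NULL;
>         hs_compile_error_t *err = NULL;
> 
>         if (hs_compile_multi(patterns, flags, ids, 2, HS_MODE_BLOCK,
>                              NULL, &db, &err) != HS_SUCCESS) {
>             fprintf(stderr, "compile failed: %s\n", err->message);
>             hs_free_compile_error(err);
>             return 1;
>         }
> 
>         hs_scratch_t *scratch = NULL;
>         if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS)
>             return 1;
> 
>         const char *data = "GET / HTTP/1.1\r\nUser-Agent: test\r\n";
>         hs_scan(db, data, strlen(data), 0, scratch, on_match, NULL);
> 
>         hs_free_scratch(scratch);
>         hs_free_database(db);
>         return 0;
>     }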
>
>
> --
> Cooper Nelson
> Network Security Analyst
> UCSD ITS Security Team
> cnelson at ucsd.edu x41042
>
> _______________________________________________
> Suricata IDS Users mailing list: oisf-users at openinfosecfoundation.org
> Site: http://suricata-ids.org | Support: http://suricata-ids.org/support/
> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
> Suricata User Conference November 9-11 in Washington, DC: http://oisfevents.net