[Oisf-users] Testers: please test our initial Hyperscan support

Fri Apr 1 17:53:33 UTC 2016

Comments inline:

On 3/31/2016 11:57 PM, Viiret, Justin wrote:
> Hi Cooper,
> 
> I'm not an expert on the dev-detect-grouping work, so I'll leave your
> first question for others to answer, but I am interested in your
> performance results:

I should be clear that this is currently running on 'old' hardware, so I
suspect the newer AVX-enabled Xeons perform much better.  This is what we
are running:

> model name      : Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz

>> So far performance seems identical to the prior
>> "dev-detect-grouping" branch, with the caveat that memory usage is
>> currently lower (by 50%).  I'll leave it running overnight and see
>> if that changes.
> 
> Memory usage may be lower for a couple of reasons:
> 
> * For a given set of patterns, Hyperscan may be able to build a
>   smaller matcher than that constructed by other MPM algorithms;
> * When it's asked to build a previously constructed pattern set
>   again, the caching code in util-mpm-hs.c will reuse a previously
>   constructed database.

OK, that sounds about right.  After running overnight, memory usage is
about 20% lower than before.  It's also stable at ~70% of system memory
vs ~90% previously.
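
To make the caching point concrete, here is a minimal sketch of the general
idea (reuse one compiled database for identical pattern sets).  This is not
the actual util-mpm-hs.c code: the class, the hashing scheme and the
fake_compile stand-in are all hypothetical and only illustrate why duplicate
MPM contexts need not cost extra memory.

```python
import hashlib


class PatternDbCache:
    """Hypothetical cache: identical pattern sets share one compiled database."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for an expensive matcher build
        self._entries = {}          # digest -> [database, reference count]

    @staticmethod
    def _key(patterns):
        # Order-independent digest of the pattern set; each pattern is
        # length-prefixed so adjacent patterns cannot run together.
        digest = hashlib.sha256()
        for pattern in sorted(set(patterns)):
            digest.update(len(pattern).to_bytes(4, "big"))
            digest.update(pattern)
        return digest.hexdigest()

    def get(self, patterns):
        key = self._key(patterns)
        entry = self._entries.get(key)
        if entry is None:
            # First time this set is seen: build it once and remember it.
            entry = [self._compile(sorted(set(patterns))), 0]
            self._entries[key] = entry
        entry[1] += 1  # count how many contexts share this database
        return entry[0]

    def __len__(self):
        return len(self._entries)


builds = []


def fake_compile(patterns):
    # Hypothetical builder; a real one would compile a matcher database.
    builds.append(patterns)
    return {"patterns": patterns}


cache = PatternDbCache(fake_compile)
db_a = cache.get([b"GET ", b"POST "])
db_b = cache.get([b"POST ", b"GET "])   # same set, different order
db_c = cache.get([b"User-Agent"])
print(len(builds), db_a is db_b, len(cache))
```

With three requests but only two distinct pattern sets, the builder runs
twice and the first two callers share a single object.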

> On master, with sgh-mpm-context set to "full", we saw many such
> duplicate MPM contexts being constructed. This may or may not be the
> case on the dev-detect-grouping branch -- I haven't looked at it yet.
> The default of "auto" or "single" only builds a small number of
> contexts and no duplicates.

I currently have that set to 'auto'.

> Actual performance will vary, of course, depending on both the
> traffic and the rule set. Do you have a feel for how much time you're
> spending in actual MPM pattern matching, perhaps by pointing a
> sampling profiler at Suricata?

I ran "perf top" again, output copied below.  The ACPI event is a new
thing, but googling shows this may be a bug with how perf reports CPU
idle events (would appreciate more info if anyone has it).

The __bpf_prog_run entry is due to a complex BPF filter I run in order to
reduce the number of packets sent to the suricata process (the current
hardware is over-subscribed).

My intuition for why I'm seeing similar performance with the Hyperscan
pattern matcher vs. the 'ac' matcher is that the SIMD pipeline is
shared among hyper-threaded cores.  So while it's possible for the 'ac'
matcher to run two MPM context searches concurrently via two threads on
a single physical core, the 'hs' matcher has to run them sequentially
given the shared SIMD/SSE execution unit.  So despite 'hs' being
technically more efficient via instruction-level parallelism, the end
result for older SIMD implementations will be similar given the loss of
thread-level parallelism.
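
As a back-of-the-envelope illustration of that hypothesis, here is a toy
model.  It is not a measurement: the rates and the 2x speedup are invented
numbers, and it assumes (optimistically) that two scalar hyper-threads scale
perfectly while two SIMD-bound hyper-threads are fully serialized on one
shared unit.

```python
def core_throughput(per_thread_rate, threads, shares_one_unit):
    """Searches per unit time on one physical core under the toy model."""
    if shares_one_unit:
        # Threads take turns on the single shared execution unit, so the
        # core as a whole runs at one thread's rate.
        return per_thread_rate
    # Threads run side by side without contending for that unit.
    return per_thread_rate * threads


AC_RATE = 1.0      # invented baseline rate for one scalar 'ac' search thread
HS_SPEEDUP = 2.0   # invented per-thread advantage of the SIMD 'hs' matcher

ac = core_throughput(AC_RATE, threads=2, shares_one_unit=False)
hs = core_throughput(AC_RATE * HS_SPEEDUP, threads=2, shares_one_unit=True)
print(ac, hs)
```

Under these assumptions the two matchers break even when the per-thread SIMD
advantage equals the number of hyper-threads per core; anything above that
would favour 'hs' even on this hardware.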

>     13.07%  [kernel]            [k] __bpf_prog_run
>     11.77%  [kernel]            [k] acpi_processor_ffh_cstate_enter
>      7.73%  libhs.so.4.0.1      [.] fdr_exec_x86_64_d13_s1_w128
>      4.47%  libhs.so.4.0.1      [.] nfaExecMcClellan16_B
>      2.82%  libc-2.22.so        [.] __memset_sse2
>      2.80%  suricata            [.] IPOnlyMatchPacket
>      2.40%  suricata            [.] BoyerMoore
>      2.38%  libhs.so.4.0.1      [.] fdr_exec_x86_64_d12_s2_w128
>      2.35%  suricata            [.] SigMatchSignatures
>      2.00%  suricata            [.] StreamTcp
>      1.76%  [kernel]            [k] __memcpy
>      1.45%  gzip                [.] longest_match
>      1.42%  suricata            [.] FlowManager
>      1.40%  [ixgbe]             [k] ixgbe_configure
>      1.20%  libc-2.22.so        [.] vfprintf
>      1.09%  [kernel]            [k] tpacket_rcv
>      1.06%  libpthread-2.22.so  [.] pthread_mutex_lock
>      1.00%  [kernel]            [k] __memset
>      0.93%  suricata            [.] AFPReadFromRing
>      0.79%  libhs.so.4.0.1      [.] roseBlockExec_i
>      0.71%  suricata            [.] StreamTcpPacket
>      0.68%  libpthread-2.22.so  [.] __pthread_mutex_unlock_usercnt
>      0.66%  [kernel]            [k] __netif_receive_skb_core
>      0.64%  suricata            [.] FlowGetFlowFromHash
>      0.55%  libc-2.22.so        [.] _int_malloc
>      0.52%  gzip                [.] deflate

> Best regards, jv

-- 
Cooper Nelson
Network Security Analyst
UCSD ITS Security Team
cnelson at ucsd.edu x41042