[Oisf-users] Suricata, 10k rules, 10Gbit/sec and lots of RAM
Cooper F. Nelson
cnelson at ucsd.edu
Wed Dec 9 20:11:50 UTC 2015
On 12/9/2015 5:36 AM, Victor Julien wrote:
> Our main performance hit is in the multi-pattern matching (MPM) stage.
> We've used a skip based algorithm in the past (b2g is still in our
> tree), but performance with AC is quite a lot better. Generally the
> problem for IDS patterns is that they are of poor quality, many 1 and
> 2 byte patterns. These defeat the skip based algo's. Another issue
> that is important to us is the worst-case performance. The skip based
> algo's seem to have a worse worst case profile.
As I was walking to the pub last night I remembered that suricata
migrated to AC some time ago! Thanks for the details regardless; it's
appreciated.
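For anyone following along, a toy Python sketch of the Aho-Corasick idea
(this is *not* Suricata's implementation, just the textbook automaton):
the matcher advances exactly one byte per input character regardless of
pattern length, which is why the 1- and 2-byte IDS patterns that defeat
skip-based algorithms don't hurt its worst case.

```python
from collections import deque

def build_ac(patterns):
    """Build a trie with failure links for the given string patterns."""
    goto = [{}]       # goto[state] = {char: next_state}
    out = [set()]     # out[state]  = patterns ending at this state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    # BFS to compute failure links; depth-1 nodes keep fail = 0 (root).
    fail = [0] * len(goto)
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]   # inherit matches from suffix state
    return goto, fail, out

def search(text, goto, fail, out):
    """Return (offset, pattern) for every match; one step per input byte."""
    matches, s = [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

With the classic pattern set {he, she, his, hers}, searching "ushers"
reports "she", "he", and "hers" in a single left-to-right pass.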
> Btw, I recently saw a new paper on a mix of AC and skip based approach
> that I still have to take a deeper look at:
From the paper:
> In , a throughput of 7.5 Gbps was achieved
> using 32 processors in a Cray XMT supercomputer. There
> is yet a cost-efficient DBSM solution capable of matching
> 10 Gbps traffic against several thousand strings on a multicore
I've been doing this for years with suricata and a small bpf filter, on
a 16 core (actually 8 w/hyper-threading) Xeon server. Over 20k
signatures, too. I will admit the EmergingThreats guys do a fabulous
job of optimizing their signatures for efficiency.
As I mentioned previously, the actual suricata process is only processing
a fraction of the original packets; but if you are primarily interested
in matching against HTTP headers, I don't particularly see the value of a
full DPI solution, especially when you allow services like Netflix and
YouTube on your network.
The real bottleneck on all modern multi-core Von Neumann-style
architectures is memory (particularly cache) I/O. So this is less a
problem with the performance of the pattern-matching engine than an issue
of the memory pressure put on the various core subsystems by attempting
to match against full TCP flows. The authors allude to this at points,
but I think if they ran better performance counters it would be more
obvious.
The tl;dr is that what they are discussing *is* possible if you
pre-process your IP traffic via an efficient byte-based pattern matcher
like bpf. This is why packet filters were invented, in fact.
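To illustrate the kind of thing I mean (not my actual production filter,
and the network addresses below are the reserved documentation ranges,
purely made up): suricata in pcap mode accepts a trailing BPF expression
just like tcpdump, so you can drop the bulk streaming/CDN flows in the
kernel before the detection engine ever touches them.

```shell
# Hypothetical pre-filter: inspect only TCP, excluding known
# high-volume video/CDN networks (example addresses only).
suricata -c /etc/suricata/suricata.yaml -i eth0 \
  'tcp and not (net 198.51.100.0/24 or net 203.0.113.0/24)'
```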
I guess it's possible that they are already working with sampled traffic,
but I doubt it.
> Finally, we should start experimenting with Intel's Hyperscan soon.
> They claim much better perf, so we will see :)
Ok, now this is interesting and a new thing for me. My next question for
you was whether you were still looking at using SSE for pattern-matching.
Especially in the context of Aho-Corasick, as I would think it would be
possible to analyze multiple flows/packets/patterns in parallel via a
SIMD approach. Great to see this is open-source, too.
There is a concern that SSE breaks hyperthreading to an extent, in that
hyper-threaded cores share a single FP/SSE execution pipeline. However,
I would think the performance benefits afforded by vectorizing the
regexp process would exceed any losses incurred by losing 1-2
traditional integer pipelines.
Anyways, this is fabulously exciting, and I would be willing and able to
help test it once available.
Network Security Analyst
UCSD ACT Security Team
cnelson at ucsd.edu x41042