[Oisf-users] Suricata, 10k rules, 10Gbit/sec and lots of RAM

Wed Dec 9 20:11:50 UTC 2015

On 12/9/2015 5:36 AM, Victor Julien wrote:
> Our main performance hit is in the multi pattern matching (mpm) stage.
> We've used a skip based algorithm in the past (b2g is still in our
> tree), but performance with AC is quite a lot better. Generally the
> problem for IDS patterns is that they are of poor quality, many 1 and
> 2 byte patterns. These defeat the skip based algo's. Another issue
> that is important to us is the worst-case performance. The skip based
> algo's seem to have a worse worst case profile.

As I was walking to the pub last night I remembered that suricata
migrated to AC some time ago!  Thanks for the details regardless, they
are very interesting.
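
To make the point about 1 and 2 byte patterns concrete, here is a minimal
sketch of a Horspool-style bad-character shift table shared across a
pattern set.  This is not suricata's b2g code; the function names and
sample patterns are made up for illustration.  The search window cannot
be longer than the shortest pattern, so one single-byte pattern caps
every skip at 1 and the matcher has to look at every byte.

```python
def build_shift_table(patterns):
    """Build a Horspool-style shift table shared by all patterns.

    The window length m is the length of the shortest pattern, because a
    longer window could jump over a match of that pattern.
    Returns (m, shift), where shift maps a byte value to a safe skip and
    any byte not in the table skips the full window length m.
    """
    m = min(len(p) for p in patterns)
    shift = {}
    for p in patterns:
        # Only the first m bytes of each pattern fit in the shared window;
        # the last window position is excluded, as in plain Horspool.
        for i, byte in enumerate(p[:m - 1]):
            shift[byte] = min(shift.get(byte, m), m - 1 - i)
    return m, shift


def average_shift(m, shift):
    """Mean skip distance over all 256 possible byte values."""
    return sum(shift.get(b, m) for b in range(256)) / 256


# Hypothetical pattern sets, not taken from any real ruleset.
long_only = [b"User-Agent", b"cmd.exe", b"/etc/passwd"]
with_short = long_only + [b"\x90"]

if __name__ == "__main__":
    for name, pats in (("long only", long_only), ("with 1-byte", with_short)):
        m, table = build_shift_table(pats)
        print(name, "window:", m, "avg shift:", round(average_shift(m, table), 2))
```

With only the longer patterns the window is 7 bytes and most input bytes
allow a full skip; add the one-byte pattern and the window collapses to 1,
which is the situation Victor describes.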

> Btw, I recently saw a new paper on a mix of AC and skip based approach
> that I still have to take a deeper look at:
> http://halcyon.usc.edu/~pk/prasannawebsite/papers/HeadBody_camera.pdf

From the paper:

> In [14], a throughput of 7.5 Gbps was achieved
> using 32 processors in a Cray XMT supercomputer. There
> is yet a cost-efficient DBSM solution capable of matching
> 10 Gbps traffic against several thousand strings on a multicore
> platform.

I've been doing this for years with suricata and a small bpf filter, on
a 16 core (actually 8 w/hyper-threading) Xeon server.  Over 20k
signatures, too.  I will admit the EmergingThreats guys do a fabulous
job of optimizing their signatures for efficiency.

As I mentioned previously the actual suricata process is only processing
a fraction of the original packets, but if you are primarily interested
in matching against HTTP headers I don't particularly see the value of a
full DPI solution.  Particularly when you allow services like Netflix
and Youtube on your network.

The real bottleneck on all modern multi-core von Neumann-style
architectures is memory (particularly cache memory) I/O.  So this is
less a problem with the performance of the pattern-matching engine than
an issue with the memory pressure put on the various core sub-systems
by attempting to match against full TCP flows.  The authors allude to
this at points; however, I think if they collected better performance
counters this would be more obvious.

The tl;dr is that what they are discussing *is* possible if you
pre-process your IP traffic via an efficient byte-based pattern matcher
like bpf.  This is why packet filters were invented, in fact.
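
As a rough sketch of what such a pre-filter might look like, the helper
below assembles a BPF expression (pcap-filter syntax) that drops traffic
to or from a list of high-volume networks and optionally keeps only
certain ports.  The function name is hypothetical and the networks are
RFC 5737 documentation ranges standing in for whatever streaming/CDN
ranges you would actually exclude; this is not my production filter.

```python
import ipaddress


def build_bpf(exclude_nets, ports=None):
    """Return a BPF expression that skips the given networks.

    exclude_nets: iterable of CIDR strings to drop before inspection.
    ports: optional iterable of ports; if given, only these are kept.
    """
    clauses = []
    for net in exclude_nets:
        # ip_network() raises ValueError on a malformed CIDR, so a typo
        # fails here instead of producing a bad capture filter.
        clauses.append(f"not net {ipaddress.ip_network(net)}")
    if ports:
        port_expr = " or ".join(f"port {int(p)}" for p in ports)
        clauses.append(f"({port_expr})")
    return " and ".join(clauses)


if __name__ == "__main__":
    # Placeholder ranges (RFC 5737), not real streaming networks.
    print(build_bpf(["198.51.100.0/24", "203.0.113.0/24"], ports=[80]))
```

The resulting string can be handed to suricata as a capture filter, for
example through the bpf-filter setting of an af-packet interface or as
the trailing command-line argument; check the documentation for your
version before relying on either.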

I guess it's possible that they are already working with sampled traffic,
but I doubt it.

> Finally, we should start experimenting with Intel's Hyperscan soon.
> They claim much better perf, so we will see :)

Ok now this is interesting and a new thing for me.  My next question for
you was if you were still looking at using SSE for pattern-matching.
Especially in the context of Aho-Corasick, as I would think it would be
possible to analyze multiple flows/packets/patterns in parallel via a
SIMD approach.  Great to see this is open-source, too.

There is a concern that SSE breaks hyperthreading to an extent, in that
hyper-threaded cores share a single FP/SSE execution pipeline.  However,
I would think the performance benefits afforded by vectorizing the
regexp process would exceed any losses incurred by losing 1-2
traditional integer pipelines.

Anyways, this is fabulously exciting and I would be willing and able to
help test this once it is available.

-- 
Cooper Nelson
Network Security Analyst
UCSD ACT Security Team
cnelson at ucsd.edu x41042