[Discussion] rule profiling (was: Re: Non-tokenized preprocessor parameter lines)

Martin Holste mcholste at gmail.com
Fri Feb 13 00:39:23 UTC 2009


Wow, sounds like you're the guy to talk to on this stuff (for the record, I
was counting you as one of the people who understand the state transition
tables).  What do you mean by rule groups?  I'd love to know how you got
emerging-all.rules down to 50 MB.  Running with ac-std normally takes over
2 GB, and I think ac takes well over 4 GB, but I've never used it.  Is a
longer match always better?  Or is there a threshold at which a pattern (say
100 or so bytes long) becomes too unwieldy, so that there is a "sweet spot"
in pattern length?

The frequency of matching was kind of what I was getting at the other day on
the ET list when I asked about the Snort HTTP preproc versus the straight
pattern matcher.  I was trying to figure out whether the HTTP preproc
(which has already searched for terms like "POST" and "GET") would let us
significantly reduce the load compared with sigs which use things like
content:"GET"; distance:5;

So, in a brand-new design for a pattern matcher, how can we take advantage
of the fact that we know certain strings will be searched for and hit much
more frequently than others?  Would moving that work into a separate thread
provide any advantage?  Perhaps it could become a mini-barnyard kind of
situation in which that stage emits much, much more data than alerting does,
but still only a fraction of the overall throughput: an HTTP preprocessor
that dumps HTTP field streams without doing any further app-level pattern
matching.  That trims the workload down substantially for another process,
operating barnyard-style, to come through and do the higher-level matching
and logic.  I think writing to disk barnyard-style would be fairly out of
the question performance-wise, but writing to a socket that an entirely
different process reads from might not be.  Snort doesn't allow the preproc
to run on a different CPU, so the resources all come from the same pool.  If
your HTTP preproc had a dedicated CPU, then the cache hit rate would be
extremely high, since it would only search for a few patterns.  If you did
it right, I bet you could get almost ASIC- or FPGA-level performance for URI
content searches on a dedicated CPU.
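
To make the two-stage idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names extract_uri and run_pipeline, the newline-delimited framing, and the use of socket.socketpair() inside a single process as a stand-in for two processes joined by a socket. It is not how Snort or barnyard work.

```python
import socket

def extract_uri(request):
    """Return the URI from the request line of a raw HTTP request, or None."""
    line = request.split(b"\r\n", 1)[0]
    parts = line.split(b" ")
    if len(parts) == 3 and parts[0] in (b"GET", b"POST"):
        return parts[1]
    return None

def run_pipeline(requests, uri_patterns):
    """Stage 1 writes extracted URIs to a socket; stage 2 reads and matches them."""
    producer, consumer = socket.socketpair()

    # Stage 1 (the "HTTP preproc"): extract fields only, no content matching.
    # Everything is sent before anything is read, which is only safe for a
    # demo this small; real stages would run concurrently.
    with producer:
        for req in requests:
            uri = extract_uri(req)
            if uri is not None:
                producer.sendall(uri + b"\n")

    # Stage 2 (the "barnyard-style" consumer): match patterns on URIs only.
    alerts = []
    with consumer, consumer.makefile("rb") as stream:
        for line in stream:
            uri = line.rstrip(b"\n")
            alerts.extend((uri, p) for p in uri_patterns if p in uri)
    return alerts

REQUESTS = [
    b"GET /index.php HTTP/1.1\r\nHost: a\r\n\r\n",
    b"GET /cmd.exe?x HTTP/1.1\r\n\r\n",
    b"not an http request",
]
print(run_pipeline(REQUESTS, [b"cmd.exe"]))
```

The second stage never sees packet payloads, only the much smaller stream of extracted URIs, which is the workload reduction described above.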

On Thu, Feb 12, 2009 at 2:26 AM, Victor Julien <lists at inliniac.net> wrote:

> Martin Holste wrote:
> > I've done a little bit of work with rule profiling, enough to realize
> > that I need help from someone who understands Aho-Corasick better so
> > that I can more accurately figure out what the load would be on the
> > detection engine.  (I posted something to this effect on the Emerging
> > Threats list last week.)  One real performance factor, as far as I can
> > tell, is whether the content being matched already appears in another
> > rule.  If it does, then the cost of adding that rule could be
> > negligible, since the detection engine's effective load doesn't
> > actually increase.  So, a tool which takes the entire ruleset into
> > account would be very helpful.  I know that Sourcefire kind of already
> > does this in Snort in a few places, but I think the number of people who
> > understand the information from the state transition table reports could
> > be counted on one hand (judging from the lack of comments on the ET
> > list).  That information needs to be wrapped into a larger profiler.
>
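
A rough sketch of the "does this content already appear in another rule" check from the quoted paragraph, assuming simplified rules. The rules below are hypothetical, and the regex compares raw content strings only: it ignores negated content, nocase, hex notation such as |0d 0a|, and uricontent.

```python
import re

# Matches content:"..." options; escaped characters inside the string are allowed.
# The \b keeps "uricontent:" from matching.
CONTENT_RE = re.compile(r'\bcontent:\s*"((?:[^"\\]|\\.)*)"')

def content_patterns(rule):
    """Return the content strings of one rule, in order of appearance."""
    return CONTENT_RE.findall(rule)

def new_patterns(ruleset, candidate):
    """Return the candidate's content strings that no existing rule uses."""
    seen = set()
    for rule in ruleset:
        seen.update(content_patterns(rule))
    return [p for p in content_patterns(candidate) if p not in seen]

# Hypothetical, heavily simplified rules for illustration only.
RULESET = [
    'alert tcp any any -> any 80 (msg:"a"; content:"GET"; content:"/admin.php"; sid:1;)',
    'alert tcp any any -> any 80 (msg:"b"; content:"POST"; content:"cmd.exe"; sid:2;)',
]
CANDIDATE = 'alert tcp any any -> any 80 (msg:"c"; content:"GET"; content:"/evil.cgi"; sid:3;)'

print(new_patterns(RULESET, CANDIDATE))  # ['/evil.cgi']
```

Only "/evil.cgi" would be new to the pattern matcher here; "GET" is already in the set. Whether a shared pattern really costs nothing extra depends on the engine, so this is a first approximation at best.
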
> I'll add in some more variables that matter a lot:
>
> Number of patterns. All pattern matcher algorithms slow down as the
> number of patterns increases. This is because the match verification
> process gets more expensive as the hash tables fill up (using an 8-bit
> alphabet in a 2-gram algorithm results in a hash of at most 65536
> entries). Usually switching from 2-gram to 3-gram (or larger) versions of
> the algorithms helps, but at the cost of more memory and higher
> computation overhead. That influences performance quite a bit, probably
> because more memory means more CPU cache misses. Usually the average
> pattern length goes down too.
>
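
The 65536 figure is just 256 squared. The sketch below works that arithmetic out and shows how much sparser a 3-gram table is for the same patterns. It assumes a direct-indexed table with one slot per possible n-gram, which real matchers may not use, and the pattern list is made up.

```python
ALPHABET_SIZE = 256  # 8-bit alphabet: one symbol per byte value

def ngram_slots(n):
    """Number of distinct n-grams, i.e. the size of a direct-indexed table."""
    return ALPHABET_SIZE ** n

def occupancy(patterns, n):
    """Fraction of n-gram slots used by at least one pattern."""
    grams = set()
    for p in patterns:
        for i in range(len(p) - n + 1):
            grams.add(p[i:i + n])
    return len(grams) / ngram_slots(n)

# Hypothetical patterns for illustration only.
PATTERNS = [b"GET /index.php", b"POST /login", b"cmd.exe", b"/etc/passwd"]

print(ngram_slots(2))  # 65536
print(ngram_slots(3))  # 16777216
print(occupancy(PATTERNS, 2) > occupancy(PATTERNS, 3))  # True
```

The 3-gram table is 256 times larger, so it fills up far more slowly as patterns are added, but it is also far less likely to fit in CPU cache.
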
> The minimum largest pattern size. Most pattern matcher algorithms
> perform best with longer patterns since the matcher can step through the
> searched data in bigger steps. So if all your sigs have a pattern length
> of at least 8 and you add one of length 3, your performance hit is going
> to be bigger than when there are already sigs with smaller patterns.
>
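
To illustrate why one short pattern hurts, here is a sketch using a simple multi-pattern Horspool-style search. This is a stand-in chosen for brevity, not the BNDM variants mentioned in this mail. The window is as long as the shortest pattern, so the largest possible skip is capped by that length. The text is a deliberate best case (mostly bytes that appear in no pattern).

```python
def build_shift(patterns):
    """Window length m = shortest pattern; shift table over each pattern's m-byte prefix."""
    m = min(len(p) for p in patterns)
    shift = [m] * 256  # bytes absent from every prefix allow a full skip of m
    for p in patterns:
        for j in range(m - 1):
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
    return m, shift

def search(text, patterns):
    """Return (matches, number of window positions examined)."""
    m, shift = build_shift(patterns)
    matches, windows, pos = [], 0, 0
    while pos + m <= len(text):
        windows += 1
        for p in patterns:  # verification step at this window
            if text.startswith(p, pos):
                matches.append((pos, p))
        pos += shift[text[pos + m - 1]]  # skip based on the window's last byte
    return matches, windows

TEXT = b"x" * 500 + b"abcdefgh" + b"x" * 492
LONG_ONLY = [b"abcdefgh", b"12345678"]
WITH_SHORT = LONG_ONLY + [b"qrs"]

matches_long, windows_long = search(TEXT, LONG_ONLY)
matches_short, windows_short = search(TEXT, WITH_SHORT)
print(matches_long == matches_short, windows_long < windows_short)  # True True
```

Both runs find the same match, but adding the 3-byte pattern drops the maximum skip from 8 to 3, so many more window positions have to be examined.
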
> Similar rules are grouped together to save memory on (among other
> things) pattern matcher contexts. A bad rule will only influence the
> groups it's in. But an even worse rule can end up in a lot of contexts.
> (Using the emerging-all.rules file I can easily have the engine use a few
> gigabytes of RAM, but using the grouping I have slimmed it down to about
> 50 MB. Guess which performs better? The smaller one :))
>
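
The mail doesn't say how the grouping is done, so the following is only a guess at the general shape: rules are bucketed by some key (destination port here, purely as an assumption), and groups whose pattern sets turn out identical share one matcher context instead of each building their own.

```python
from collections import defaultdict

def build_contexts(rules):
    """rules: iterable of (group_key, pattern).

    Returns (assignment, contexts): assignment maps each group key to a
    context id, and contexts maps each distinct pattern set to its id.
    """
    per_group = defaultdict(set)
    for key, pattern in rules:
        per_group[key].add(pattern)

    contexts = {}    # frozenset of patterns -> context id
    assignment = {}  # group key -> context id
    for key, patterns in per_group.items():
        pattern_set = frozenset(patterns)
        if pattern_set not in contexts:
            contexts[pattern_set] = len(contexts)
        assignment[key] = contexts[pattern_set]
    return assignment, contexts

# Hypothetical (destination port, pattern) pairs for illustration only.
RULES = [
    (80, b"cmd.exe"), (80, b"/etc/passwd"),
    (8080, b"cmd.exe"), (8080, b"/etc/passwd"),
    (8000, b"/etc/passwd"), (8000, b"cmd.exe"),
    (25, b"MAIL FROM"),
]

assignment, contexts = build_contexts(RULES)
print(len(assignment), len(contexts))  # 4 2
```

Four groups end up with two contexts. A pattern that applies to every port would still land in every distinct context, which matches the remark about an "even worse rule" above.
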
> CPU cache size & memory bus speed seem to make a big difference too.
> In my code I've implemented the (simple) BNDM algorithm, both in a
> 2-gram and a 3-gram version. On a Core2duo T5500 (2 MB cache) a 2-gram
> sBNDM performed best, on a Core2duo E6600 (4 MB cache) a 3-gram BNDM. On
> my gateway box, a P3 500 MHz with 512 KB cache, again the 2-gram sBNDM,
> but with different hash table sizes and stuff...
>
> One thing that also influences performance is the likelihood of a match,
> because after the pattern matcher suspects a match, it has to be
> verified by the detection engine.  For example, I think HTTP keywords,
> HTML stuff, SMTP commands, etc. all have a higher likelihood of
> matching and thus are more expensive... maybe a blacklist could help
> there. Any pattern on that list would be classified as more expensive...
>
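
A minimal sketch of the blacklist idea for a profiler. The list contents, the function name pattern_cost and the weights are all invented for illustration; real weights would have to come from measuring match frequency on actual traffic.

```python
# Hypothetical list of patterns expected to match very often on normal traffic.
COMMON_PATTERNS = {b"GET", b"POST", b"HTTP/1.1", b"<HTML", b"HELO", b"MAIL FROM"}

def pattern_cost(pattern, base=1, penalty=10):
    """Return a relative cost: higher for patterns on the common list.

    The comparison is case-insensitive; base and penalty are arbitrary.
    """
    if pattern.upper() in COMMON_PATTERNS:
        return base * penalty
    return base

def rule_cost(patterns):
    """Sum the relative costs of all patterns in one rule."""
    return sum(pattern_cost(p) for p in patterns)

print(pattern_cost(b"GET"))             # 10
print(pattern_cost(b"cmd.exe"))         # 1
print(rule_cost([b"get", b"cmd.exe"]))  # 11
```
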
> Regards,
> Victor
>
> --
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc