[Discussion] rule profiling

Fri Feb 13 09:36:55 UTC 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Holste wrote:
> Wow, sounds like you're the guy to talk to on this stuff (for the
> record, I was counting you as one of the people who understood the state
> transition tables).

I don't consider myself an expert, but since I started building these
things and reading papers about them I've been learning a great deal :)

>  What do you mean by rule groups?  I'd love to know
> how you broke the emerging-all.rules into 50mb.  Running with ac-std
> takes over 2gb normally, and I think ac takes well over 4gb, but I've
> never used it.

In the simplest case we would inspect every packet against every
signature. Thus we would put all patterns in one pattern matcher and
work with that. That doesn't make much sense though, as we know (based
on ports and src and dst settings, think $HOME_NET, etc) that many of
the sigs can't apply to a given packet. So grouping sigs that are
similar makes sure we inspect fewer sigs. Example:

alert tcp $HOME_NET any -> any 80 (content:"abc"; sid:1;)
alert tcp $HOME_NET any -> 1.2.3.4 80 (content:"def"; sid:2;)
alert tcp $HOME_NET any -> any 80 (content:"ghi"; sid:3;)

Here we could create 3 sig groups:
tcp, $HOME_NET, any, 0.0.0.0-1.2.3.3:         sids: 1,3
tcp, $HOME_NET, any, 1.2.3.4:                 sids: 1,2,3
tcp, $HOME_NET, any, 1.2.3.5-255.255.255.255: sids: 1,3

When we receive a packet we look up the group that matches. Since we
want to look up only one group, we merge any overlapping groups, which
is why sids 1 and 3 are also listed in the 1.2.3.4 group.

In the above example sid:2 would only be inspected if the dst addr of a
packet is 1.2.3.4.
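
The group building and lookup described above could be sketched roughly
like this in Python. It is only an illustration under assumptions: it
groups on the destination address alone, and the names SIGS,
build_groups and lookup are made up for the example rather than taken
from any engine.

```python
import bisect
import ipaddress

# Hypothetical, simplified sigs: (sid, first dst addr, last dst addr), inclusive.
SIGS = [
    (1, "0.0.0.0", "255.255.255.255"),  # dst "any"
    (2, "1.2.3.4", "1.2.3.4"),
    (3, "0.0.0.0", "255.255.255.255"),  # dst "any"
]


def to_int(addr):
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


def build_groups(sigs):
    """Cut the address space at every range boundary, so the resulting
    groups never overlap and each lists every sid that applies to it."""
    ranges = [(sid, to_int(lo), to_int(hi)) for sid, lo, hi in sigs]
    # Every range start, and the address just past every range end, is a cut.
    cuts = sorted({lo for _, lo, _ in ranges} | {hi + 1 for _, _, hi in ranges})
    groups = []
    for start, nxt in zip(cuts, cuts[1:]):
        end = nxt - 1
        sids = sorted(sid for sid, lo, hi in ranges if lo <= start and end <= hi)
        if sids:
            groups.append((start, end, sids))
    return groups


def lookup(groups, dst):
    """Return the sids of the single group that covers dst."""
    addr = to_int(dst)
    starts = [g[0] for g in groups]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and groups[i][0] <= addr <= groups[i][1]:
        return groups[i][2]
    return []


GROUPS = build_groups(SIGS)
```

For the three example sigs this yields the same three groups as listed
above, and lookup(GROUPS, "1.2.3.4") is the only lookup that returns
sid 2.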

This grouping can be taken as far as you want. Protocol, src ip, dst ip,
src port, dst port, dsize, flow status, etc. However, you will easily
end up with many tens of thousands of groups. Each contains a list of
the sigs that will need to be inspected, each has a pattern matcher
context, etc. So that will make memory requirements explode.
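
To get a feel for how fast this grows: if every combination of values
needs its own group, the worst-case group count is the product of the
number of distinct values per dimension. The counts below are purely
hypothetical, picked only to show the multiplication, not measured from
any ruleset.

```python
from math import prod

# Hypothetical number of distinct values per grouping dimension.
DIMENSIONS = {
    "protocol": 3,
    "src ip": 4,
    "dst ip": 20,
    "src port": 5,
    "dst port": 50,
}


def worst_case_groups(dims):
    """Upper bound on the group count: one group per value combination."""
    return prod(dims.values())


TOTAL = worst_case_groups(DIMENSIONS)  # 3 * 4 * 20 * 5 * 50 = 60000
```

Each of those groups would carry its own sig list and pattern matcher
context, so adding even one more dimension multiplies the total again.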

While intuitively you'd think it's better to have as many groups as
possible (and thus inspect as few sigs as possible), it doesn't work
that way. I think this is caused by the increased memory usage.

(A lot of the memory usage can be avoided by sharing the groups &
pattern matcher contexts. Many groups in practice end up with the exact
same sigs or patterns. Those can share memory. But even then memory
usage can be quite high.)
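
A minimal sketch of that sharing idea, assuming a context can be keyed
by its exact set of patterns. PatternContext and share_contexts are
hypothetical names for this example; a real context would hold a
compiled state machine instead of just the pattern set.

```python
class PatternContext:
    """Stand-in for a pattern matcher context (hypothetical)."""

    def __init__(self, patterns):
        self.patterns = patterns


def share_contexts(groups):
    """Map each group name to a context, building only one context per
    distinct pattern set so identical groups share the same object."""
    cache = {}
    shared = {}
    for name, patterns in groups.items():
        key = frozenset(patterns)
        if key not in cache:
            cache[key] = PatternContext(key)
        shared[name] = cache[key]
    return shared


# The three groups from the example earlier in this mail, by their patterns.
GROUP_PATTERNS = {
    "0.0.0.0-1.2.3.3": ["abc", "ghi"],
    "1.2.3.4": ["abc", "def", "ghi"],
    "1.2.3.5-255.255.255.255": ["abc", "ghi"],
}

SHARED = share_contexts(GROUP_PATTERNS)
```

Here the first and last group end up pointing at the same context
object, so only two contexts are built for the three groups.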

>  Is a longer match always better?  Is there a threshold
> at which a pattern (say 100 or so bytes long) is too unwieldy and thus
> creates a "sweet spot?"

I guess there is a point where longer isn't better, but it's not likely
we'll reach that point in IDS.

> The frequency of matching was kind of what I was getting at the other
> day on the ET list regarding the use of the Snort HTTP preproc versus
> the straight pattern matcher because I was trying to figure out if the
> HTTP preproc (itself having already searched for terms like "POST" and
> "GET") would allow us to significantly reduce the load over sigs which
> use things like content:"GET"; distance:5;

I have no idea about this, I'd be interested in hearing it though.

> So, in a brand-new design for a pattern matcher, how can we take
> advantage of the fact that we know certain strings will be searched on
> and hit much more frequently than others?

One thing I'm thinking about is to group expensive sigs together as much
as possible, so the other groups won't suffer from them...
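
As a rough sketch of that idea, assuming some per-sig cost figure is
available (for example from rule profiling): split the sigs at a
threshold so the expensive ones land in their own group. The cost
numbers, the threshold and the name split_by_cost are all made up for
illustration.

```python
# Hypothetical per-sig cost figures, keyed by sid.
COSTS = {1: 12, 2: 450, 3: 8, 4: 300}


def split_by_cost(costs, threshold):
    """Return (regular, expensive) sid lists, split at the threshold."""
    regular = sorted(sid for sid, cost in costs.items() if cost < threshold)
    expensive = sorted(sid for sid, cost in costs.items() if cost >= threshold)
    return regular, expensive


REGULAR, EXPENSIVE = split_by_cost(COSTS, threshold=100)
```

With these numbers sids 2 and 4 end up in the expensive group, and the
group holding sids 1 and 3 no longer has to carry them.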

>  Would separating that into a
> separate thread provide any advantage?

Interesting idea. Maybe we can get the expensive sig groups to be
handled by a different thread than the others. I have to think about this...

>  Perhaps it could become like a
> mini-barnyard kind of situation in which it spits out much, much more
> traffic than alerting but still only a fraction of the overall
> throughput. As in, an HTTP preprocessor that dumps HTTP field streams
> without doing further app level pattern matching.  That trims the
> workload down substantially for another process, operating barnyard
> style, to come through and do higher-level matching and logic.  I think
> writing to disk barnyard-style would be fairly out of the question
> performance-wise, but maybe not writing to a socket where an entirely
> different process can read from.  Snort doesn't allow the preproc to
> cross CPU's, so the resources are all coming from the same pool.  If
> your HTTP preproc had a dedicated CPU, then the cache hits would be
> extremely high since it would only search for a few patterns.  If you
> did it right, I bet you could get almost ASIC or FPGA-level performance
> for URI content searches on a dedicated CPU.

One thing I've been dreaming about is having something like ModSecurity
be able to take data from us, maybe using a socket or pipe. That way it
could be a separate process or maybe even be running on a separate box...

Regards,
Victor

- --
- ---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
- ---------------------------------------------

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmVPzUACgkQiSMBBAuniMctjACfb2fTa0BKWCb3YB6V1y+aEpaG
+uEAnjdLVfC4U1TQjOWe0rVasjA1gTBh
=6Nna
-----END PGP SIGNATURE-----