There would have to be a hard limit on the number of entries or the size of the md5sum file before it&#39;s read in; I&#39;ve got lists of known malware that run into the millions. That said, going back to the reporting topic, I still think a configurable template system is ideal, but if not, I agree with Martin: JSON for the win. <br>
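<br>
To make the hard-limit idea concrete, here is a rough Python sketch, not Suricata code. The function name, the one-hash-per-line file format, and both default limits are assumptions picked for illustration only.<br>

```python
import os
import string

HEX_DIGITS = set(string.hexdigits)

def load_md5_list(path, max_bytes=64 * 1024 * 1024, max_entries=1_000_000):
    """Load hex MD5s (one per line) into a set, refusing oversized lists.

    Both limits are hypothetical defaults, not values from any real config.
    """
    # Cheap check first: refuse the file before reading a single line.
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path}: {size} bytes exceeds limit of {max_bytes}")

    hashes = set()
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        for line in fh:
            token = line.strip().lower()
            if not token or token.startswith("#"):
                continue  # skip blank lines and comments
            if len(token) != 32 or not all(c in HEX_DIGITS for c in token):
                continue  # skip anything that is not a 32-digit hex MD5
            # Store the raw 16-byte digest rather than the 32-char string.
            hashes.add(bytes.fromhex(token))
            if len(hashes) > max_entries:
                raise ValueError(f"{path}: more than {max_entries} entries")
    return hashes
```

Checking the byte size first means an oversized list is rejected without reading it; the entry cap catches the case where the file is small enough but the resulting set is still larger than wanted.<br>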

<br><div class="gmail_quote">On Thu, Feb 16, 2012 at 10:06 AM, Martin Holste <span dir="ltr">&lt;<a href="mailto:mcholste@gmail.com">mcholste@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Ok, so to really make this thing pay off, and without too much effort,<br>

I bet this could be done:<br>

<br>

1. Add a simple config option to suricata.yaml for md5list: some_file_name.txt<br>

2. Read the file into memory at startup<br>

3. Generate an alert if any of the runtime-detected md5&#39;s match the list.<br>

4. Re-read the file periodically (or, on Linux, asynchronously when the inode changes)<br>

<br>

Done!<br>
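
<br>
The four steps above could be sketched roughly as follows, in Python rather than Suricata&#39;s C. The class and method names are hypothetical, and the change check simply compares os.stat() fields on each call instead of using an asynchronous inode notification.<br>

```python
import os

class Md5Watchlist:
    """In-memory MD5 list that is re-read when the backing file changes.

    Hypothetical helper for illustration; not part of Suricata.
    """

    def __init__(self, path):
        self.path = path
        self._stamp = None
        self._hashes = frozenset()
        self.reload_if_changed()  # step 2: read the file at startup

    def _read(self):
        # One lowercase hex MD5 per non-empty line.
        with open(self.path, "r", encoding="ascii") as fh:
            return frozenset(
                line.strip().lower() for line in fh if line.strip()
            )

    def reload_if_changed(self):
        """Step 4: re-read only if inode, mtime or size differ."""
        st = os.stat(self.path)
        stamp = (st.st_ino, st.st_mtime_ns, st.st_size)
        if stamp == self._stamp:
            return False
        self._hashes = self._read()
        self._stamp = stamp
        return True

    def matches(self, md5_hex):
        """Step 3: True if a runtime-computed MD5 is on the list."""
        return md5_hex.lower() in self._hashes
```

A caller would invoke reload_if_changed() on a timer and matches() for each file digest; a set lookup is constant time on average, so the list size mostly costs memory rather than lookup speed.<br>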

<br>

Of course, some lists of md5&#39;s may be too big for memory, but<br>

I say too bad for those folks, just start out with this much and<br>

everyone will win.  A proper memory/disk trade-off mechanism could be<br>

concocted later to deal with mega-lists of md5&#39;s.<br>

<div class="HOEnZb"><div class="h5"><br>

On Thu, Feb 16, 2012 at 8:59 AM, Nikolay Denev &lt;<a href="mailto:ndenev@gmail.com">ndenev@gmail.com</a>&gt; wrote:<br>

&gt; On Feb 16, 2012, at 4:21 PM, Victor Julien wrote:<br>

&gt;<br>

&gt;&gt; So I guess the best development happens when you&#39;re actually doing<br>

&gt;&gt; boring stuff and you allow yourself to spend 30 minutes on a hunch. Of<br>

&gt;&gt; course the 30 minutes becomes a couple of hours, but who cares :)<br>

&gt;&gt;<br>

&gt;&gt; Anyway, the hunch here was integrating libnss&#39; md5 calculation code into<br>

&gt;&gt; the Suricata file inspection/extraction code, calculating the md5<br>

&gt;&gt; checksum of files on the fly.<br>

&gt;&gt;<br>

&gt;&gt; Turns out it works and at decent speeds too. In a test pcap I extract<br>

&gt;&gt; 8393 files in 16.9 seconds. With md5 on the fly it&#39;s 17.6 seconds.<br>

&gt;&gt; Sounds acceptable, no?<br>
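
<br>
For readers unfamiliar with on-the-fly hashing, the idea is to feed each chunk of file data to the digest as it arrives rather than hashing the finished file afterwards. A minimal sketch using Python&#39;s hashlib as a stand-in for the libnss code mentioned above (the function name is an assumption):<br>

```python
import hashlib

def md5_on_the_fly(chunks):
    """Hash a file incrementally as its chunks arrive.

    `chunks` is any iterable of bytes objects, e.g. reassembled
    stream segments. Returns (hex digest, total size in bytes).
    """
    digest = hashlib.md5()
    size = 0
    for chunk in chunks:
        digest.update(chunk)  # state is carried across calls
        size += len(chunk)
    return digest.hexdigest(), size
```

Because the digest state is updated per chunk, the whole file never has to be held in memory to compute its checksum.<br>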

&gt;&gt;<br>

&gt;&gt; Right now all I have is writing the md5 to the .meta file, like so:<br>

&gt;&gt;<br>

&gt;&gt; TIME:              10/02/2009-21:35:10.556990<br>

&gt;&gt; PCAP PKT NUM:      6225<br>

&gt;&gt; SRC IP:            61.191.61.40<br>

&gt;&gt; DST IP:            192.168.2.7<br>

&gt;&gt; PROTO:             6<br>

&gt;&gt; SRC PORT:          80<br>

&gt;&gt; DST PORT:          1091<br>

&gt;&gt; FILENAME:          /ww/aa7.exe<br>

&gt;&gt; MAGIC:             PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

&gt;&gt; STATE:             CLOSED<br>

&gt;&gt; MD5:               e148eaaadceecb2e3e25fd25809cb5db<br>

&gt;&gt; SIZE:              25712<br>
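
<br>
A record in this layout is easy to consume from a script. The sketch below is a hypothetical Python parser, assuming every field is a single &quot;KEY: value&quot; line as in the sample above; it splits on the first colon only, because the TIME value contains colons itself.<br>

```python
def parse_meta(text):
    """Parse 'KEY: value' lines from a .meta record into a dict.

    All values are kept as strings; callers convert fields such as
    SIZE or the port numbers as needed.
    """
    record = {}
    for line in text.splitlines():
        # partition() splits on the first colon only.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a KEY: value line
        record[key.strip()] = value.strip()
    return record
```

The returned dict is keyed by the field names exactly as they appear in the file, so record[&quot;MD5&quot;] gives the checksum to compare against a list.<br>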

&gt;&gt;<br>

&gt;&gt; But obviously this needs to be made available to the rule language. I<br>

&gt;&gt; was thinking a simple filemd5 keyword to start, allowing matching on<br>

&gt;&gt; single md5&#39;s. But the real value is probably in a keyword that allows<br>

&gt;&gt; you to check an entire db of md5&#39;s all at once. I&#39;m sure there are ppl<br>

&gt;&gt; sitting on large collections of known bad md5&#39;s.<br>

&gt;&gt;<br>

&gt;&gt; Does this all make sense? Any other ideas?<br>

&gt;&gt;<br>

&gt;&gt; --<br>

&gt;&gt; ---------------------------------------------<br>

&gt;&gt; Victor Julien<br>

&gt;&gt; <a href="http://www.inliniac.net/" target="_blank">http://www.inliniac.net/</a><br>

&gt;&gt; PGP: <a href="http://www.inliniac.net/victorjulien.asc" target="_blank">http://www.inliniac.net/victorjulien.asc</a><br>

&gt;&gt; ---------------------------------------------<br>

&gt;&gt;<br>

&gt;<br>

&gt; Very cool!<br>

&gt;<br>

&gt; It seems this could also be used to provide DLP functionality, i.e. keep a database of md5 checksums of files with sensitive data and<br>

&gt; alert if the data is leaked (regardless of filename). I&#39;ve heard of at least one DLP vendor that uses a similar method to detect unauthorized data leaks.<br>

&gt;<br>

&gt;<br>

&gt; _______________________________________________<br>

&gt; Oisf-devel mailing list<br>

&gt; <a href="mailto:Oisf-devel@openinfosecfoundation.org">Oisf-devel@openinfosecfoundation.org</a><br>

&gt; <a href="http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel" target="_blank">http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel</a><br>


<br>

</div></div></blockquote></div><br>