<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Also, I forgot to mention that the 33 megs figure assumes storing each hash the dumb way (as a string)... If you stored each hash in its actual binary representation (16 bytes), it would be less: about 16 megs for 1 million md5s.<div><br></div><div>If you sorted the list and kept a lookup table/index, that would probably be the most efficient way, since you could load the first 4 bytes and do the initial lookup based on them.</div><div><br><div><div>On Feb 16, 2012, at 10:46 AM, James Pleger wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Even with millions of signatures, the issue is not the size/number of the md5sum file... If you stored it in memory, 1 million md5s is only 33 megs of memory (each hash is 32 hex characters followed by a null terminator?).<div><br></div><div>What would be really interesting to me would be the ability to do some async actions off the metadata, such as using the VirusTotal API to see if there are any hits for that hash, and throwing a delayed alert or something afterwards.</div><div><br><div><div>On Feb 16, 2012, at 10:33 AM, Josh White wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">There would have to be a hard limit on the number/size of the md5sum file before it's read in; I've got lists of known malware that go into the millions. That said, as we go back to the reporting topic again, I still think that a configurable template system is ideal, but if not, I agree with Martin: JSON for the win. <br>
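<div>A rough sketch of the storage idea raised at the top of this thread: keep each md5 as its 16 raw bytes in one sorted buffer and binary-search it. This is only an illustration in Python, not anything from Suricata; the names <code>pack_md5_list</code> and <code>contains</code> are made up for the example, and the "first 4 bytes" index mentioned above is left out to keep it short.</div>
<pre>
```python
DIGEST_LEN = 16  # an MD5 digest is 128 bits, i.e. 16 raw bytes


def pack_md5_list(hex_digests):
    """Pack hex MD5 strings into one sorted buffer of 16-byte entries.

    Hypothetical helper for illustration. Blank entries are skipped,
    duplicates are dropped, and malformed hex raises ValueError.
    """
    raw = sorted({bytes.fromhex(h.strip()) for h in hex_digests if h.strip()})
    return b"".join(raw)


def contains(packed, hex_digest):
    """Binary-search the packed buffer for a single hex digest."""
    needle = bytes.fromhex(hex_digest)
    lo, hi = 0, len(packed) // DIGEST_LEN
    while lo != hi:
        mid = (lo + hi) // 2
        chunk = packed[mid * DIGEST_LEN:(mid + 1) * DIGEST_LEN]
        if chunk == needle:
            return True
        if needle > chunk:
            lo = mid + 1   # needle sorts after this entry
        else:
            hi = mid       # needle sorts before this entry
    return False
```
</pre>
<div>With this layout the buffer costs exactly 16 bytes per hash, so 1 million md5s come to about 16 megs, and a lookup takes roughly 20 comparisons.</div>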

<br><div class="gmail_quote">On Thu, Feb 16, 2012 at 10:06 AM, Martin Holste <span dir="ltr">&lt;<a href="mailto:mcholste@gmail.com">mcholste@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; position: static; z-index: auto; ">

Ok, so to really make this thing pay off, and without too much effort,<br>

I bet this could be done:<br>

<br>

1. Add a simple config option to suricata.yaml for md5list: some_file_name.txt<br>

2. Read the file into memory at startup<br>

3. Generate an alert if any of the runtime-detected md5's match the list.<br>

4. Re-read the file periodically (or, on Linux, asynchronously when the inode changes)<br>

<br>

Done!<br>

<br>

Of course, some lists of md5's may be too big for memory, but<br>

I say too bad for those folks, just start out with this much and<br>

everyone will win. &nbsp;A proper memory/disk trade-off mechanism could be<br>

concocted later to deal with mega-lists of md5's.<br>
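<div>The four steps above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Suricata code: the class name <code>Md5List</code> and its methods are hypothetical, the list file is assumed to hold one hex md5 per line, and the reload check polls the file's modification time rather than watching the inode asynchronously.</div>
<pre>
```python
import os


class Md5List:
    """In-memory set of known md5s, re-read when the file changes.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.digests = set()
        self.reload_if_changed()  # step 2: read the file at startup

    def reload_if_changed(self):
        """Step 4: re-read the file if its mtime changed.

        Returns True when the list was (re)loaded, False otherwise.
        """
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.mtime:
            return False
        with open(self.path, "r", encoding="ascii") as fh:
            # One hex digest per line; blank lines are ignored.
            self.digests = {line.strip().lower() for line in fh if line.strip()}
        self.mtime = mtime
        return True

    def matches(self, md5_hex):
        """Step 3: True if a runtime-detected md5 is on the list."""
        return md5_hex.strip().lower() in self.digests
```
</pre>
<div>A caller would construct the list once from the configured file name, call <code>reload_if_changed()</code> on a timer, and raise an alert whenever <code>matches()</code> returns True for a freshly computed md5.</div>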

<div class="HOEnZb"><div class="h5"><br>

On Thu, Feb 16, 2012 at 8:59 AM, Nikolay Denev &lt;<a href="mailto:ndenev@gmail.com">ndenev@gmail.com</a>&gt; wrote:<br>

&gt; On Feb 16, 2012, at 4:21 PM, Victor Julien wrote:<br>

&gt;<br>

&gt;&gt; So I guess the best development happens when you're actually doing<br>

&gt;&gt; boring stuff and you allow yourself to spend 30 minutes on a hunch. Of<br>

&gt;&gt; course the 30 minutes becomes a couple of hours, but who cares :)<br>

&gt;&gt;<br>

&gt;&gt; Anyway, the hunch here was integrating libnss' md5 calculation code into<br>

&gt;&gt; the Suricata file inspection/extraction code, calculating the md5<br>

&gt;&gt; checksum of files on the fly.<br>

&gt;&gt;<br>

&gt;&gt; Turns out it works and at decent speeds too. In a test pcap I extract<br>

&gt;&gt; 8393 files in 16.9 seconds. With md5 on the fly it's 17.6 seconds.<br>

&gt;&gt; Sounds acceptable, no?<br>

&gt;&gt;<br>

&gt;&gt; Right now all I have is writing the md5 to the .meta file, like so:<br>

&gt;&gt;<br>

&gt;&gt; TIME: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;10/02/2009-21:35:10.556990<br>

&gt;&gt; PCAP PKT NUM: &nbsp; &nbsp; &nbsp;6225<br>

&gt;&gt; SRC IP: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;61.191.61.40<br>

&gt;&gt; DST IP: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;192.168.2.7<br>

&gt;&gt; PROTO: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 6<br>

&gt;&gt; SRC PORT: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;80<br>

&gt;&gt; DST PORT: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1091<br>

&gt;&gt; FILENAME: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/ww/aa7.exe<br>

&gt;&gt; MAGIC: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

&gt;&gt; STATE: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CLOSED<br>

&gt;&gt; MD5: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; e148eaaadceecb2e3e25fd25809cb5db<br>

&gt;&gt; SIZE: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;25712<br>

&gt;&gt;<br>

&gt;&gt; But obviously this needs to be made available to the rule language. I<br>

&gt;&gt; was thinking a simple filemd5 keyword to start, allowing matching on<br>

&gt;&gt; single md5's. But the real value is probably in a keyword that allows<br>

&gt;&gt; you to check an entire db of md5's all at once. I'm sure there are ppl<br>

&gt;&gt; sitting on large collections of known bad md5.<br>

&gt;&gt;<br>

&gt;&gt; Does this all make sense? Any other ideas?<br>

&gt;&gt;<br>

&gt;&gt; --<br>

&gt;&gt; ---------------------------------------------<br>

&gt;&gt; Victor Julien<br>

&gt;&gt; <a href="http://www.inliniac.net/" target="_blank">http://www.inliniac.net/</a><br>

&gt;&gt; PGP: <a href="http://www.inliniac.net/victorjulien.asc" target="_blank">http://www.inliniac.net/victorjulien.asc</a><br>

&gt;&gt; ---------------------------------------------<br>

&gt;&gt;<br>

&gt;<br>

&gt; Very cool!<br>

&gt;<br>

&gt; It seems this could also be used to provide DLP functionality, i.e. keep a database of md5 checksums of files with sensitive data and<br>

&gt; alert if the data is leaked (regardless of filename). I've heard of at least one DLP vendor that uses a similar method to detect unauthorized data leaks.<br>

&gt;<br>

&gt;<br>

&gt; _______________________________________________<br>

&gt; Oisf-devel mailing list<br>

&gt; <a href="mailto:Oisf-devel@openinfosecfoundation.org">Oisf-devel@openinfosecfoundation.org</a><br>

&gt; <a href="http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel" target="_blank">http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel</a><br>


<br>

</div></div></blockquote></div><br>

</blockquote></div><br></div></div></blockquote></div><br></div></body></html>