[Oisf-devel] filemd5?

Thu Feb 16 11:59:20 EST 2012

Right, that's what I was thinking.  Compared to the huge amount of RAM
most are using already for pattern matching, a large list of md5's
shouldn't be too bad.  (And yes, I do happen to have 144 GB of memory
in my boxes!)

Regarding the Virustotal stuff, absolutely, though I don't think that
should be OISF's job to code.  That's a great place to put a script to
asynchronously handle the output from Suricata.  That's why a JSON
output would be perfect for piping to something that can do all of the
heavy-lifting and custom stuff in a script.  CIF, Virustotal, Cuckoo,
DLP--those are all easy tasks if you've got an ever-growing JSON
stream of md5's.
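
To make the piping idea concrete, here is a minimal sketch of such a consumer in Python. It assumes a hypothetical JSON output with one object per line and "md5" and "filename" fields; those field names, and the function names below, are illustrative only and not an existing Suricata format. The list is held as raw 16-byte digests rather than hex strings, along the lines James suggests below.

```python
import json
import sys


def load_md5_list(lines):
    """Parse hex MD5 strings (one per line) into a set of 16-byte digests."""
    digests = set()
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        try:
            digest = bytes.fromhex(text)
        except ValueError:
            continue  # skip anything that is not valid hex
        if len(digest) == 16:
            digests.add(digest)
    return digests


def match_stream(records, known):
    """Yield each JSON record whose "md5" field (assumed name) is in the set."""
    for raw in records:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except ValueError:
            continue  # ignore malformed lines rather than stop the stream
        md5 = event.get("md5", "") if isinstance(event, dict) else ""
        try:
            digest = bytes.fromhex(md5)
        except (ValueError, TypeError):
            continue  # missing or non-hex md5 field
        if digest in known:
            yield event


def main(list_path):
    """Load the list once, then report matches from stdin as they arrive."""
    with open(list_path) as handle:
        known = load_md5_list(handle)
    for hit in match_stream(sys.stdin, known):
        print("ALERT", hit.get("filename"), hit.get("md5"))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Something like "python md5_watch.py badlist.txt < stream.json" (both file names made up) would run it. The Virustotal, CIF, or Cuckoo lookups would hang off the same loop in place of the print.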

On Thu, Feb 16, 2012 at 9:56 AM, James Pleger <jpleger at gmail.com> wrote:
> Also, forgot to mention that that figure is for storing it the dumb way (as
> a string)... If you stored each hash in its actual binary representation
> (16 bytes instead of 32 hex characters), it would be less.
>
> If you did sorting and had a lookup table/index, that would probably be the
> most efficient way, since you could load the first 4 bytes and do the
> initial location based off it.
>
> On Feb 16, 2012, at 10:46 AM, James Pleger wrote:
>
> Even with millions of signatures, the issue is not the size/number of the
> md5sum file... If you stored it in memory, 1 million md5s is only 33 megs of
> memory (each hash is 32 hex characters followed by a null terminator?).
>
> What would be really interesting to me would be the ability to do some async
> actions off the metadata, such as using the virustotal api to see if there
> are any hits for that hash, and throwing a delayed alert or something
> afterwards.
>
> On Feb 16, 2012, at 10:33 AM, Josh White wrote:
>
> There would have to be a hard limit on the size of the md5sum file, checked
> before it's read in; I've got lists of known malware that run into the
> millions. That said, as we go back to the reporting topic again, I still
> think that a configurable template system is ideal, but if not, I agree
> with Martin: JSON for the win.
>
> On Thu, Feb 16, 2012 at 10:06 AM, Martin Holste <mcholste at gmail.com> wrote:
>>
>> Ok, so to really make this thing pay off, and without too much effort,
>> I bet this could be done:
>>
>> 1. Add a simple config option to suricata.yaml: md5list: some_file_name.txt
>> 2. Read the file into memory at startup
>> 3. Generate an alert if any of the runtime-detected md5's match the list.
>> 4. Re-read the file periodically (or on Linux, async when the inode
>> changes)
>>
>> Done!
>>
>> Of course, for some lists of md5's, it may be too big for memory, but
>> I say too bad for those folks, just start out with this much and
>> everyone will win.  A proper memory/disk trade-off mechanism could be
>> concocted later to deal with mega-lists of md5's.
>>
>> On Thu, Feb 16, 2012 at 8:59 AM, Nikolay Denev <ndenev at gmail.com> wrote:
>> > On Feb 16, 2012, at 4:21 PM, Victor Julien wrote:
>> >
>> >> So I guess the best development happens when you're actually doing
>> >> boring stuff and you allow yourself to spend 30 minutes on a hunch. Of
>> >> course the 30 minutes becomes a couple of hours, but who cares :)
>> >>
>> >> Anyway, the hunch here was integrating libnss' md5 calculation code
>> >> into the Suricata file inspection/extraction code, calculating the
>> >> md5 checksum of files on the fly.
>> >>
>> >> Turns out it works and at decent speeds too. In a test pcap I extract
>> >> 8393 files in 16.9 seconds. With md5 on the fly it's 17.6 seconds.
>> >> Sounds acceptable, no?
>> >>
>> >> Right now all I have is writing the md5 to the .meta file, like so:
>> >>
>> >> TIME:              10/02/2009-21:35:10.556990
>> >> PCAP PKT NUM:      6225
>> >> SRC IP:            61.191.61.40
>> >> DST IP:            192.168.2.7
>> >> PROTO:             6
>> >> SRC PORT:          80
>> >> DST PORT:          1091
>> >> FILENAME:          /ww/aa7.exe
>> >> MAGIC:             PE32 executable for MS Windows (GUI) Intel 80386 32-bit
>> >> STATE:             CLOSED
>> >> MD5:               e148eaaadceecb2e3e25fd25809cb5db
>> >> SIZE:              25712
>> >>
>> >> But obviously this needs to be made available to the rule language. I
>> >> was thinking a simple filemd5 keyword to start, allowing matching on
>> >> single md5's. But the real value is probably in a keyword that allows
>> >> you to check an entire db of md5's all at once. I'm sure there are ppl
>> >> sitting on large collections of known bad md5.
>> >>
>> >> Does this all make sense? Any other ideas?
>> >>
>> >> --
>> >> ---------------------------------------------
>> >> Victor Julien
>> >> http://www.inliniac.net/
>> >> PGP: http://www.inliniac.net/victorjulien.asc
>> >> ---------------------------------------------
>> >>
>> >
>> > Very cool!
>> >
>> > It seems this could also be used to provide DLP functionality, i.e. keep
>> > a database with md5 checksums of files with sensitive data and
>> > alert if the data is leaked (regardless of filename). I've heard of at
>> > least one DLP vendor that uses a similar method to detect unauthorized
>> > data leaks.
>> >
>> >
>> > _______________________________________________
>> > Oisf-devel mailing list
>> > Oisf-devel at openinfosecfoundation.org
>> > http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel