Oh happy day.... :-) This will be great for getting binaries off the network for things like exploit kits. I think having the file is essential: to confirm exploits, to see whether you detect something as malware, and, if not, to work out how you could fix that when something has landed on a machine. Grabbing files off the network and processing them to find what is bad is already good, but being able to, say, extract files dropped after a Java exploit using those sigs, or a file sent from an exploit kit, will be very useful.<br>

<br>Great work. <br><br><br><div class="gmail_quote">On 29 November 2011 16:54, Victor Julien <span dir="ltr"><<a href="mailto:victor@inliniac.net">victor@inliniac.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

From my blog:<br>

<a href="http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html" target="_blank">http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html</a><br>

<br>

File extraction in Suricata<br>

<br>

Today I pushed out a new feature in Suricata I’m very excited about. It<br>

has been long in the making, and with over 6000 new lines of code it’s a<br>

significant effort. It’s available in the current git master. I’d<br>

consider it alpha quality, so handle with care.<br>

<br>

So what is this all about? Simply put, we can now extract files from<br>

HTTP streams in Suricata. Both uploads and downloads. Fully controlled<br>

by the rule language. But that’s not all. I’ve added a touch of magic. By<br>

utilizing libmagic (this powers the “file” command), we know the file<br>

type of files as well. Lots of interesting stuff that can be done there.<br>

<br>

Rule keywords<br>

<br>

Four new rule keywords were added: filename, fileext, filemagic and<br>

filestore.<br>

<br>

Filename and fileext are pretty trivial: match on the full name or file<br>

extension of a file.<br>

<br>

    alert http any any -> any any (filename:"secret.xls"; sid:1001;)<br>

    alert http any any -> any any (fileext:"pdf"; sid:1002;)<br>
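To make the difference between the two keywords concrete, here is a minimal Python sketch. It assumes that filename compares the final path component of the recorded name for equality and that fileext compares the text after the last dot, both case-sensitively. The function names are hypothetical and the real keyword semantics may differ (for example in case handling).

```python
import posixpath


def last_component(name: str) -> str:
    """Return the final path component: "/docs/secret.xls" -> "secret.xls"."""
    return posixpath.basename(name)


def match_filename(name: str, pattern: str) -> bool:
    """Hypothetical filename check: the final component equals the pattern."""
    return last_component(name) == pattern


def match_fileext(name: str, ext: str) -> bool:
    """Hypothetical fileext check: the text after the last dot equals ext."""
    _root, dot, suffix = last_component(name).rpartition(".")
    return dot == "." and suffix == ext
```

A name without any dot never matches fileext, and "archive.tar.gz" only matches the extension "gz".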

<br>

More interesting is the filemagic keyword. It matches on the libmagic output<br>

for (the start of) a file. Example values:<br>

<br>

    GIF image data, version 89a, 1 x 1<br>

    PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

    HTML document text<br>

    Macromedia Flash data (compressed), version 9<br>

    MS Windows icon resource - 2 icons, 16x16, 256-colors<br>

    PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced<br>

    JPEG image data, JFIF standard 1.01<br>

    PDF document, version 1.6<br>
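Descriptions like these come from looking at the first bytes of a file, which is why they can be produced while the file is still streaming past. The Python sketch below is a toy stand-in for libmagic, not libmagic itself: it checks a handful of well-known leading signatures and returns strings that only imitate libmagic’s style. Real libmagic uses a large rule database and, for example, follows the PE header to report “PE32 executable…”, whereas this toy stops at the “MZ” header.

```python
def toy_magic(head: bytes) -> str:
    """Classify a file from its first bytes (tiny hand-picked signature set)."""
    if head.startswith(b"GIF89a"):
        return "GIF image data, version 89a"
    if head.startswith(b"GIF87a"):
        return "GIF image data, version 87a"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG image data"
    if head.startswith(b"%PDF-"):
        # "%PDF-1.6" -> "PDF document, version 1.6"
        version = head[5:8].decode("ascii", "replace")
        return f"PDF document, version {version}"
    if head.startswith(b"\xff\xd8\xff"):
        return "JPEG image data"
    if head.startswith(b"MZ"):
        # libmagic would go on to inspect the PE header; the toy does not.
        return "MS-DOS executable"
    return "data"
```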

<br>

Matching on these values with the filemagic keyword is simple:<br>

<br>

    alert http any any -> any any (filemagic:"PDF document"; sid:1003;)<br>

    alert http any any -> any any (filemagic:"PDF document, version 1.6"; sid:1004;)<br>

<br>

Pretty cool, eh? You can match both very specifically and loosely. For<br>

example:<br>

<br>

    alert http any any -> any any (filemagic:"executable for MS Windows"; sid:1005;)<br>

<br>

This will match (among others) these types:<br>

<br>

    PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit<br>

    PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

    PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>
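The loose and specific matches above behave like a substring test on the magic description. Here is a minimal Python sketch under that assumption; the function name, the case-sensitive comparison and the `negate` flag (standing in for the `!` prefix) are illustrative choices, not Suricata’s implementation.

```python
def match_filemagic(magic: str, pattern: str, negate: bool = False) -> bool:
    """Assumed filemagic semantics: pattern is a substring of the description."""
    found = pattern in magic
    return not found if negate else found


# Descriptions taken from the lists in the text.
MAGIC_SAMPLES = [
    "PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
]
```

With this model, "executable for MS Windows" matches the three PE32 strings but not the PDF one, while both "PDF document" and "PDF document, version 1.6" match the PDF description.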

<br>

Finally, there is the filestore keyword. It is the simplest of all: if<br>

the rule matches, the files will be written to disk.<br>

<br>

Naturally, you can combine the file keywords with the regular HTTP<br>

keywords, for example to limit matching to POSTs:<br>

<br>

    alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload claimed, but not pdf"; flow:established,to_server; content:"POST"; http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1; rev:1;)<br>

<br>

This will alert on, and store, every file uploaded in a POST<br>

request whose filename has the extension pdf but whose content is<br>

not actually a PDF.<br>
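The rule’s conditions can be read as one boolean predicate over a file seen in an HTTP transaction. The Python sketch below models that predicate with a hypothetical `HttpFile` record; it assumes the flow is already established, uses equality where `content:"POST"; http_method;` is really a content match, and treats filestore as the action taken on a match rather than a condition.

```python
from dataclasses import dataclass


@dataclass
class HttpFile:
    """Hypothetical record of one file seen in an HTTP transaction."""
    method: str      # HTTP request method, e.g. "POST"
    to_server: bool  # True when the file travels client -> server (upload)
    name: str        # file name as recorded, e.g. "/upload/invoice.pdf"
    magic: str       # libmagic description of the file's first bytes


def extension(name: str) -> str:
    """Text after the last dot of the final path component, or ""."""
    base = name.rsplit("/", 1)[-1]
    _root, dot, suffix = base.rpartition(".")
    return suffix if dot else ""


def pdf_upload_not_pdf(f: HttpFile) -> bool:
    """Conditions of the sid:1 rule (flow assumed to be established)."""
    return (
        f.to_server                        # flow:established,to_server;
        and f.method == "POST"             # content:"POST"; http_method;
        and extension(f.name) == "pdf"     # fileext:"pdf";
        and "PDF document" not in f.magic  # filemagic:!"PDF document";
    )
```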

<br>

Storage<br>

<br>

Storage to disk is handled by a new output module called “file”.<br>

Its config looks like this:<br>

<br>

enabled: yes # set to yes to enable<br>

log-dir: files # directory to store the files<br>

force-magic: no # force logging magic on all stored files<br>

<br>

It needs to be enabled for file storing to work.<br>

<br>

The files are stored to disk as “file.1”, “file.2”, etc. For each of the<br>

files a meta file is created containing the flow information, file name,<br>

size, etc. Example:<br>

<br>

TIME: 01/27/2010-17:41:11.579196<br>

PCAP PKT NUM: 2847035<br>

SRC IP: 68.142.93.214<br>

DST IP: 10.7.185.57<br>

PROTO: 6<br>

SRC PORT: 80<br>

DST PORT: 56207<br>

FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe<br>

MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>

STATE: CLOSED<br>

SIZE: 5204<br>
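For post-processing stored files, it helps to see the naming scheme and the “KEY: VALUE” meta layout as code. The Python sketch below writes `file.N` plus a companion meta file and parses it back. The `.meta` suffix, the `FileStore` class and the per-instance counter are assumptions for illustration, not Suricata’s implementation; the field order is copied from the example above.

```python
import os
from typing import Dict

# Field order copied from the meta example in the text.
META_FIELDS = ["TIME", "PCAP PKT NUM", "SRC IP", "DST IP", "PROTO",
               "SRC PORT", "DST PORT", "FILENAME", "MAGIC", "STATE", "SIZE"]


class FileStore:
    """Hypothetical writer producing file.N and a file.N.meta companion."""

    def __init__(self, log_dir: str) -> None:
        self.log_dir = log_dir
        self.counter = 0  # restarts per instance; a real store would persist it
        os.makedirs(log_dir, exist_ok=True)

    def store(self, data: bytes, meta: Dict[str, str]) -> str:
        self.counter += 1
        path = os.path.join(self.log_dir, f"file.{self.counter}")
        with open(path, "wb") as fh:
            fh.write(data)
        fields = {**meta, "SIZE": str(len(data))}  # SIZE derived from the data
        with open(path + ".meta", "w", encoding="utf-8") as fh:
            for key in META_FIELDS:
                if key in fields:
                    fh.write(f"{key}: {fields[key]}\n")
        return path


def read_meta(path: str) -> Dict[str, str]:
    """Parse "KEY: VALUE" lines, splitting on the first ": " only."""
    result: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key, sep, value = line.rstrip("\n").partition(": ")
            if sep:
                result[key] = value
    return result
```

Splitting on the first ": " keeps values such as the TIME field, which contains colons but no colon-space, intact.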

<br>

Configuration<br>

<br>

File extraction currently works for HTTP only, on top of our<br>

HTTP parser. As the HTTP parser runs on top of the stream reassembly<br>

engine, configuration parameters of both these parts of Suricata affect<br>

how files are handled.<br>

<br>

The stream engine option “stream.reassembly.depth” (default 1 MB)<br>

controls how deep into a stream we look. Set it to 0 for no limit.<br>

The libhtp options request-body-limit and response-body-limit control<br>

how far into an HTTP request or response body we look. Again, set them to 0<br>

for no limit. These body limits can be set per HTTP server.<br>
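The two kinds of limit combine: whichever is hit first decides how much of a file can be extracted. The Python sketch below is a simplified model under stated assumptions: a single HTTP transaction per stream direction, headers counted against the stream depth, and 0 meaning “no limit” for both settings. The function name is hypothetical.

```python
def visible_body_bytes(header_len: int, body_len: int,
                       stream_depth: int, body_limit: int) -> int:
    """Body bytes available to file extraction in a simplified model.

    stream_depth models stream.reassembly.depth and body_limit models
    request-body-limit / response-body-limit; 0 disables either limit.
    Assumes the headers are the first bytes in the stream direction.
    """
    visible = body_len
    if stream_depth:
        visible = min(visible, max(0, stream_depth - header_len))
    if body_limit:
        visible = min(visible, body_limit)
    return visible
```

In this model a 5 MB download behind 400 bytes of headers, with a 1 MB depth and no body limit, yields only 1 MB minus 400 bytes of the file, so the stored copy would be truncated.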

<br>

Performance<br>

<br>

File handling is fully streaming, so it’s very efficient.<br>

Nonetheless, there is overhead for the extra parsing, bookkeeping,<br>

writing to disk, etc. Memory requirements appear to be limited<br>

as well: Suricata shouldn’t keep more than a few KB per flow in memory.<br>
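Streaming is what keeps per-flow memory small: each chunk is written out as it arrives, and only a short prefix is retained for the magic lookup. The Python sketch below illustrates that idea with a hypothetical `StreamingFile` state object; the 4096-byte prefix size is an assumption chosen to match “a few KB”, not a Suricata setting.

```python
from typing import BinaryIO, Iterable

MAGIC_HEAD = 4096  # hypothetical: bytes kept in memory for magic detection


class StreamingFile:
    """Per-file state that never holds the whole file in memory."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out           # destination, e.g. an open file.N
        self.head = bytearray()  # first MAGIC_HEAD bytes, for magic lookup
        self.size = 0            # total bytes seen so far

    def feed(self, chunk: bytes) -> None:
        if len(self.head) < MAGIC_HEAD:
            self.head += chunk[: MAGIC_HEAD - len(self.head)]
        self.out.write(chunk)    # pass the chunk straight through to storage
        self.size += len(chunk)


def consume(chunks: Iterable[bytes], out: BinaryIO) -> StreamingFile:
    """Feed chunks one by one, as reassembled data would arrive."""
    state = StreamingFile(out)
    for chunk in chunks:
        state.feed(chunk)
    return state
```

However large the file, the retained state is the prefix buffer plus a few counters.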

<br>

Limitations<br>

<br>

The lack of limits is a limitation: no limits have been implemented for<br>

file storage yet, so it’s easy to clutter up your disk with files. For<br>

example, storing just the JPGs from a 118 GB enterprise pcap extracted 400,000<br>

files. Better to use a separate partition if you’re on a live link.<br>

<br>

Future work<br>

<br>

Apart from stabilizing and optimizing this code, the next<br>

step will be SMTP file extraction. Other protocols may follow, although<br>

nothing is set in stone there yet.<br>

<span class="HOEnZb"><font color="#888888"><br>

<br>

--<br>

---------------------------------------------<br>

Victor Julien<br>

<a href="http://www.inliniac.net/" target="_blank">http://www.inliniac.net/</a><br>

PGP: <a href="http://www.inliniac.net/victorjulien.asc" target="_blank">http://www.inliniac.net/victorjulien.asc</a><br>

---------------------------------------------<br>

<br>

_______________________________________________<br>

Oisf-users mailing list<br>

<a href="mailto:Oisf-users@openinfosecfoundation.org">Oisf-users@openinfosecfoundation.org</a><br>

<a href="http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users" target="_blank">http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users</a><br>

</font></span></blockquote></div><br>