Oh happy day.... :-) This will be great for getting binaries off the network for things like exploit kits. I think having the file is essential: to confirm exploits, to check whether you detect something as malware, and, if you don't, to work out how to fix that when the file turns up on a machine. Grabbing files off the network and analyzing them to find what is bad is good in itself, but being able to, say, extract the files dropped after a Java exploit using those sigs, or a file sent from an exploit kit, will be very useful.<br>
<br>Great work. <br><br><br><div class="gmail_quote">On 29 November 2011 16:54, Victor Julien <span dir="ltr"><<a href="mailto:victor@inliniac.net">victor@inliniac.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
From my blog:<br>
<a href="http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html" target="_blank">http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html</a><br>
<br>
File extraction in Suricata<br>
<br>
Today I pushed out a new feature in Suricata I’m very excited about. It<br>
has been long in the making and with over 6000 new lines of code it’s a<br>
significant effort. It’s available in the current git master. I’d<br>
consider it alpha quality, so handle with care.<br>
<br>
So what is this all about? Simply put, we can now extract files from<br>
HTTP streams in Suricata. Both uploads and downloads. Fully controlled<br>
by the rule language. But that’s not all. I’ve added a touch of magic. By<br>
utilizing libmagic (this powers the “file” command), we know the file<br>
type of files as well. Lots of interesting stuff that can be done there.<br>
<br>
Rule keywords<br>
<br>
Four new rule keywords were added: filename, fileext, filemagic and<br>
filestore.<br>
<br>
Filename and fileext are pretty trivial: match on the full name or file<br>
extension of a file.<br>
<br>
alert http any any -> any any (filename:"secret.xls";)<br>
alert http any any -> any any (fileext:"pdf";)<br>
<br>
More interesting is the filemagic keyword. It runs on the magic output<br>
of inspecting (the start of) a file. Example values:<br>
<br>
GIF image data, version 89a, 1 x 1<br>
PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>
HTML document text<br>
Macromedia Flash data (compressed), version 9<br>
MS Windows icon resource – 2 icons, 16×16, 256-colors<br>
PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced<br>
JPEG image data, JFIF standard 1.01<br>
PDF document, version 1.6<br>
<br>
Matching on this with the filemagic keyword is pretty simple:<br>
<br>
alert http any any -> any any (filemagic:"PDF document";)<br>
alert http any any -> any any (filemagic:"PDF document, version 1.6";)<br>
<br>
Pretty cool, eh? You can match both very specifically and loosely. For<br>
example:<br>
<br>
alert http any any -> any any (filemagic:"executable for MS Windows";)<br>
<br>
Will match on (among others) these types:<br>
<br>
PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit<br>
PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>
PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>
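<br>
The loose matching above can be pictured as a plain substring test on the magic string. This is only a rough Python sketch, not Suricata code: the helper name filemagic_matches is hypothetical, and case-sensitive comparison is an assumption the post does not confirm.<br>
<br>
```python
# Hypothetical model of the filemagic keyword: a substring test against the
# libmagic description of a file. Case-sensitive matching is an assumption.
def filemagic_matches(magic: str, pattern: str, negate: bool = False) -> bool:
    found = pattern in magic
    # negate=True models the rule syntax filemagic:!"..."
    return not found if negate else found


# Magic strings taken from the examples in this post.
MAGIC_SAMPLES = [
    "PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
]

# Every Windows executable variant matches the loose pattern; the PDF does not.
windows_hits = [
    m for m in MAGIC_SAMPLES if filemagic_matches(m, "executable for MS Windows")
]
```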
<br>
Finally there is the filestore keyword. It is the simplest of all: if<br>
the rule matches, the files will be written to disk.<br>
<br>
Naturally you can combine the file keywords with the regular HTTP<br>
keywords, limiting to POSTs for example:<br>
<br>
alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload<br>
claimed, but not pdf"; flow:established,to_server; content:"POST";<br>
http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1;<br>
rev:1;)<br>
<br>
This will alert on and store all files that are uploaded using a POST<br>
request that have a filename extension of pdf, but the actual file is<br>
not pdf.<br>
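<br>
To make the logic of that rule explicit, here is a small Python sketch of the same three conditions. It is an illustration only: the function name is hypothetical, and exact-case comparison of the extension and the magic string is an assumption.<br>
<br>
```python
import os


# Hypothetical restatement of the rule's conditions, not Suricata code:
# the request is a POST, the filename claims to be a pdf, and the magic
# string does not contain "PDF document".
def claims_pdf_but_is_not(method: str, filename: str, magic: str) -> bool:
    ext = os.path.splitext(filename)[1].lstrip(".")
    return method == "POST" and ext == "pdf" and "PDF document" not in magic
```
<br>
For example, claims_pdf_but_is_not("POST", "report.pdf", "HTML document text") is True, while the same call with a magic string of "PDF document, version 1.6" is False.<br>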
<br>
Storage<br>
<br>
The storage to disk is handled by a new output module called “file”.<br>
Its config looks like this:<br>
<br>
enabled: yes # set to yes to enable<br>
log-dir: files # directory to store the files<br>
force-magic: no # force logging magic on all stored files<br>
<br>
It needs to be enabled for file storing to work.<br>
<br>
The files are stored to disk as “file.1”, “file.2”, etc. For each of the<br>
files a meta file is created containing the flow information, file name,<br>
size, etc. Example:<br>
<br>
TIME: 01/27/2010-17:41:11.579196<br>
PCAP PKT NUM: 2847035<br>
SRC IP: 68.142.93.214<br>
DST IP: 10.7.185.57<br>
PROTO: 6<br>
SRC PORT: 80<br>
DST PORT: 56207<br>
FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe<br>
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>
STATE: CLOSED<br>
SIZE: 5204<br>
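<br>
A meta file in this shape is easy to post-process. The following Python sketch reads the "KEY: value" lines into a dict. The function name parse_meta is hypothetical, and two things are assumptions rather than documented behaviour: that keys are always upper case, and that a line without such a key continues the previous value (as can happen when mail wraps a long path).<br>
<br>
```python
# Hypothetical parser for the meta file layout shown in this post.
def parse_meta(text: str) -> dict:
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key.isupper():
            # A new "KEY: value" pair; the value may itself contain colons.
            last_key = key
            fields[key] = value.strip()
        elif last_key is not None:
            # Assumed continuation of the previous value.
            fields[last_key] += line
    return fields


# Sample copied from the example in this post.
SAMPLE = """TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
SRC IP: 68.142.93.214
DST IP: 10.7.185.57
PROTO: 6
SRC PORT: 80
DST PORT: 56207
FILENAME:
/msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
STATE: CLOSED
SIZE: 5204
"""

meta = parse_meta(SAMPLE)
```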
<br>
Configuration<br>
<br>
The file extraction is for HTTP only currently, and works on top of our<br>
HTTP parser. As the HTTP parser runs on top of the stream reassembly<br>
engine, configuration parameters of both these parts of Suricata affect<br>
handling of files.<br>
<br>
The stream engine option “stream.reassembly.depth” (default 1 MB)<br>
controls the depth into a stream in which we look. Set to 0 for no limit.<br>
The libhtp options request-body-limit and response-body-limit control<br>
how far into an HTTP request or response body we look. Again set to 0 for<br>
no limit. This can be controlled per HTTP server.<br>
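<br>
One rough way to reason about how these limits combine is sketched below in Python. This is a simplification, not how Suricata computes anything: it treats each setting as an independent cap in bytes, with 0 meaning unlimited, and returns the tightest cap as an upper bound on how much of a file can be seen. The function name is hypothetical and the byte values in the usage note are made up apart from the 1 MB default.<br>
<br>
```python
# Hypothetical helper: the smallest non-zero limit wins; 0 means "no limit".
def effective_limit(*limits: int):
    """Return the tightest cap in bytes, or None if every limit is 0."""
    bounded = [n for n in limits if n > 0]
    return min(bounded) if bounded else None
```
<br>
With a stream depth of 1048576 bytes and a body limit of 0, effective_limit(1048576, 0) gives 1048576. Setting both to 0 gives None, meaning nothing caps the extraction.<br>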
<br>
Performance<br>
<br>
The file handling is fully streaming, so it’s very efficient.<br>
Nonetheless there will be an overhead for the extra parsing,<br>
bookkeeping, writing to disk, etc. Memory requirements appear to be limited<br>
as well. Suricata shouldn’t keep more than a few KB per flow in memory.<br>
<br>
Limitations<br>
<br>
Lack of limits is a limitation. For file storage no limits have been<br>
implemented yet. So it’s easy to clutter your disk up with files.<br>
Example: a 118 GB enterprise pcap, storing just JPGs, extracted 400,000<br>
files. Better use a separate partition if you’re on a live link.<br>
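<br>
Until limits exist, something outside Suricata can keep an eye on the log directory. The Python sketch below counts the files in a directory and sums their sizes. It is not part of Suricata: the function names and the idea of a byte budget are hypothetical, and it simply counts every regular file it finds rather than assuming any particular file naming.<br>
<br>
```python
import os


# Hypothetical monitoring helper: count regular files in the file-store
# directory and add up their sizes in bytes.
def stored_file_usage(log_dir: str):
    count = 0
    total = 0
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total


# True when the directory holds more bytes than the chosen budget.
def over_budget(log_dir: str, max_bytes: int) -> bool:
    _, total = stored_file_usage(log_dir)
    return total > max_bytes
```
<br>
Run from cron or a similar scheduler, over_budget("files", some_budget) could trigger a cleanup or an alert before the partition fills.<br>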
<br>
Future work<br>
<br>
Apart from stabilizing this code and performance optimizing it, the next<br>
step will be SMTP file extraction. Possibly other protocols, although<br>
nothing is set in stone there yet.<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
---------------------------------------------<br>
Victor Julien<br>
<a href="http://www.inliniac.net/" target="_blank">http://www.inliniac.net/</a><br>
PGP: <a href="http://www.inliniac.net/victorjulien.asc" target="_blank">http://www.inliniac.net/victorjulien.asc</a><br>
---------------------------------------------<br>
<br>
_______________________________________________<br>
Oisf-users mailing list<br>
<a href="mailto:Oisf-users@openinfosecfoundation.org">Oisf-users@openinfosecfoundation.org</a><br>
<a href="http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users" target="_blank">http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users</a><br>
</font></span></blockquote></div><br>