I think that is up to you to decide.<br>I would personally want to keep a separate rule for PDFs.<br>thanks<br><br><div class="gmail_quote">On Fri, Dec 9, 2011 at 7:41 PM, Dewhirst, Rob <span dir="ltr"><<a href="mailto:robdewhirst@gmail.com">robdewhirst@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">How are people typically implementing this?  One rule file for all<br>

types of file extraction?<br>

<div class="HOEnZb"><div class="h5"><br>

On Thu, Dec 1, 2011 at 3:38 AM, Peter Manev <<a href="mailto:petermanev@gmail.com">petermanev@gmail.com</a>> wrote:<br>

> This is absolutely phenomenal  - great work - dev team!<br>

> Makes it a lot easier to find, learn, teach, and look into pdf/other attachment<br>

> exploits.....<br>

><br>

><br>

> On Thu, Dec 1, 2011 at 10:26 AM, Kevin Ross <<a href="mailto:kevross33@googlemail.com">kevross33@googlemail.com</a>><br>

> wrote:<br>

>><br>

>> Oh happy day.... :-) This will be great for getting binaries off network<br>

>> for things like exploit kits and so on. I think having the file is<br>

>> essential; to confirm exploits, to see if you detect something as malware<br>

>> and if not how you could fix that if you have a problem where something is<br>

>> on a machine. While it is good to grab files off the network and process them<br>

>> further to find out what is bad, being able to, say, extract files dropped after<br>

>> a Java exploit using those sigs, or a file sent from an exploit kit, will be<br>

>> very useful.<br>

>><br>

>> Great work.<br>

>><br>

>><br>

>> On 29 November 2011 16:54, Victor Julien <<a href="mailto:victor@inliniac.net">victor@inliniac.net</a>> wrote:<br>

>>><br>

>>> From my blog:<br>

>>> <a href="http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html" target="_blank">http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html</a><br>

>>><br>

>>> File extraction in Suricata<br>

>>><br>

>>> Today I pushed out a new feature in Suricata I’m very excited about. It<br>

>>> has been long in the making and with over 6000 new lines of code it’s a<br>

>>> significant effort. It’s available in the current git master. I’d<br>

>>> consider it alpha quality, so handle with care.<br>

>>><br>

>>> So what is this all about? Simply put, we can now extract files from<br>

>>> HTTP streams in Suricata. Both uploads and downloads. Fully controlled<br>

>>> by the rule language. But that’s not all. I’ve added a touch of magic. By<br>

>>> utilizing libmagic (this powers the “file” command), we know the file<br>

>>> type of files as well. Lots of interesting stuff that can be done there.<br>

>>><br>

>>> Rule keywords<br>

>>><br>

>>> Four new rule keywords were added: filename, fileext, filemagic and<br>

>>> filestore.<br>

>>><br>

>>> Filename and fileext are pretty trivial: match on the full name or file<br>

>>> extension of a file.<br>

>>><br>

>>>    alert http any any -> any any (filename:"secret.xls";)<br>

>>>    alert http any any -> any any (fileext:"pdf";)<br>

>>><br>

>>> More interesting is the filemagic keyword. It runs on the magic output<br>

>>> of inspecting the (start of) a file. This value is for example:<br>

>>><br>

>>>    GIF image data, version 89a, 1 x 1<br>

>>>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

>>>    HTML document text<br>

>>>    Macromedia Flash data (compressed), version 9<br>

>>>    MS Windows icon resource - 2 icons, 16x16, 256-colors<br>

>>>    PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced<br>

>>>    JPEG image data, JFIF standard 1.01<br>

>>>    PDF document, version 1.6<br>

>>><br>

>>> So how the filemagic keyword allows you to match on this is pretty<br>

>>> simple:<br>

>>><br>

>>>    alert http any any -> any any (filemagic:"PDF document";)<br>

>>>    alert http any any -> any any (filemagic:"PDF document, version 1.6";)<br>

>>><br>

>>> Pretty cool, eh? You can match both very specifically and loosely. For<br>

>>> example:<br>

>>><br>

>>>    alert http any any -> any any (filemagic:"executable for MS Windows";)<br>

>>><br>

>>> Will match on (among others) these types:<br>

>>><br>

>>>    PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit<br>

>>>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit<br>

>>>    PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>

>>><br>

>>> Finally there is the filestore keyword. It is the simplest of all: if<br>

>>> the rule matches, the files will be written to disk.<br>

>>><br>

>>> Naturally you can combine the file keywords with the regular HTTP<br>

>>> keywords, limiting to POSTs for example:<br>

>>><br>

>>>    alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload<br>

>>> claimed, but not pdf"; flow:established,to_server; content:"POST";<br>

>>> http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1;<br>

>>> rev:1;)<br>

>>><br>

>>> This will alert on and store all files that are uploaded using a POST<br>

>>> request that have a filename extension of pdf, but the actual file is<br>

>>> not pdf.<br>
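To make the matching logic of that combined rule concrete, here is a rough Python sketch of the same check. It is only an illustration of the semantics described above (loose substring match on the magic string, exact match on the extension), not how Suricata implements it. The function names are made up, and lowercasing the extension is my own assumption.<br>

```python
import os


def file_ext(filename: str) -> str:
    """Return the extension without the dot ("" if there is none).

    Lowercasing is an assumption made for this sketch only.
    """
    return os.path.splitext(filename)[1].lstrip(".").lower()


def magic_contains(magic: str, needle: str) -> bool:
    """Loose match: the needle may appear anywhere in the magic string."""
    return needle in magic


def claimed_pdf_but_not_pdf(method: str, filename: str, magic: str) -> bool:
    """Same conditions as the rule: POST, fileext "pdf", filemagic not "PDF document"."""
    return (
        method == "POST"
        and file_ext(filename) == "pdf"
        and not magic_contains(magic, "PDF document")
    )


# An executable uploaded under a .pdf name trips the check.
print(claimed_pdf_but_not_pdf(
    "POST", "report.pdf",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit"))

# A genuine PDF does not.
print(claimed_pdf_but_not_pdf("POST", "report.pdf", "PDF document, version 1.6"))
```

The first call prints True and the second prints False, which is the "claimed pdf, but not pdf" case the rule is after.<br>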

>>><br>

>>> Storage<br>

>>><br>

>>> The storage to disk is handled by a new output module called “file”.<br>

>>> Its config looks like this:<br>

>>><br>

>>> enabled: yes # set to yes to enable<br>

>>> log-dir: files # directory to store the files<br>

>>> force-magic: no # force logging magic on all stored files<br>

>>><br>

>>> It needs to be enabled for file storing to work.<br>

>>><br>

>>> The files are stored to disk as “file.1”, “file.2”, etc. For each of the<br>

>>> files a meta file is created containing the flow information, file name,<br>

>>> size, etc. Example:<br>

>>><br>

>>> TIME: 01/27/2010-17:41:11.579196<br>

>>> PCAP PKT NUM: 2847035<br>

>>> SRC IP: 68.142.93.214<br>

>>> DST IP: 10.7.185.57<br>

>>> PROTO: 6<br>

>>> SRC PORT: 80<br>

>>> DST PORT: 56207<br>

>>> FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe<br>

>>> MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly<br>

>>> STATE: CLOSED<br>

>>> SIZE: 5204<br>
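If you want to post-process these meta files, a small parser is enough. The sketch below assumes the layout is exactly what the example shows (one "KEY: value" pair per line, keys in upper case); I have not seen the format documented, so treat that as a guess. It also tolerates a value that has been wrapped onto the following line, as mail clients tend to do. The name parse_meta is made up.<br>

```python
import re

# A key is upper-case letters and spaces, followed by a colon.
KEY_LINE = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")


def parse_meta(text: str) -> dict:
    """Parse "KEY: value" lines into a dict of strings."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = KEY_LINE.match(line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2)
        elif last_key is not None:
            # No key on this line: treat it as the wrapped rest of the
            # previous value and append it as-is.
            fields[last_key] += line
    return fields


sample = """\
TIME: 01/27/2010-17:41:11.579196
SRC IP: 68.142.93.214
DST IP: 10.7.185.57
PROTO: 6
SRC PORT: 80
DST PORT: 56207
FILENAME:
/msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
STATE: CLOSED
SIZE: 5204
"""

meta = parse_meta(sample)
print(meta["FILENAME"])
print(int(meta["SIZE"]))
```

All values come back as strings, so convert SIZE or the ports yourself where you need numbers.<br>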

>>><br>

>>> Configuration<br>

>>><br>

>>> The file extraction is for HTTP only currently, and works on top of our<br>

>>> HTTP parser. As the HTTP parser runs on top of the stream reassembly<br>

>>> engine, configuration parameters of both these parts of Suricata affect<br>

>>> handling of files.<br>

>>><br>

>>> The stream engine option “stream.reassembly.depth” (default 1 MB)<br>

>>> controls the depth into a stream in which we look. Set to 0 for no limit.<br>

>>> The libhtp options request-body-limit and response-body-limit control<br>

>>> how far into an HTTP request or response body we look. Again set to 0 for<br>

>>> no limit. This can be controlled per HTTP server.<br>

>>><br>

>>> Performance<br>

>>><br>

>>> The file handling is fully streaming, so it’s very efficient.<br>

>>> Nonetheless there will be an overhead for the extra parsing,<br>

>>> bookkeeping, writing to disk, etc. Memory requirements appear to be limited<br>

>>> as well. Suricata shouldn’t keep more than a few KB per flow in memory.<br>

>>><br>

>>> Limitations<br>

>>><br>

>>> Lack of limits is a limitation. For file storage no limits have been<br>

>>> implemented yet. So it’s easy to clutter your disk up with files.<br>

>>> Example: a 118 GB enterprise pcap, storing just JPGs, extracted 400,000<br>

>>> files. Better use a separate partition if you’re on a live link.<br>
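Until limits exist in the engine, one workaround is an external housekeeping script run from cron. The sketch below is not part of Suricata. It is a hypothetical helper (prune_oldest is a made-up name) that deletes the oldest files in a directory until the total size fits a byte budget. It treats the stored files and their meta files as ordinary files and does not keep them paired, and it does not check whether a file is still being written.<br>

```python
import os


def prune_oldest(log_dir: str, max_bytes: int) -> list:
    """Delete the oldest regular files in log_dir until the total size
    is at most max_bytes. Returns the list of removed paths."""
    entries = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))

    entries.sort()  # oldest modification time first
    total = sum(size for _, size, _ in entries)

    removed = []
    for _, size, path in entries:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

For example, prune_oldest("files", 10 * 1024**3) would keep the "files" directory from the config example under roughly 10 GB. The directory name and the budget are placeholders to adjust.<br>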

>>><br>

>>> Future work<br>

>>><br>

>>> Apart from stabilizing this code and performance optimizing it, the next<br>

>>> step will be SMTP file extraction. Possibly other protocols, although<br>

>>> nothing is set in stone there yet.<br>

>>><br>

>>><br>

>>> --<br>

>>> ---------------------------------------------<br>

>>> Victor Julien<br>

>>> <a href="http://www.inliniac.net/" target="_blank">http://www.inliniac.net/</a><br>

>>> PGP: <a href="http://www.inliniac.net/victorjulien.asc" target="_blank">http://www.inliniac.net/victorjulien.asc</a><br>

>>> ---------------------------------------------<br>

>>><br>

>>> _______________________________________________<br>

>>> Oisf-users mailing list<br>

>>> <a href="mailto:Oisf-users@openinfosecfoundation.org">Oisf-users@openinfosecfoundation.org</a><br>

>>> <a href="http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users" target="_blank">http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users</a><br>

>><br>

>><br>

>><br>


>><br>

><br>

><br>

><br>

> --<br>

> Peter Manev<br>

><br>


><br>


</div></div></blockquote></div><br><br clear="all"><br>-- <br>Peter Manev<br>
