[Oisf-users] [Oisf-devel] file extraction -- Re: [COMMIT] OISF branch, master, updated. a556338936ad3cd2b0379a6985fb62084368d99e

Fri Dec 9 18:41:03 UTC 2011

How are people typically implementing this?  One rule file for all
types of file extraction?

On Thu, Dec 1, 2011 at 3:38 AM, Peter Manev <petermanev at gmail.com> wrote:
> This is absolutely phenomenal  - great work - dev team!
> Makes it a lot easier to find, learn, teach, and look into pdf/other attachment
> exploits.....
>
>
> On Thu, Dec 1, 2011 at 10:26 AM, Kevin Ross <kevross33 at googlemail.com>
> wrote:
>>
>> Oh happy day.... :-) This will be great for getting binaries off the network
>> for things like exploit kits and so on. I think having the file is
>> essential: to confirm exploits, and to see whether you detect something as
>> malware and, if not, how you could fix that when something has landed on a
>> machine. Grabbing files off the network and analysing them to find out what
>> is bad is good in itself, but being able to, say, extract files dropped after
>> a Java exploit using those sigs, or a file sent from an exploit kit, will be
>> very useful.
>>
>> Great work.
>>
>>
>> On 29 November 2011 16:54, Victor Julien <victor at inliniac.net> wrote:
>>>
>>> From my blog:
>>> http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html
>>>
>>> File extraction in Suricata
>>>
>>> Today I pushed out a new feature in Suricata I’m very excited about. It
>>> has been long in the making and with over 6000 new lines of code it’s a
>>> significant effort. It’s available in the current git master. I’d
>>> consider it alpha quality, so handle with care.
>>>
>>> So what is this all about? Simply put, we can now extract files from
>>> HTTP streams in Suricata. Both uploads and downloads. Fully controlled
>>> by the rule language. But that’s not all. I’ve added a touch of magic. By
>>> utilizing libmagic (this powers the “file” command), we know the file
>>> type of files as well. Lots of interesting stuff that can be done there.
>>>
>>> Rule keywords
>>>
>>> Four new rule keywords were added: filename, fileext, filemagic and
>>> filestore.
>>>
>>> Filename and fileext are pretty trivial: match on the full name or file
>>> extension of a file.
>>>
>>>    alert http any any -> any any (filename:"secret.xls";)
>>>    alert http any any -> any any (fileext:"pdf";)
>>>
>>> More interesting is the filemagic keyword. It runs on the magic output
>>> from inspecting (the start of) a file. Example values:
>>>
>>>    GIF image data, version 89a, 1 x 1
>>>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit
>>>    HTML document text
>>>    Macromedia Flash data (compressed), version 9
>>>    MS Windows icon resource - 2 icons, 16x16, 256-colors
>>>    PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced
>>>    JPEG image data, JFIF standard 1.01
>>>    PDF document, version 1.6
>>>
>>> So how the filemagic keyword allows you to match on this is pretty
>>> simple:
>>>
>>>    alert http any any -> any any (filemagic:"PDF document";)
>>>    alert http any any -> any any (filemagic:"PDF document, version 1.6";)
>>>
>>> Pretty cool, eh? You can match both very specifically and loosely. For
>>> example:
>>>
>>>    alert http any any -> any any (filemagic:"executable for MS Windows";)
>>>
>>> Will match on (among others) these types:
>>>
>>>    PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
>>>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit
>>>    PE32+ executable for MS Windows (GUI) Mono/.Net assembly
>>>
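The loose-versus-specific matching described above can be pictured as a plain substring test against the libmagic description. What follows is only a sketch of that idea, not Suricata's matcher: `filemagic_matches` and `MAGIC_SAMPLES` are hypothetical names, and case-sensitive comparison is an assumption, since the post does not say how case is handled.

```python
def filemagic_matches(pattern: str, magic: str) -> bool:
    """Return True if `pattern` occurs anywhere in the libmagic description.

    Assumption: a case-sensitive substring test stands in for the real match.
    """
    return pattern in magic


# Descriptions copied from the post, plus one non-executable for contrast.
MAGIC_SAMPLES = [
    "PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
]

# A loose pattern selects every Windows executable variant in the list.
matched = [m for m in MAGIC_SAMPLES
           if filemagic_matches("executable for MS Windows", m)]
```

A longer pattern such as "PDF document, version 1.6" narrows the match to a single description in the same way.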
>>> Finally there is the filestore keyword. It is the simplest of all: if
>>> the rule matches, the files will be written to disk.
>>>
>>> Naturally you can combine the file keywords with the regular HTTP
>>> keywords, limiting to POSTs for example:
>>>
>>>    alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload
>>> claimed, but not pdf"; flow:established,to_server; content:"POST";
>>> http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1;
>>> rev:1;)
>>>
>>> This will alert on and store all files that are uploaded using a POST
>>> request that have a filename extension of pdf, but the actual file is
>>> not pdf.
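
To make the logic of that combined rule concrete, here is a rough Python restatement of its three conditions. It is an illustration only: `is_suspicious_pdf_upload` is a hypothetical helper, and lower-casing the extension before comparing is an assumption, not something the post states about fileext.

```python
def is_suspicious_pdf_upload(method: str, filename: str, magic: str) -> bool:
    """Mirror the rule: POST request, extension 'pdf', magic lacking 'PDF document'."""
    is_post = method == "POST"
    # Take the text after the last dot as the extension (empty if there is no dot).
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    claims_pdf = ext == "pdf"
    # The negated filemagic:!"PDF document" condition.
    not_really_pdf = "PDF document" not in magic
    return is_post and claims_pdf and not_really_pdf
```

All three conditions must hold, so a genuine PDF upload, or a mislabelled file fetched with GET, would not trigger the alert or the filestore.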
>>>
>>> Storage
>>>
>>> The storage to disk is handled by a new output module called “file”.
>>> Its config looks like this:
>>>
>>> enabled: yes # set to yes to enable
>>> log-dir: files # directory to store the files
>>> force-magic: no # force logging magic on all stored files
>>>
>>> It needs to be enabled for file storing to work.
>>>
>>> The files are stored to disk as “file.1”, “file.2”, etc. For each of the
>>> files a meta file is created containing the flow information, file name,
>>> size, etc. Example:
>>>
>>> TIME: 01/27/2010-17:41:11.579196
>>> PCAP PKT NUM: 2847035
>>> SRC IP: 68.142.93.214
>>> DST IP: 10.7.185.57
>>> PROTO: 6
>>> SRC PORT: 80
>>> DST PORT: 56207
>>> FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
>>> MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
>>> STATE: CLOSED
>>> SIZE: 5204
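
Since the meta file is a list of "KEY: value" lines, it is easy to post-process. The sketch below parses the example above into a dictionary. It assumes keys consist of upper-case letters and spaces, and it treats a line that does not start with such a key as a continuation of the previous value, which copes with a file name that ends up on its own line. `parse_meta` and `SAMPLE_META` are hypothetical names.

```python
import re

# A key is upper-case letters/spaces followed by a colon, e.g. "PCAP PKT NUM:".
_KEY_LINE = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")


def parse_meta(text: str) -> dict:
    """Parse 'KEY: value' lines into a dict, joining continuation lines."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = _KEY_LINE.match(line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2)
        elif last_key is not None:
            # No key on this line: append it to the previous field's value.
            fields[last_key] = (fields[last_key] + line).strip()
    return fields


SAMPLE_META = """\
TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
SRC IP: 68.142.93.214
DST IP: 10.7.185.57
PROTO: 6
SRC PORT: 80
DST PORT: 56207
FILENAME:

/msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
STATE: CLOSED
SIZE: 5204
"""

meta = parse_meta(SAMPLE_META)
```

All values come back as strings, so a caller would convert fields such as SIZE or the ports to integers as needed.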
>>>
>>> Configuration
>>>
>>> The file extraction is for HTTP only currently, and works on top of our
>>> HTTP parser. As the HTTP parser runs on top of the stream reassembly
>>> engine, configuration parameters of both these parts of Suricata affect
>>> handling of files.
>>>
>>> The stream engine option “stream.reassembly.depth” (default 1 MB)
>>> controls the depth into a stream in which we look. Set to 0 for no limit.
>>> The libhtp options request-body-limit and response-body-limit control
>>> how far into an HTTP request or response body we look. Again set to 0 for
>>> no limit. This can be controlled per HTTP server.
>>>
>>> Performance
>>>
>>> The file handling is fully streaming, so it’s very efficient.
>>> Nonetheless there will be an overhead for the extra parsing,
>>> bookkeeping, writing to disk, etc. Memory requirements appear to be limited
>>> as well. Suricata shouldn’t keep more than a few KB per flow in memory.
>>>
>>> Limitations
>>>
>>> Lack of limits is a limitation. For file storage no limits have been
>>> implemented yet. So it’s easy to clutter your disk up with files.
>>> Example: a 118 GB enterprise pcap, storing just JPGs, yielded 400,000
>>> extracted files. Better to use a separate partition if you’re on a live link.
>>>
>>> Future work
>>>
>>> Apart from stabilizing this code and performance optimizing it, the next
>>> step will be SMTP file extraction. Possibly other protocols, although
>>> nothing is set in stone there yet.
>>>
>>>
>>> --
>>> ---------------------------------------------
>>> Victor Julien
>>> http://www.inliniac.net/
>>> PGP: http://www.inliniac.net/victorjulien.asc
>>> ---------------------------------------------
>>>
>>> _______________________________________________
>>> Oisf-users mailing list
>>> Oisf-users at openinfosecfoundation.org
>>> http://lists.openinfosecfoundation.org/mailman/listinfo/oisf-users
>
>
>
> --
> Peter Manev
>