[Oisf-users] [Oisf-devel] file extraction -- Re: [COMMIT] OISF branch, master, updated. a556338936ad3cd2b0379a6985fb62084368d99e

Sat Dec 24 16:34:52 UTC 2011

log-dir must exist, or you will get a fatal error.

That may not be unique to this output module, but since it's a new
module, others may run into this issue as I have.
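
A simple workaround is to create the directory before starting Suricata. A minimal sketch, assuming the directory is created by a wrapper script; `ensure_log_dir` is a hypothetical helper name, and the path you pass should be wherever your configured log-dir actually resolves to on your system:

```python
import os

def ensure_log_dir(path):
    """Create the file-store directory if it is missing and return its absolute path."""
    # exist_ok=True makes this safe to call on every start-up.
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)

# Example (hypothetical path): ensure_log_dir("/var/log/suricata/files")
```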

On Tue, Nov 29, 2011 at 11:54 AM, Victor Julien <victor at inliniac.net> wrote:
> From my blog:
> http://www.inliniac.net/blog/2011/11/29/file-extraction-in-suricata.html
>
> File extraction in Suricata
>
> Today I pushed out a new feature in Suricata I’m very excited about. It
> has been long in the making and with over 6000 new lines of code it’s a
> significant effort. It’s available in the current git master. I’d
> consider it alpha quality, so handle with care.
>
> So what is this all about? Simply put, we can now extract files from
> HTTP streams in Suricata. Both uploads and downloads. Fully controlled
> by the rule language. But that's not all. I’ve added a touch of magic. By
> utilizing libmagic (this powers the “file” command), we know the file
> type of files as well. Lots of interesting stuff that can be done there.
>
> Rule keywords
>
> Four new rule keywords were added: filename, fileext, filemagic and
> filestore.
>
> Filename and fileext are pretty trivial: match on the full name or file
> extension of a file.
>
>    alert http any any -> any any (filename:"secret.xls";)
>    alert http any any -> any any (fileext:"pdf";)
>
> More interesting is the filemagic keyword. It runs on the magic output
> of inspecting (the start of) a file. For example, this value can be:
>
>    GIF image data, version 89a, 1 x 1
>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit
>    HTML document text
>    Macromedia Flash data (compressed), version 9
>    MS Windows icon resource - 2 icons, 16x16, 256-colors
>    PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced
>    JPEG image data, JFIF standard 1.01
>    PDF document, version 1.6
>
> Matching on this with the filemagic keyword is pretty simple:
>
>    alert http any any -> any any (filemagic:"PDF document";)
>    alert http any any -> any any (filemagic:"PDF document, version 1.6";)
>
> Pretty cool, eh? You can match both very specifically and loosely. For
> example:
>
>    alert http any any -> any any (filemagic:"executable for MS Windows";)
>
> Will match on (among others) these types:
>
>    PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
>    PE32 executable for MS Windows (GUI) Intel 80386 32-bit
>    PE32+ executable for MS Windows (GUI) Mono/.Net assembly
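
To make the loose vs. specific matching concrete, here is a rough Python sketch that treats the keyword as plain substring containment over the magic strings quoted above. That is an assumption for illustration only: the helper names are made up, and the real matcher may differ in details such as case handling.

```python
# Magic strings taken from the examples in the announcement.
MAGIC_SAMPLES = [
    "PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
    "GIF image data, version 89a, 1 x 1",
]

def magic_contains(pattern, magic):
    """True if `pattern` occurs anywhere in the magic string (plain substring test)."""
    return pattern in magic

def matching_types(pattern, samples=MAGIC_SAMPLES):
    """Return every sample magic string that contains `pattern`."""
    return [m for m in samples if magic_contains(pattern, m)]
```

With these samples, `matching_types("executable for MS Windows")` returns the three PE32 entries, while `matching_types("PDF document, version 1.6")` returns only the single PDF entry.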
>
> Finally there is the filestore keyword. It is the simplest of all: if
> the rule matches, the files will be written to disk.
>
> Naturally you can combine the file keywords with the regular HTTP
> keywords, limiting to POSTs for example:
>
>    alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload
> claimed, but not pdf"; flow:established,to_server; content:"POST";
> http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1;
> rev:1;)
>
> This will alert on and store all files that are uploaded using a POST
> request that have a filename extension of pdf, but the actual file is
> not pdf.
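
The decision this rule encodes can be written out as a small Python predicate. This is only a sketch of the logic, not of how the engine evaluates rules: the function name is hypothetical, and both the lower-casing of the extension and the substring test on the magic string are my assumptions.

```python
import os

def claimed_pdf_but_not_pdf(method, filename, magic):
    """Mirror the rule: POST + extension 'pdf' + magic that does NOT say 'PDF document'."""
    is_post = method == "POST"
    # Take the text after the last dot as the extension (assumption: compared lower-cased).
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    has_pdf_ext = ext == "pdf"
    # Negated filemagic: the magic string must not contain "PDF document".
    looks_like_pdf = "PDF document" in magic
    return is_post and has_pdf_ext and not looks_like_pdf
```

So an upload named `report.pdf` whose magic is `PE32 executable for MS Windows (GUI) Intel 80386 32-bit` would be flagged, while a genuine `PDF document, version 1.6` would not.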
>
> Storage
>
> The storage to disk is handled by a new output module called “file”.
> Its config looks like this:
>
> enabled: yes # set to yes to enable
> log-dir: files # directory to store the files
> force-magic: no # force logging magic on all stored files
>
> It needs to be enabled for file storing to work.
>
> The files are stored to disk as “file.1”, “file.2”, etc. For each of the
> files a meta file is created containing the flow information, file name,
> size, etc. Example:
>
> TIME: 01/27/2010-17:41:11.579196
> PCAP PKT NUM: 2847035
> SRC IP: 68.142.93.214
> DST IP: 10.7.185.57
> PROTO: 6
> SRC PORT: 80
> DST PORT: 56207
> FILENAME:
> /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
> MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
> STATE: CLOSED
> SIZE: 5204
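
Since the meta file is plain "KEY: value" text, it is easy to post-process. Below is a minimal Python sketch of a parser, assuming the layout is exactly as in the example above; `parse_meta` is a hypothetical helper, not something shipped with Suricata. A line that does not start with an upper-case key is treated as a continuation of the previous field, which also copes with the FILENAME value that got wrapped onto its own line here.

```python
import re

# A field line starts with an upper-case key (spaces allowed) followed by a colon.
_FIELD = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")

def parse_meta(text):
    """Parse 'KEY: value' lines into a dict of strings."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = _FIELD.match(line)
        if m:
            last_key = m.group(1)
            fields[last_key] = m.group(2)
        elif last_key is not None:
            # Continuation line: append to the previous value without a separator.
            fields[last_key] = (fields[last_key] + line).strip()
    return fields
```

All values come back as strings, so convert `SIZE` or the port numbers with `int()` if you need to compare them.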
>
> Configuration
>
> The file extraction is for HTTP only currently, and works on top of our
> HTTP parser. As the HTTP parser runs on top of the stream reassembly
> engine, configuration parameters of both these parts of Suricata affect
> handling of files.
>
> The stream engine option “stream.reassembly.depth” (default 1 MB)
> controls the depth into a stream in which we look. Set to 0 for no limit.
> The libhtp options request-body-limit and response-body-limit control
> how far into an HTTP request or response body we look. Again set to 0 for
> no limit. This can be controlled per HTTP server.
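
One way to reason about how these settings interact is a deliberately simplified model in which a file is only seen up to whichever limit is hit first, with 0 meaning "no limit". That model is my assumption, not a description of the engine internals, and `effective_limit` is a hypothetical helper:

```python
def effective_limit(*limits):
    """Return the smallest non-zero limit in bytes, or 0 if every limit is 0 (unlimited)."""
    finite = [value for value in limits if value > 0]
    return min(finite) if finite else 0
```

Under that model, a 1 MB stream depth combined with a 4096-byte body limit gives `effective_limit(1048576, 4096) == 4096`, so only setting both to 0 would let arbitrarily large files through.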
>
> Performance
>
> The file handling is fully streaming, so it’s very efficient.
> Nonetheless there will be an overhead for the extra parsing,
> bookkeeping, writing to disk, etc. Memory requirements appear to be limited
> as well. Suricata shouldn’t keep more than a few KB per flow in memory.
>
> Limitations
>
> Lack of limits is a limitation. For file storage no limits have been
> implemented yet. So it’s easy to clutter your disk up with files.
> Example: a 118 GB enterprise pcap, storing just JPGs, extracted 400,000
> files. Better use a separate partition if you’re on a live link.
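
Until limits are implemented, it may be worth watching the store directory from outside. A minimal Python sketch, assuming a periodic job is acceptable; `store_stats` is a hypothetical helper, and what you do with the numbers (alerting, pruning) is left to you:

```python
import os
import shutil

def store_stats(log_dir):
    """Report how many regular files are in log_dir, their total size, and free space."""
    entries = [e for e in os.scandir(log_dir) if e.is_file()]
    total = sum(e.stat().st_size for e in entries)
    # disk_usage reports on the filesystem that holds log_dir.
    free = shutil.disk_usage(log_dir).free
    return {"files": len(entries), "bytes": total, "free_bytes": free}
```

Note that the file count includes the meta files, since they live alongside the stored files.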
>
> Future work
>
> Apart from stabilizing this code and performance optimizing it, the next
> step will be SMTP file extraction. Possibly other protocols, although
> nothing is set in stone there yet.
>
>
> --
> ---------------------------------------------
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc
> ---------------------------------------------
>