[Oisf-devel] libhtp 0.5.x integration - bug 775

Ivan Ristic ivan.ristic at gmail.com
Mon Jun 10 12:22:09 UTC 2013


On Fri, Jun 7, 2013 at 3:51 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
> Ivan,
>
> When libhtp is given a request such as "HELLO\r\n", it sets
> "request_protocol_number" to HTP_PROTOCOL_0_9.  Is that right, or
> should it be HTP_PROTOCOL_UNKNOWN?

I think HTP_PROTOCOL_0_9 is right, according to real-life web server
behaviour. For example, if you submit such a request to Apache, it will
process it as an HTTP/0.9 request, even though the request method is
invalid (Apache always ignores that) and the path is not set.
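
A minimal sketch of that classification rule, for illustration only: a
request line with no protocol token is treated as HTTP/0.9, whatever the
method is. The enum values and classify_request_line() are hypothetical
stand-ins, not libhtp's constants or its actual parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not libhtp's real constants. */
enum proto { PROTO_UNKNOWN = -1, PROTO_0_9 = 9, PROTO_1_0 = 100, PROTO_1_1 = 101 };

/* Classify a request line by its third whitespace-separated token.
 * A trailing CRLF is harmless because %s stops at any whitespace. */
enum proto classify_request_line(const char *line)
{
    char method[64] = "", uri[256] = "", proto[32] = "";
    int n = sscanf(line, "%63s %255s %31s", method, uri, proto);

    if (n < 1) return PROTO_UNKNOWN;  /* empty line: nothing to classify */
    if (n < 3) return PROTO_0_9;      /* no protocol token: HTTP/0.9 */
    if (strcmp(proto, "HTTP/1.0") == 0) return PROTO_1_0;
    if (strcmp(proto, "HTTP/1.1") == 0) return PROTO_1_1;
    return PROTO_UNKNOWN;             /* protocol token present but unrecognized */
}

/* Small self-check covering the case from this thread. */
void classify_demo(void)
{
    assert(classify_request_line("HELLO\r\n") == PROTO_0_9);
    assert(classify_request_line("GET / HTTP/1.1\r\n") == PROTO_1_1);
}
```

Under this rule, HTP_PROTOCOL_UNKNOWN would be reserved for lines that do
carry a protocol token that cannot be recognized.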


> On Tue, Jun 4, 2013 at 12:02 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>> On Mon, Jun 3, 2013 at 10:48 PM, Victor Julien <victor at inliniac.net> wrote:
>>> On 06/03/2013 06:07 PM, Anoop Saldanha wrote:
>>>> @Victor
>>>>
>>>> Since we need to store the normalized request uri in our htp_state, we
>>>> can probably figure out a solution that we can also reuse in dcerpc
>>>> for storing transactions.
>>>>
>>>> Probably a linked_list that stores the tx_id (tx id for the related
>>>> data) of its head?
>>>
>>> Would it be an option to use the per tx HtpUserData that we use in the
>>> 0.2.x implementation for body tracking?
>>
>> Yes, we can use it to store all generated buffers.
>>
>>>
>>>> On Wed, May 15, 2013 at 12:38 PM, Anoop Saldanha
>>>> <anoopsaldanha at gmail.com> wrote:
>>>>> Right.  Thanks.
>>>>>
>>>>> On Wed, May 15, 2013 at 12:18 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>> On Wed, May 15, 2013 at 7:37 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>> Ivan,
>>>>>>>
>>>>>>> I see the introduction of
>>>>>>>
>>>>>>> htp_tx_t *htp_connp_get_in_tx(const htp_connp_t *connp);
>>>>>>> htp_tx_t *htp_connp_get_out_tx(const htp_connp_t *connp);
>>>>>>>
>>>>>>> Which means I won't be able to retrieve individual txs?
>>>>>>
>>>>>> Those 2 functions will give you only the currently active request and
>>>>>> response, respectively. There can be one of each at any given time.
>>>>>>
>>>>>> With recent changes, callbacks are sent the correct tx, so the above
>>>>>> functions will rarely be needed when you're processing one transaction
>>>>>> at a time.
>>>>>>
>>>>>>
>>>>>>> I receive 5
>>>>>>> pipelined requests, so that would be 5 txs created.  How do I retrieve
>>>>>>> the individual txs?
>>>>>>
>>>>>> The transactions are in htp_conn_t::transactions, which is a list. How
>>>>>> to access the htp_conn_t pointer depends on your setup. You probably
>>>>>> keep a pointer to connp somewhere in your context, and from there you
>>>>>> can get a connection using htp_connp_get_connection().
>>>>>>
>>>>>>
>>>>>>> On Thu, Apr 11, 2013 at 10:16 PM, Anoop Saldanha
>>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>>> On Thu, Apr 11, 2013 at 9:10 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>> I wouldn't advise you to do any buffering anyhow.
>>>>>>>>> But I am curious if you're
>>>>>>>>> deleting transactions once you're done with them. Because, if you're not,
>>>>>>>>> you may be allocating a lot of memory (all tx instances) on long-lived HTTP
>>>>>>>>> connections.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We do delete them, once we're done.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Apr 9, 2013 at 1:06 PM, Anoop Saldanha <anoopsaldanha at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 9, 2013 at 2:36 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>> (bad juju to brian and ivan for top posting and/or html emails! :)
>>>>>>>>>>>
>>>>>>>>>>> On 04/09/2013 10:21 AM, Ivan Ristic wrote:
>>>>>>>>>>>> On Mon, Apr 8, 2013 at 3:54 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     On Mon, Apr 8, 2013 at 7:50 PM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>     >
>>>>>>>>>>>>     > On Mon, Apr 8, 2013 at 9:16 AM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>     >>
>>>>>>>>>>>>     >>
>>>>>>>>>>>>     >> On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>     >>>
>>>>>>>>>>>>     >>> On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>>     >>> > (moving to oisf-devel)
>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>     >>> > On 04/08/2013 06:17 AM, Anoop Saldanha wrote:
>>>>>>>>>>>>     >>> >>>> I recollect we introduced path and query double decoding
>>>>>>>>>>>>     >>> >>>> through configurable params, and also we had this thing with
>>>>>>>>>>>>     >>> >>>> query decoding (single level).  Can you explain a bit what the
>>>>>>>>>>>>     >>> >>>> status was previously.  Seeing related failed uts.
>>>>>>>>>>>>     >>> >>>>
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>> We run the path normalization on the query through our
>>>>>>>>>>>>     >>> >>> HTPCallbackRequestUriNormalizeQuery callback. Previously we
>>>>>>>>>>>>     >>> >>> used htp_decode_path_inplace to normalize the query (e.g. for
>>>>>>>>>>>>     >>> >>> uridecoding).
>>>>>>>>>>>>     >>> >>> However, this was causing issues (remember that pcre "bug"
>>>>>>>>>>>>     >>> >>> we discussed a while back, where http:// turned into http:/).
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>> In libhtp I copied htp_decode_path_inplace to
>>>>>>>>>>>>     >>> >>> htp_decode_query_inplace
>>>>>>>>>>>>     >>> >>> and also copied the config params and cfg funcs:
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>> Hack of the 1st order, which is wrong in many ways. But it
>>>>>>>>>>>>     >>> >>> basically allowed me to make sure we don't normalize the query
>>>>>>>>>>>>     >>> >>> as if it's path, esp with turning ftp:// into ftp:/ and such.
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>> For 0.5 integration I think we need a proper solution. The only
>>>>>>>>>>>>     >>> >>> reason I pushed my hack like this was that I knew in 0.5 we
>>>>>>>>>>>>     >>> >>> would make things right.
>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>     >>> >> I think if we still want to double decode, we still require
>>>>>>>>>>>>     >>> >> all of these above things from our bundled htp.
>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>     >>> >> -----
>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>     >>> >> In 0.5.x, tx->request_uri_normalized has been removed, and
>>>>>>>>>>>>     >>> >> we'd now have to use the REQUEST_URI hook.  We'll have to carry
>>>>>>>>>>>>     >>> >> out the reconstruction ourselves, and store it ourselves in our
>>>>>>>>>>>>     >>> >> HTPState.
>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>     >>>
>>>>>>>>>>>>     >>> What are your thoughts on this?
>>>>>>>>>>>>     >>>
>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>     >>> > IIRC there is some function in libhtp that does just the
>>>>>>>>>>>>     >>> > decoding of uriencoding and unicode. We should probably just
>>>>>>>>>>>>     >>> > use that on the query and do the full normalization on the path.
>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>     >>> > As a side thought: I think it would be nice to store path and
>>>>>>>>>>>>     >>> > query separately so that we can add http_path and http_query
>>>>>>>>>>>>     >>> > keywords later on.
>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>     >>>
>>>>>>>>>>>>     >>> We'd pretty much extract it directly from parsed_uri.  Will
>>>>>>>>>>>>     >>> have to check if we need the extra double decode phase we have
>>>>>>>>>>>>     >>> currently in our bundled htp, in which case we'd need to store
>>>>>>>>>>>>     >>> them separately.
>>>>>>>>>>>>     >>>
>>>>>>>>>>>>     >>
>>>>>>>>>>>>     >> Yes, all the normalized components are in tx->parsed_uri.  This
>>>>>>>>>>>>     >> is what is used in ironbee to expose all the various parts like
>>>>>>>>>>>>     >> tx->parsed_uri->path and tx->parsed_uri->query.
>>>>>>>>>>>>     >>
>>>>>>>>>>>>     >> Also note that the hostname should now be obtained from
>>>>>>>>>>>>     >> tx->request_hostname in 0.5.
>>>>>>>>>>>>     >>
>>>>>>>>>>>>     >> -B
>>>>>>>>>>>>     >
>>>>>>>>>>>>     >
>>>>>>>>>>>>     > FYI, for an example using libhtp 0.5 see ironbee code.  This was
>>>>>>>>>>>>     > all recently updated for 0.5.
>>>>>>>>>>>>     >
>>>>>>>>>>>>     > https://github.com/ironbee/ironbee/blob/0.7.x/modules/modhtp.c
>>>>>>>>>>>>     >
>>>>>>>>>>>>
>>>>>>>>>>>>     Will have a look.  Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>     Previously we would use tx->connp->conn->transactions to access txs
>>>>>>>>>>>>     in the state.  Now that htp_connp_t is an opaque pointer how do I
>>>>>>>>>>>>     access the txs? Tried locating helper functions to retrieve it, but
>>>>>>>>>>>>     I didn't find any.
>>>>>>>>>>>>
>>>>>>>>>>>> It's an oversight that there isn't a helper function to retrieve
>>>>>>>>>>>> transactions on a connection. I will add one tomorrow.
>>>>>>>>>>>>
>>>>>>>>>>>> Having said that, what is your use case that you require to retrieve
>>>>>>>>>>>> transactions? I thought your code was driven by the callbacks, which
>>>>>>>>>>>> all come with a tx instance (via connp)? For my education, can you
>>>>>>>>>>>> explain how you process connection data?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One of the things that we don't do out of the callbacks is logging the
>>>>>>>>>>> requests. This is one of the things we need access to the TX store for.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And to add to it, since we already have the txs stored in a list
>>>>>>>>>> inside libhtp, re-buffering the txs would come as a redundant task,
>>>>>>>>>> from where I see it.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Anoop Saldanha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ivan Ristić
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Anoop Saldanha
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Anoop Saldanha
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ivan Ristić
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Anoop Saldanha
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ---------------------------------------------
>>> Victor Julien
>>> http://www.inliniac.net/
>>> PGP: http://www.inliniac.net/victorjulien.asc
>>> ---------------------------------------------
>>>
>>
>>
>>
>> --
>> -------------------------------
>> Anoop Saldanha
>> http://www.poona.me
>> -------------------------------
>
>
>
> --
> -------------------------------
> Anoop Saldanha
> http://www.poona.me
> -------------------------------



-- 
Ivan Ristić


