[Oisf-devel] libhtp 0.5.x integration - bug 775

Anoop Saldanha anoopsaldanha at gmail.com
Mon Jun 10 14:06:45 UTC 2013


@Ivan

I am using htp_unparse_uri_noencode() to retrieve the reconstructed
normalized uri.
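
For illustration only, here is a minimal sketch of reconstructing a normalized URI from its parsed parts. It assumes the normalized path and query are already available as NUL-terminated C strings. `TxUserData`, `tx_store_normalized_uri()` and `tx_user_data_free()` are hypothetical names that loosely follow the per-tx HtpUserData idea discussed later in the thread. This is not how htp_unparse_uri_noencode() is implemented: that function works on libhtp's own parsed-URI structure and can emit more URI components than just path and query.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transaction user data that owns generated buffers. */
typedef struct TxUserData_ {
    char *request_uri_normalized; /* "path" or "path?query", heap-allocated */
} TxUserData;

/* Build "path" or "path?query" from already-normalized components and
 * store it in ud, replacing any previous value.
 * Returns 0 on success, -1 on allocation failure. */
static int tx_store_normalized_uri(TxUserData *ud, const char *path,
                                   const char *query)
{
    assert(ud != NULL && path != NULL);
    size_t plen = strlen(path);
    size_t qlen = (query != NULL) ? strlen(query) : 0;
    size_t total = plen + (query != NULL ? 1 + qlen : 0) + 1;

    char *uri = malloc(total);
    if (uri == NULL)
        return -1;
    memcpy(uri, path, plen);
    if (query != NULL) {
        uri[plen] = '?';
        memcpy(uri + plen + 1, query, qlen);
    }
    uri[total - 1] = '\0';

    free(ud->request_uri_normalized); /* free(NULL) is a no-op */
    ud->request_uri_normalized = uri;
    return 0;
}

/* Release everything the user data owns; call when the tx is deleted. */
static void tx_user_data_free(TxUserData *ud)
{
    free(ud->request_uri_normalized);
    ud->request_uri_normalized = NULL;
}
```

Keeping the buffer inside per-transaction data means it is freed together with the transaction rather than living in a separate per-state structure.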

1. Should we enable url query normalization using
htp_config_register_urlencode_parser()?
2. If (1) is true, shouldn't the value stored by (1) in
tx->request_params be used by htp_unparse_uri_noencode() while
reconstructing the normalized uri?
3. Suricata provides conf settings to enable double decoding for path
and query.  We do this by hooking into the request line callback and
calling htp_decode_path_inplace() and htp_decode_query_inplace().  Do
we have any internal support from libhtp for decoding double encoded
characters?
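
To make question 3 concrete, here is a minimal sketch of what double decoding means. It assumes plain %XX percent-encoding only: no %uXXXX unicode escapes and no path normalization. `url_decode_once()` and `url_decode_twice()` are hypothetical helpers. They are not libhtp functions and not the implementation of htp_decode_path_inplace() or htp_decode_query_inplace(). The second pass simply runs the same decoder over the output of the first, so "%252e" becomes "%2e" and then ".".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value; caller guarantees isxdigit(c). */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return (tolower(c) - 'a') + 10;
}

/* Hypothetical helper: decode %XX escapes in place, one pass only.
 * buf must have room for len + 1 bytes; it is NUL-terminated afterwards.
 * Invalid escapes ("%zz", a trailing "%") are copied verbatim.
 * Returns the decoded length. */
static size_t url_decode_once(char *buf, size_t len)
{
    size_t r = 0, w = 0;
    assert(buf != NULL);
    while (r < len) {
        if (buf[r] == '%' && r + 2 < len &&
            isxdigit((unsigned char)buf[r + 1]) &&
            isxdigit((unsigned char)buf[r + 2])) {
            buf[w++] = (char)(hex_value((unsigned char)buf[r + 1]) * 16 +
                              hex_value((unsigned char)buf[r + 2]));
            r += 3;
        } else {
            buf[w++] = buf[r++];
        }
    }
    buf[w] = '\0';
    return w;
}

/* "Double decoding": run the single-pass decoder again on its own output. */
static size_t url_decode_twice(char *buf, size_t len)
{
    return url_decode_once(buf, url_decode_once(buf, len));
}
```

Invalid escapes are passed through unchanged here. A production decoder has to choose an explicit policy for them.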

@Victor

We enabled some features to decode the path in the query here:
https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3

If a query parameter is a URI, should we decode that URI?  It's an
argument, and we can't know how the user treats the query argument
internally.  So some of the settings wrt decoding a path in a query
may not make sense?
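
The pcre "bug" mentioned further down the thread, where http:// turned into http:/ once the query went through path normalization, is easy to reproduce with a toy normalizer. `collapse_slashes()` is a hypothetical stand-in for a single path-normalization step that compresses runs of '/' into one. It is not libhtp code. On a path it does what you want. On a query argument that is itself a URI, it corrupts the argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one path-normalization step:
 * collapse runs of '/' into a single '/', in place.
 * Fine for a path ("/a//b" -> "/a/b"), harmful for a query whose
 * argument is itself a URI ("u=http://x" -> "u=http:/x").
 * Returns the new length. */
static size_t collapse_slashes(char *buf)
{
    size_t r, w = 0;
    assert(buf != NULL);
    for (r = 0; buf[r] != '\0'; r++) {
        if (buf[r] == '/' && w > 0 && buf[w - 1] == '/')
            continue; /* drop the repeated separator */
        buf[w++] = buf[r];
    }
    buf[w] = '\0';
    return w;
}
```

This matches the direction suggested later in the thread: only decode the query (uri-encoding and unicode), and reserve full normalization for the path.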

On Mon, Jun 10, 2013 at 5:52 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
> Yes, thanks for noticing. I've removed it now.
>
>
> On Mon, Jun 10, 2013 at 12:48 PM, Anoop Saldanha
> <anoopsaldanha at gmail.com> wrote:
>> Ivan,
>>
>> Now that libhtp 0.5.x doesn't generate the normalized request uri
>> anymore, htp_config_set_generate_request_uri_normalized() should
>> probably be removed?
>>
>> On Fri, Jun 7, 2013 at 8:21 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>> Ivan,
>>>
>>> For a request such as "HELLO\r\n", libhtp sets
>>> "request_protocol_number" to HTP_PROTOCOL_0_9.  Is that right, or
>>> should it be HTP_PROTOCOL_UNKNOWN?
>>>
>>> On Tue, Jun 4, 2013 at 12:02 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>> On Mon, Jun 3, 2013 at 10:48 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>> On 06/03/2013 06:07 PM, Anoop Saldanha wrote:
>>>>>> @Victor
>>>>>>
>>>>>> Since we need to store the normalized request uri in our htp_state, we
>>>>>> can probably figure out a solution that we can also reuse in dcerpc
>>>>>> for storing transactions.
>>>>>>
>>>>>> Probably a linked_list that stores the tx_id (the tx id for the related
>>>>>> data) of its head?
>>>>>
>>>>> Would it be an option to use the per tx HtpUserData that we use in the
>>>>> 0.2.x implementation for body tracking?
>>>>
>>>> Yes, we can use it to store all generated buffers.
>>>>
>>>>>
>>>>>> On Wed, May 15, 2013 at 12:38 PM, Anoop Saldanha
>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>> Right.  Thanks.
>>>>>>>
>>>>>>> On Wed, May 15, 2013 at 12:18 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>> On Wed, May 15, 2013 at 7:37 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>> Ivan,
>>>>>>>>>
>>>>>>>>> I see the introduction of
>>>>>>>>>
>>>>>>>>> htp_tx_t *htp_connp_get_in_tx(const htp_connp_t *connp);
>>>>>>>>> htp_tx_t *htp_connp_get_out_tx(const htp_connp_t *connp);
>>>>>>>>>
>>>>>>>>> Does that mean I won't be able to retrieve individual txs?
>>>>>>>>
>>>>>>>> Those 2 functions will give you only the currently active request and
>>>>>>>> response, respectively. There can be one of each at any given time.
>>>>>>>>
>>>>>>>> With recent changes, callbacks are sent the correct tx, so the above
>>>>>>>> functions will rarely be needed when you're processing one transaction
>>>>>>>> at a time.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Say I receive 5 pipelined requests, which creates 5 txs.  How do I
>>>>>>>>> retrieve the individual txs?
>>>>>>>>
>>>>>>>> The transactions are in htp_conn_t::transactions, which is a list. How
>>>>>>>> to access the htp_conn_t pointer depends on your setup. You probably
>>>>>>>> keep a pointer to connp somewhere in your context, and from there you
>>>>>>>> can get a connection using htp_connp_get_connection().
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Thu, Apr 11, 2013 at 10:16 PM, Anoop Saldanha
>>>>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>> On Thu, Apr 11, 2013 at 9:10 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>>>> I wouldn't advise you to do any buffering anyhow. But I am curious
>>>>>>>>>>> whether you're deleting transactions once you're done with them. If
>>>>>>>>>>> you're not, you may be allocating a lot of memory (all tx instances)
>>>>>>>>>>> on long-lived HTTP connections.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We do delete them, once we're done.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 9, 2013 at 1:06 PM, Anoop Saldanha <anoopsaldanha at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 9, 2013 at 2:36 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>>> (bad juju to brian and ivan for top posting and/or html emails! :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 04/09/2013 10:21 AM, Ivan Ristic wrote:
>>>>>>>>>>>>>> On Mon, Apr 8, 2013 at 3:54 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     On Mon, Apr 8, 2013 at 7:50 PM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>     > On Mon, Apr 8, 2013 at 9:16 AM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>     >> On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>     >>> On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>>>>     >>> > (moving to oisf-devel)
>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>     >>> > On 04/08/2013 06:17 AM, Anoop Saldanha wrote:
>>>>>>>>>>>>>>     >>> >>>> I recollect we introduced path and query double decoding
>>>>>>>>>>>>>>     >>> >>>> through configurable params, and also we had this thing with
>>>>>>>>>>>>>>     >>> >>>> query decoding (single level).  Can you explain a bit what the
>>>>>>>>>>>>>>     >>> >>>> status was previously?  I'm seeing related failed unit tests.
>>>>>>>>>>>>>>     >>> >>>>
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>> We run the path normalization on the query through our
>>>>>>>>>>>>>>     >>> >>> HTPCallbackRequestUriNormalizeQuery callback. Previously we used
>>>>>>>>>>>>>>     >>> >>> htp_decode_path_inplace to normalize the query (e.g. for
>>>>>>>>>>>>>>     >>> >>> uridecoding). However, this was causing issues (remember that
>>>>>>>>>>>>>>     >>> >>> pcre "bug" we discussed a while back, where http:// turned
>>>>>>>>>>>>>>     >>> >>> into http:/).
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>> In libhtp I copied htp_decode_path_inplace to
>>>>>>>>>>>>>>     >>> >>> htp_decode_query_inplace and also copied the config params
>>>>>>>>>>>>>>     >>> >>> and cfg funcs:
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>> Hack of the 1st order, which is wrong in many ways. But it
>>>>>>>>>>>>>>     >>> >>> basically allowed me to make sure we don't normalize the query
>>>>>>>>>>>>>>     >>> >>> as if it's a path, esp. with turning ftp:// into ftp:/ and such.
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>> For 0.5 integration I think we need a proper solution. The only
>>>>>>>>>>>>>>     >>> >>> reason I pushed my hack like this was that I knew in 0.5 we
>>>>>>>>>>>>>>     >>> >>> would make things right.
>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>     >>> >> I think if we still want to double decode, we still require all
>>>>>>>>>>>>>>     >>> >> of the above from our bundled htp.
>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>     >>> >> -----
>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>     >>> >> In 0.5.x, tx->request_uri_normalized has been removed, and we'd
>>>>>>>>>>>>>>     >>> >> now have to use the REQUEST_URI hook.  We'll have to carry out
>>>>>>>>>>>>>>     >>> >> the reconstruction ourselves and store the result in our
>>>>>>>>>>>>>>     >>> >> HTPState.
>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>     >>> What are your thoughts on this?
>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>     >>> > IIRC there is some function in libhtp that does just the
>>>>>>>>>>>>>>     >>> > decoding of uriencoding and unicode. We should probably just use
>>>>>>>>>>>>>>     >>> > that on the query and do the full normalization on the path.
>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>     >>> > As a side thought: I think it would be nice to store path and
>>>>>>>>>>>>>>     >>> > query separately so that we can add http_path and http_query
>>>>>>>>>>>>>>     >>> > keywords later on.
>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>     >>> We'd pretty much extract it directly from parsed_uri.  Will have
>>>>>>>>>>>>>>     >>> to check if we need the extra double decode phase we currently
>>>>>>>>>>>>>>     >>> have in our bundled htp, in which case we'd need to store them
>>>>>>>>>>>>>>     >>> separately.
>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>     >> Yes, all the normalized components are in tx->parsed_uri.  This
>>>>>>>>>>>>>>     >> is what is used in ironbee to expose all the various parts like
>>>>>>>>>>>>>>     >> tx->parsed_uri->path and tx->parsed_uri->query.
>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>     >> Also note that the hostname should now be obtained from
>>>>>>>>>>>>>>     >> tx->request_hostname in 0.5.
>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>     >> -B
>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>     > FYI, for an example using libhtp 0.5 see ironbee code.  This was
>>>>>>>>>>>>>>     > all recently updated for 0.5.
>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>     > https://github.com/ironbee/ironbee/blob/0.7.x/modules/modhtp.c
>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Will have a look.  Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     Previously we would use tx->connp->conn->transactions to access txs
>>>>>>>>>>>>>>     in the state.  Now that htp_connp_t is an opaque pointer, how do I
>>>>>>>>>>>>>>     access the txs? I tried locating helper functions to retrieve them,
>>>>>>>>>>>>>>     but didn't find any.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's an oversight that there isn't a helper function to retrieve
>>>>>>>>>>>>>> transactions on a connection. I will add one tomorrow.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Having said that, what is the use case that requires you to retrieve
>>>>>>>>>>>>>> transactions? I thought your code was driven by the callbacks, which
>>>>>>>>>>>>>> all come with a tx instance (via connp)? For my education, can you
>>>>>>>>>>>>>> explain how you process connection data?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> One of the things that we don't do out of the callbacks is logging the
>>>>>>>>>>>>> requests. This is one of the things we need access to the TX store for.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And to add to it, since we already have the txs stored in a list
>>>>>>>>>>>> inside libhtp, re-buffering the txs would be redundant, as far as
>>>>>>>>>>>> I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Anoop Saldanha
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Suricata IDS Devel mailing list: oisf-devel at openinfosecfoundation.org
>>>>>>>>>>>> Site: http://suricata-ids.org | Participate:
>>>>>>>>>>>> http://suricata-ids.org/participate/
>>>>>>>>>>>> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel
>>>>>>>>>>>> Redmine: https://redmine.openinfosecfoundation.org/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ivan Ristić
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Anoop Saldanha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Anoop Saldanha
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ivan Ristić
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Anoop Saldanha
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---------------------------------------------
>>>>> Victor Julien
>>>>> http://www.inliniac.net/
>>>>> PGP: http://www.inliniac.net/victorjulien.asc
>>>>> ---------------------------------------------
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -------------------------------
>>>> Anoop Saldanha
>>>> http://www.poona.me
>>>> -------------------------------
>>>
>>>
>>>
>>> --
>>> -------------------------------
>>> Anoop Saldanha
>>> http://www.poona.me
>>> -------------------------------
>>
>>
>>
>> --
>> -------------------------------
>> Anoop Saldanha
>> http://www.poona.me
>> -------------------------------
>
>
>
> --
> Ivan Ristić



-- 
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------
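
The quoted exchange about pipelined requests describes five requests producing five transactions, kept in a per-connection list, logged outside the callbacks and deleted once done. That flow can be modeled with a toy list. Everything below (`Tx`, `TxList`, `tx_list_add()`, `tx_list_log_and_prune()`, `print_tx()`) is hypothetical and is not the libhtp 0.5.x API, where the real list is htp_conn_t::transactions, reached via htp_connp_get_connection(). In this sketch, completed transactions are logged oldest-first and then removed, stopping at the first one still in progress, so memory stays bounded on long-lived connections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_TXS 64 /* arbitrary capacity for the sketch */

/* Hypothetical, simplified transaction record. */
typedef struct Tx_ {
    uint64_t id;
    int complete; /* request and response fully parsed */
} Tx;

/* Hypothetical per-connection transaction list, oldest first. */
typedef struct TxList_ {
    Tx txs[MAX_TXS];
    size_t count;
} TxList;

/* Append a new, still-incomplete transaction. Returns -1 if full. */
static int tx_list_add(TxList *l, uint64_t id)
{
    assert(l != NULL);
    if (l->count == MAX_TXS)
        return -1;
    l->txs[l->count].id = id;
    l->txs[l->count].complete = 0;
    l->count++;
    return 0;
}

/* Log completed transactions oldest-first, stopping at the first one
 * still in progress, then delete the logged ones.
 * Returns the number of transactions logged and removed. */
static size_t tx_list_log_and_prune(TxList *l, void (*log_tx)(const Tx *))
{
    size_t done = 0;
    assert(l != NULL && log_tx != NULL);
    while (done < l->count && l->txs[done].complete)
        log_tx(&l->txs[done++]);
    /* shift the remaining (in-progress) transactions to the front */
    for (size_t i = done; i < l->count; i++)
        l->txs[i - done] = l->txs[i];
    l->count -= done;
    return done;
}

/* Example logger: prints one line per transaction. */
static void print_tx(const Tx *tx)
{
    printf("logged tx %llu\n", (unsigned long long)tx->id);
}
```

Stopping at the first incomplete transaction keeps log order equal to request order. That ordering policy is an assumption of this sketch.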


