[Oisf-devel] libhtp 0.5.x integration - bug 775
Anoop Saldanha
anoopsaldanha at gmail.com
Mon Jun 3 18:32:18 UTC 2013
On Mon, Jun 3, 2013 at 10:48 PM, Victor Julien <victor at inliniac.net> wrote:
> On 06/03/2013 06:07 PM, Anoop Saldanha wrote:
>> @Victor
>>
>> Since we need to store the normalized request uri in our htp_state, we
>> can probably figure out a solution that we can also reuse in dcerpc
>> for storing transactions.
>>
>> Probably a linked_list that stores the tx_id(tx id for the related
>> data) of it's head?
>
> Would it be an option to use the per tx HtpUserData that we use in the
> 0.2.x implementation for body tracking?
Yes, we can use it store all generated buffers.
>
>> On Wed, May 15, 2013 at 12:38 PM, Anoop Saldanha
>> <anoopsaldanha at gmail.com> wrote:
>>> Right. Thanks.
>>>
>>> On Wed, May 15, 2013 at 12:18 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>> On Wed, May 15, 2013 at 7:37 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>> Ivan,
>>>>>
>>>>> I see the introduction of
>>>>>
>>>>> htp_tx_t *htp_connp_get_in_tx(const htp_connp_t *connp);
>>>>> htp_tx_t *htp_connp_get_out_tx(const htp_connp_t *connp);
>>>>>
>>>>> Which means I won't be able to retrieve individual txs?
>>>>
>>>> Those 2 functions will give you only the currently active request and
>>>> response, respectively. There can be one of each at any given time.
>>>>
>>>> With recent changes, callbacks are sent the correct tx, so the above
>>>> functions will rarely be needed when you're processing one transaction
>>>> at a time.
>>>>
>>>>
>>>>> I receive 5
>>>>> pipelined requests, so that would be 5 txs created. How do I retrieve
>>>>> the individual txs?
>>>>
>>>> The transactions are in htp_conn_t::transactions, which is a list. How
>>>> to access the htp_conn_t pointer depends on your setup. You probably
>>>> keep a pointer to connp somewhere in your context, and from there you
>>>> can get a connection using htp_connp_get_connection().
>>>>
>>>>
>>>>> On Thu, Apr 11, 2013 at 10:16 PM, Anoop Saldanha
>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>> On Thu, Apr 11, 2013 at 9:10 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>> I wouldn't advise you to do any buffering anyhow.
>>>>>>> But I am curious if you're
>>>>>>> deleting transactions once you're done with them. Because, if you're not,
>>>>>>> you may be allocating a lot of memory (all tx instances) on long-lived HTTP
>>>>>>> connections.
>>>>>>>
>>>>>>
>>>>>> We do delete them, once we're done.
>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 9, 2013 at 1:06 PM, Anoop Saldanha <anoopsaldanha at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Tue, Apr 9, 2013 at 2:36 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>> (bad juju to brian and ivan for top posting and/or html emails! :)
>>>>>>>>>
>>>>>>>>> On 04/09/2013 10:21 AM, Ivan Ristic wrote:
>>>>>>>>>> On Mon, Apr 8, 2013 at 3:54 PM, Anoop Saldanha <anoopsaldanha at gmail.com
>>>>>>>>>> <mailto:anoopsaldanha at gmail.com>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 8, 2013 at 7:50 PM, Brian Rectanus <brectanu at gmail.com
>>>>>>>>>> <mailto:brectanu at gmail.com>> wrote:
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Apr 8, 2013 at 9:16 AM, Brian Rectanus
>>>>>>>>>> <brectanu at gmail.com
>>>>>>>>>> <mailto:brectanu at gmail.com>> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha
>>>>>>>>>> <anoopsaldanha at gmail.com <mailto:anoopsaldanha at gmail.com>>
>>>>>>>>>> >> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien
>>>>>>>>>> <victor at inliniac.net <mailto:victor at inliniac.net>>
>>>>>>>>>> >>> wrote:
>>>>>>>>>> >>> > (moving to oisf-devel)
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > On 04/08/2013 06:17 AM, Anoop Saldanha wrote:
>>>>>>>>>> >>> >>>> I recollect we introduced path and query double decoding
>>>>>>>>>> through
>>>>>>>>>> >>> >>>> configurable params, and also we had this thing with query
>>>>>>>>>> >>> >>>> decoding(single level). Can you explain a bit what the
>>>>>>>>>> status was
>>>>>>>>>> >>> >>>> previously. Seeing related failed uts.
>>>>>>>>>> >>> >>>>
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>> We run the path normalization on the query through our
>>>>>>>>>> >>> >>> HTPCallbackRequestUriNormalizeQuery callback. Previously we
>>>>>>>>>> used
>>>>>>>>>> >>> >>> htp_decode_path_inplace to normalize the query (e.g. for
>>>>>>>>>> >>> >>> uridecoding).
>>>>>>>>>> >>> >>> However, this was causing issues (remember that pcre "bug"
>>>>>>>>>> we
>>>>>>>>>> >>> >>> discussed
>>>>>>>>>> >>> >>> a while back, where http:// turned into http:/).
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>> In libhtp I copied htp_decode_path_inplace to
>>>>>>>>>> >>> >>> htp_decode_query_inplace
>>>>>>>>>> >>> >>> and also copied the config params and cfg funcs:
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>>
>>>>>>>>>>
>>>>>>>>>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>> Hack of the 1st order, which is wrong in many ways. But it
>>>>>>>>>> basically
>>>>>>>>>> >>> >>> allowed me to make sure we don't normalize the query as if
>>>>>>>>>> it's path,
>>>>>>>>>> >>> >>> esp with turning ftp:// into ftp:/ and such.
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>> For 0.5 integration I think we need a proper solution. The
>>>>>>>>>> only
>>>>>>>>>> >>> >>> reason I
>>>>>>>>>> >>> >>> pushed my hack like this was that I knew in 0.5 we would
>>>>>>>>>> make things
>>>>>>>>>> >>> >>> right.
>>>>>>>>>> >>> >>>
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> I think if we still want to double decode, we still require
>>>>>>>>>> all of
>>>>>>>>>> >>> >> these above things from our bundled htp.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> -----
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >> In 0.5.x, tx->request_uri_normalized has been removed, and
>>>>>>>>>> we'd now
>>>>>>>>>> >>> >> have to use the REQUEST_URI hook. We'll have to carry out
>>>>>>>>>> the
>>>>>>>>>> >>> >> reconstruction ourselves, and store it ourselves in our
>>>>>>>>>> HTPState.
>>>>>>>>>> >>> >>
>>>>>>>>>> >>>
>>>>>>>>>> >>> What are your thoughts on this?
>>>>>>>>>> >>>
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > IIRC there is some function in libhtp that does just the
>>>>>>>>>> decoding of
>>>>>>>>>> >>> > uriencoding and unicode. We should probably just use that on
>>>>>>>>>> the query
>>>>>>>>>> >>> > and do the full normalization on the path.
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > As a side thought: I think it would be nice to store path and
>>>>>>>>>> query
>>>>>>>>>> >>> > separately so that we can add http_path and http_query
>>>>>>>>>> keywords later
>>>>>>>>>> >>> > on.
>>>>>>>>>> >>> >
>>>>>>>>>> >>>
>>>>>>>>>> >>> We'd pretty much extract it directly from parsed_uri. Will
>>>>>>>>>> have to
>>>>>>>>>> >>> check if we need the extract double decode phase we have
>>>>>>>>>> currently in
>>>>>>>>>> >>> our bundled htp, in which case we'd need to store them
>>>>>>>>>> separately.
>>>>>>>>>> >>>
>>>>>>>>>> >>
>>>>>>>>>> >> Yes, all the normalized components are in tx->parsed_uri. This
>>>>>>>>>> is what is
>>>>>>>>>> >> used in ironbee to expose all the various parts like
>>>>>>>>>> tx->parsed_uri->path
>>>>>>>>>> >> and tx->parsed_uri->query.
>>>>>>>>>> >>
>>>>>>>>>> >> Also note that the hostname should now be obtained from
>>>>>>>>>> >> tx->request_hostname in 0.5.
>>>>>>>>>> >>
>>>>>>>>>> >> -B
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > FYI, for an example using libhtp 0.5 see ironbee code. This was
>>>>>>>>>> all
>>>>>>>>>> > recently updated for 0.5.
>>>>>>>>>> >
>>>>>>>>>> > https://github.com/ironbee/ironbee/blob/0.7.x/modules/modhtp.c
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> Will have a look. Thanks.
>>>>>>>>>>
>>>>>>>>>> Previously we would use tx->connp->conn->transactions to access txs
>>>>>>>>>> in the state. Now that htp_connp_t is an opaque pointer how do I
>>>>>>>>>> access the txs? Tried locating helper functions to retrieve it, but
>>>>>>>>>> I
>>>>>>>>>> didn't find any.
>>>>>>>>>>
>>>>>>>>>> It's an oversight that there isn't a helper function to retrieve
>>>>>>>>>> transactions on a connections. I will add one tomorrow.
>>>>>>>>>>
>>>>>>>>>> Having said that, what is your use case that you require to retrieve
>>>>>>>>>> transactions? I thought your code was driven by the callbacks, which >
>>>>>>>>>> all
>>>>>>>>>> come with a tx instance (via connp)? For my education, can you explain
>>>>>>>>>> how you process connection data?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One of the things that we don't do out of the callbacks is logging the
>>>>>>>>> requests. This is one of the things we need access to the TX store for.
>>>>>>>>>
>>>>>>>>
>>>>>>>> And to add to it, since we already have the txs stored in a list
>>>>>>>> inside libhtp, re-buffering the txs would come as a redundant task,
>>>>>>>> from where I see it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Anoop Saldanha
>>>>>>>> _______________________________________________
>>>>>>>> Suricata IDS Devel mailing list: oisf-devel at openinfosecfoundation.org
>>>>>>>> Site: http://suricata-ids.org | Participate:
>>>>>>>> http://suricata-ids.org/participate/
>>>>>>>> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel
>>>>>>>> Redmine: https://redmine.openinfosecfoundation.org/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ivan Ristić
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Anoop Saldanha
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Anoop Saldanha
>>>>
>>>>
>>>>
>>>> --
>>>> Ivan Ristić
>>>
>>>
>>>
>>> --
>>> Anoop Saldanha
>>
>>
>>
>
>
> --
> ---------------------------------------------
> Victor Julien
> http://www.inliniac.net/
> PGP: http://www.inliniac.net/victorjulien.asc
> ---------------------------------------------
>
--
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------
More information about the Oisf-devel
mailing list