[Oisf-devel] libhtp 0.5.x integration - bug 775
Ivan Ristic
ivan.ristic at gmail.com
Mon Jun 10 12:22:32 UTC 2013
Yes, thanks for noticing. I've removed it now.
On Mon, Jun 10, 2013 at 12:48 PM, Anoop Saldanha
<anoopsaldanha at gmail.com> wrote:
> Ivan,
>
> Now that libhtp 0.5.x doesn't generate the normalized request uri
> anymore, htp_config_set_generate_request_uri_normalized() should
> probably be removed?
>
> On Fri, Jun 7, 2013 at 8:21 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>> Ivan,
>>
>> When libhtp receives a request such as "HELLO\r\n", it sets
>> "request_protocol_number" to HTP_PROTOCOL_0_9. Is that right, or
>> should it be HTP_PROTOCOL_UNKNOWN?
>>
>> On Tue, Jun 4, 2013 at 12:02 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>> On Mon, Jun 3, 2013 at 10:48 PM, Victor Julien <victor at inliniac.net> wrote:
>>>> On 06/03/2013 06:07 PM, Anoop Saldanha wrote:
>>>>> @Victor
>>>>>
>>>>> Since we need to store the normalized request uri in our htp_state, we
>>>>> can probably figure out a solution that we can also reuse in dcerpc
>>>>> for storing transactions.
>>>>>
>>>>> Probably a linked list that stores the tx_id (the tx id for the
>>>>> related data) of its head?
>>>>
>>>> Would it be an option to use the per tx HtpUserData that we use in the
>>>> 0.2.x implementation for body tracking?
>>>
>>> Yes, we can use it to store all generated buffers.
>>>
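A minimal sketch of what such per-tx user data could look like, in case it helps. The struct and function names here are hypothetical (they are not Suricata's actual HtpUserData layout), and it only holds one generated buffer, the normalized request URI. As far as I know, 0.5.x lets you attach a pointer like this with htp_tx_set_user_data() and read it back with htp_tx_get_user_data(), but please check the headers before relying on that.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-transaction user data: one slot per generated buffer. */
typedef struct TxUserData_ {
    char  *request_uri_normalized;     /* owned, NUL-terminated copy */
    size_t request_uri_normalized_len; /* length without the NUL */
} TxUserData;

/* Allocate a zeroed instance; returns NULL on allocation failure. */
TxUserData *TxUserDataNew(void)
{
    return calloc(1, sizeof(TxUserData));
}

/* Store a private copy of the normalized URI, replacing any previous
 * one. Returns 0 on success, -1 on allocation failure (in which case
 * the previous value is left untouched). */
int TxUserDataSetUri(TxUserData *ud, const char *uri, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, uri, len);
    copy[len] = '\0';

    free(ud->request_uri_normalized);
    ud->request_uri_normalized = copy;
    ud->request_uri_normalized_len = len;
    return 0;
}

/* Release the user data and every buffer it owns. */
void TxUserDataFree(TxUserData *ud)
{
    if (ud == NULL)
        return;
    free(ud->request_uri_normalized);
    free(ud);
}
```

The point of keeping the buffers on the tx is that they are freed together with it, so nothing extra has to be tracked by tx_id in the state.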
>>>>
>>>>> On Wed, May 15, 2013 at 12:38 PM, Anoop Saldanha
>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>> Right. Thanks.
>>>>>>
>>>>>> On Wed, May 15, 2013 at 12:18 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>> On Wed, May 15, 2013 at 7:37 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>> Ivan,
>>>>>>>>
>>>>>>>> I see the introduction of
>>>>>>>>
>>>>>>>> htp_tx_t *htp_connp_get_in_tx(const htp_connp_t *connp);
>>>>>>>> htp_tx_t *htp_connp_get_out_tx(const htp_connp_t *connp);
>>>>>>>>
>>>>>>>> Which means I won't be able to retrieve individual txs?
>>>>>>>
>>>>>>> Those 2 functions will give you only the currently active request and
>>>>>>> response, respectively. There can be one of each at any given time.
>>>>>>>
>>>>>>> With recent changes, callbacks are sent the correct tx, so the above
>>>>>>> functions will rarely be needed when you're processing one transaction
>>>>>>> at a time.
>>>>>>>
>>>>>>>
>>>>>>>> I receive 5
>>>>>>>> pipelined requests, so that would be 5 txs created. How do I retrieve
>>>>>>>> the individual txs?
>>>>>>>
>>>>>>> The transactions are in htp_conn_t::transactions, which is a list. How
>>>>>>> to access the htp_conn_t pointer depends on your setup. You probably
>>>>>>> keep a pointer to connp somewhere in your context, and from there you
>>>>>>> can get a connection using htp_connp_get_connection().
>>>>>>>
>>>>>>>
>>>>>>>> On Thu, Apr 11, 2013 at 10:16 PM, Anoop Saldanha
>>>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>>>> On Thu, Apr 11, 2013 at 9:10 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>>> I wouldn't advise you to do any buffering anyhow. But I am curious
>>>>>>>>>> if you're deleting transactions once you're done with them. Because,
>>>>>>>>>> if you're not, you may be allocating a lot of memory (all tx
>>>>>>>>>> instances) on long-lived HTTP connections.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We do delete them, once we're done.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 9, 2013 at 1:06 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 9, 2013 at 2:36 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>> (bad juju to brian and ivan for top posting and/or html emails! :)
>>>>>>>>>>>>
>>>>>>>>>>>> On 04/09/2013 10:21 AM, Ivan Ristic wrote:
>>>>>>>>>>>>> On Mon, Apr 8, 2013 at 3:54 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 8, 2013 at 7:50 PM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On Mon, Apr 8, 2013 at 9:16 AM, Brian Rectanus <brectanu at gmail.com> wrote:
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>>> >>> > (moving to oisf-devel)
>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>> >>> > On 04/08/2013 06:17 AM, Anoop Saldanha wrote:
>>>>>>>>>>>>> >>> >>>> I recollect we introduced path and query double decoding through
>>>>>>>>>>>>> >>> >>>> configurable params, and also we had this thing with query
>>>>>>>>>>>>> >>> >>>> decoding (single level). Can you explain a bit what the status was
>>>>>>>>>>>>> >>> >>>> previously? Seeing related failed uts.
>>>>>>>>>>>>> >>> >>>>
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>> We run the path normalization on the query through our
>>>>>>>>>>>>> >>> >>> HTPCallbackRequestUriNormalizeQuery callback. Previously we used
>>>>>>>>>>>>> >>> >>> htp_decode_path_inplace to normalize the query (e.g. for
>>>>>>>>>>>>> >>> >>> uridecoding). However, this was causing issues (remember that pcre
>>>>>>>>>>>>> >>> >>> "bug" we discussed a while back, where http:// turned into http:/).
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>> In libhtp I copied htp_decode_path_inplace to
>>>>>>>>>>>>> >>> >>> htp_decode_query_inplace and also copied the config params and cfg
>>>>>>>>>>>>> >>> >>> funcs:
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>> Hack of the 1st order, which is wrong in many ways. But it basically
>>>>>>>>>>>>> >>> >>> allowed me to make sure we don't normalize the query as if it's path,
>>>>>>>>>>>>> >>> >>> esp with turning ftp:// into ftp:/ and such.
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>> For 0.5 integration I think we need a proper solution. The only
>>>>>>>>>>>>> >>> >>> reason I pushed my hack like this was that I knew in 0.5 we would
>>>>>>>>>>>>> >>> >>> make things right.
>>>>>>>>>>>>> >>> >>>
>>>>>>>>>>>>> >>> >>
>>>>>>>>>>>>> >>> >> I think if we still want to double decode, we still require all of
>>>>>>>>>>>>> >>> >> these above things from our bundled htp.
>>>>>>>>>>>>> >>> >>
>>>>>>>>>>>>> >>> >> -----
>>>>>>>>>>>>> >>> >>
>>>>>>>>>>>>> >>> >> In 0.5.x, tx->request_uri_normalized has been removed, and we'd now
>>>>>>>>>>>>> >>> >> have to use the REQUEST_URI hook. We'll have to carry out the
>>>>>>>>>>>>> >>> >> reconstruction ourselves, and store it ourselves in our HTPState.
>>>>>>>>>>>>> >>> >>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> What are your thoughts on this?
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>> >>> > IIRC there is some function in libhtp that does just the decoding of
>>>>>>>>>>>>> >>> > uriencoding and unicode. We should probably just use that on the query
>>>>>>>>>>>>> >>> > and do the full normalization on the path.
>>>>>>>>>>>>> >>> >
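To make the path/query difference concrete, here is a small sketch. Both functions are hypothetical stand-ins, not libhtp's own routines: the first does a single level of %XX decoding and nothing else, which is roughly what the query needs; the second collapses repeated slashes, standing in for the kind of path-only normalization that turns http:// into http:/ when it is applied to a query by mistake. Unicode (%u) decoding and '+' handling are deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; caller guarantees isxdigit(c). */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Percent-decode buf in place, one level only, and return the new
 * length. Invalid or truncated %XX sequences are copied through
 * unchanged. Nothing else is touched, so "//" in a query survives. */
size_t query_urldecode_inplace(char *buf, size_t len)
{
    size_t r = 0, w = 0;
    while (r < len) {
        if (buf[r] == '%' && r + 2 < len &&
            isxdigit((unsigned char)buf[r + 1]) &&
            isxdigit((unsigned char)buf[r + 2])) {
            buf[w++] = (char)(hex_value((unsigned char)buf[r + 1]) * 16 +
                              hex_value((unsigned char)buf[r + 2]));
            r += 3;
        } else {
            buf[w++] = buf[r++];
        }
    }
    return w;
}

/* Path-only step: collapse runs of '/' into a single '/', in place,
 * and return the new length. Running this over a query is what
 * damages embedded URLs. */
size_t path_compress_slashes_inplace(char *buf, size_t len)
{
    size_t w = 0;
    for (size_t r = 0; r < len; r++) {
        if (buf[r] == '/' && w > 0 && buf[w - 1] == '/')
            continue;
        buf[w++] = buf[r];
    }
    return w;
}
```

With a query of u=http%3A//example.com/a, the first function yields u=http://example.com/a, while running the second over that result gives u=http:/example.com/a, which is the breakage described above.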
>>>>>>>>>>>>> >>> > As a side thought: I think it would be nice to store path and query
>>>>>>>>>>>>> >>> > separately so that we can add http_path and http_query keywords later
>>>>>>>>>>>>> >>> > on.
>>>>>>>>>>>>> >>> >
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> We'd pretty much extract it directly from parsed_uri. Will have to
>>>>>>>>>>>>> >>> check if we need the extra double decode phase we have currently in
>>>>>>>>>>>>> >>> our bundled htp, in which case we'd need to store them separately.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Yes, all the normalized components are in tx->parsed_uri. This is what is
>>>>>>>>>>>>> >> used in ironbee to expose all the various parts like tx->parsed_uri->path
>>>>>>>>>>>>> >> and tx->parsed_uri->query.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Also note that the hostname should now be obtained from
>>>>>>>>>>>>> >> tx->request_hostname in 0.5.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> -B
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > FYI, for an example using libhtp 0.5 see ironbee code. This was all
>>>>>>>>>>>>> > recently updated for 0.5.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > https://github.com/ironbee/ironbee/blob/0.7.x/modules/modhtp.c
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will have a look. Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Previously we would use tx->connp->conn->transactions to access txs
>>>>>>>>>>>>> in the state. Now that htp_connp_t is an opaque pointer how do I
>>>>>>>>>>>>> access the txs? Tried locating helper functions to retrieve it, but I
>>>>>>>>>>>>> didn't find any.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's an oversight that there isn't a helper function to retrieve
>>>>>>>>>>>>> transactions on a connection. I will add one tomorrow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Having said that, what is your use case that you require to retrieve
>>>>>>>>>>>>> transactions? I thought your code was driven by the callbacks, which
>>>>>>>>>>>>> all come with a tx instance (via connp)? For my education, can you
>>>>>>>>>>>>> explain how you process connection data?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> One of the things that we don't do out of the callbacks is logging the
>>>>>>>>>>>> requests. This is one of the things we need access to the TX store for.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And to add to it, since we already have the txs stored in a list
>>>>>>>>>>> inside libhtp, re-buffering the txs would come as a redundant task,
>>>>>>>>>>> from where I see it.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Anoop Saldanha
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Suricata IDS Devel mailing list: oisf-devel at openinfosecfoundation.org
>>>>>>>>>>> Site: http://suricata-ids.org | Participate:
>>>>>>>>>>> http://suricata-ids.org/participate/
>>>>>>>>>>> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel
>>>>>>>>>>> Redmine: https://redmine.openinfosecfoundation.org/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ivan Ristić
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Anoop Saldanha
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Anoop Saldanha
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ivan Ristić
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Anoop Saldanha
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ---------------------------------------------
>>>> Victor Julien
>>>> http://www.inliniac.net/
>>>> PGP: http://www.inliniac.net/victorjulien.asc
>>>> ---------------------------------------------
>>>>
>>>
>>>
>>>
>>> --
>>> -------------------------------
>>> Anoop Saldanha
>>> http://www.poona.me
>>> -------------------------------
>>
>>
>>
>> --
>> -------------------------------
>> Anoop Saldanha
>> http://www.poona.me
>> -------------------------------
>
>
>
> --
> -------------------------------
> Anoop Saldanha
> http://www.poona.me
> -------------------------------
--
Ivan Ristić