[Oisf-devel] libhtp 0.5.x integration - bug 775

Ivan Ristic ivan.ristic at gmail.com
Wed Jun 12 10:12:58 UTC 2013


On Tue, Jun 11, 2013 at 3:45 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
> Ivan,
>
> htp_parsers.c:149
>
>       connp->in_tx->request_auth_password = bstr_dup_ex(decoded, i +
> 1, bstr_len(decoded) - i - 1);
>       if (connp->in_tx->request_auth_password) {
>           bstr_free(decoded);
>           bstr_free(connp->in_tx->request_auth_username);
>           return HTP_ERROR;
>       }
>
> If we have a non-null auth password we error out on the stream.  Is
> this right or should that have been a NULL check?

Yes, it should have been an NULL check. Fixed now in 0.5.x.


> On Tue, Jun 11, 2013 at 5:11 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>> Ivan,
>>
>> Updating my point (2) from the previous mail, since
>> htp_unparse_uri_noencode() takes a htp_uri_t argument, I'd probaby
>> have to create a custom function that utilizes
>> htp_unparse_uri_noencode() + use tx->request_params while producing
>> the normalized uri.  Is there any other way to do this, while using
>> the normalized query to create the normalized uri?
>>
>> On Mon, Jun 10, 2013 at 7:36 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>> @Ivan
>>>
>>> I am using htp_unparse_uri_noencode() to retrieve the reconstructed
>>> normalized uri.
>>>
>>> 1. Should we enable url query normalization using
>>> htp_config_register_urlencode_parser()?
>>> 2. If (1) is true, shouldn't the value stored by (1) in
>>> tx->request_params be used by htp_unparse_uri_noencode() while
>>> reconstructing the normalized uri?
>>> 3. Suricata provides conf settings to enable double decoding for path
>>> and query.  We do this by hooking into the request line callback and
>>> calling htp_decode_path_inplace() and htp_decode_query_inplace().  Do
>>> we have any internal support from libhtp for decoding double encoded
>>> characters?
>>>
>>> @ Victor.
>>>
>>> We enabled some features here to decode path in query here -
>>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>
>>> If a query parameter is an uri, should be decode the uri?  It's an
>>> argument and from the user perspective we don't know how the user
>>> treats the query argument internally .  So some of the settings wrt
>>> decoding a path in a query may not make sense?
>>>
>>> On Mon, Jun 10, 2013 at 5:52 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>> Yes, thanks for noticing. I've removed it now.
>>>>
>>>>
>>>> On Mon, Jun 10, 2013 at 12:48 PM, Anoop Saldanha
>>>> <anoopsaldanha at gmail.com> wrote:
>>>>> Ivan,
>>>>>
>>>>> Now that libhtp 0.5.x doesn't generate the normalized request uri
>>>>> anymore, htp_config_set_generate_request_uri_normalized() should
>>>>> probably be removed?
>>>>>
>>>>> On Fri, Jun 7, 2013 at 8:21 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>> Ivan,
>>>>>>
>>>>>> When a request such as "HELLO\r\n", libhtp would have the
>>>>>> "request_protocol_number" set as HTP_PROTOCOL_0_9.  Is that right or
>>>>>> should it be HTP_PROTOCOL_UNKNOWN?
>>>>>>
>>>>>> On Tue, Jun 4, 2013 at 12:02 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>> On Mon, Jun 3, 2013 at 10:48 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>> On 06/03/2013 06:07 PM, Anoop Saldanha wrote:
>>>>>>>>> @Victor
>>>>>>>>>
>>>>>>>>> Since we need to store the normalized request uri in our htp_state, we
>>>>>>>>> can probably figure out a solution that we can also reuse in dcerpc
>>>>>>>>> for storing transactions.
>>>>>>>>>
>>>>>>>>> Probably a linked_list that stores the tx_id(tx id for the related
>>>>>>>>> data) of it's head?
>>>>>>>>
>>>>>>>> Would it be an option to use the per tx HtpUserData that we use in the
>>>>>>>> 0.2.x implementation for body tracking?
>>>>>>>
>>>>>>> Yes, we can use it store all generated buffers.
>>>>>>>
>>>>>>>>
>>>>>>>>> On Wed, May 15, 2013 at 12:38 PM, Anoop Saldanha
>>>>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>> Right.  Thanks.
>>>>>>>>>>
>>>>>>>>>> On Wed, May 15, 2013 at 12:18 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>>>> On Wed, May 15, 2013 at 7:37 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>> Ivan,
>>>>>>>>>>>>
>>>>>>>>>>>> I see the introduction of
>>>>>>>>>>>>
>>>>>>>>>>>> htp_tx_t *htp_connp_get_in_tx(const htp_connp_t *connp);
>>>>>>>>>>>> htp_tx_t *htp_connp_get_out_tx(const htp_connp_t *connp);
>>>>>>>>>>>>
>>>>>>>>>>>> Which means I won't be able to retrieve individual txs?
>>>>>>>>>>>
>>>>>>>>>>> Those 2 functions will give you only the currently active request and
>>>>>>>>>>> response, respectively. There can be one of each at any given time.
>>>>>>>>>>>
>>>>>>>>>>> With recent changes, callbacks are sent the correct tx, so the above
>>>>>>>>>>> functions will rarely be needed when you're processing one transaction
>>>>>>>>>>> at a time.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I receive 5
>>>>>>>>>>>> pipelined requests, so that would be 5 txs created.  How do I retrieve
>>>>>>>>>>>> the individual txs?
>>>>>>>>>>>
>>>>>>>>>>> The transactions are in htp_conn_t::transactions, which is a list. How
>>>>>>>>>>> to access the htp_conn_t pointer depends on your setup. You probably
>>>>>>>>>>> keep a pointer to connp somewhere in your context, and from there you
>>>>>>>>>>> can get a connection using htp_connp_get_connection().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 11, 2013 at 10:16 PM, Anoop Saldanha
>>>>>>>>>>>> <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>>>>> On Thu, Apr 11, 2013 at 9:10 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>>>>>>> I wouldn't advise you to do any buffering anyhow.
>>>>>>>>>>>>>> But I am curious if you're
>>>>>>>>>>>>>> deleting transactions once you're done with them. Because, if you're not,
>>>>>>>>>>>>>> you may be allocating a lot of memory (all tx instances) on long-lived HTTP
>>>>>>>>>>>>>> connections.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> We do delete them, once we're done.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 9, 2013 at 1:06 PM, Anoop Saldanha <anoopsaldanha at gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Apr 9, 2013 at 2:36 PM, Victor Julien <victor at inliniac.net> wrote:
>>>>>>>>>>>>>>>> (bad juju to brian and ivan for top posting and/or html emails! :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 04/09/2013 10:21 AM, Ivan Ristic wrote:
>>>>>>>>>>>>>>>>> On Mon, Apr 8, 2013 at 3:54 PM, Anoop Saldanha <anoopsaldanha at gmail.com
>>>>>>>>>>>>>>>>> <mailto:anoopsaldanha at gmail.com>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     On Mon, Apr 8, 2013 at 7:50 PM, Brian Rectanus <brectanu at gmail.com
>>>>>>>>>>>>>>>>>     <mailto:brectanu at gmail.com>> wrote:
>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>     > On Mon, Apr 8, 2013 at 9:16 AM, Brian Rectanus
>>>>>>>>>>>>>>>>> <brectanu at gmail.com
>>>>>>>>>>>>>>>>>     <mailto:brectanu at gmail.com>> wrote:
>>>>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>>>>     >> On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha
>>>>>>>>>>>>>>>>>     <anoopsaldanha at gmail.com <mailto:anoopsaldanha at gmail.com>>
>>>>>>>>>>>>>>>>>     >> wrote:
>>>>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>>>>     >>> On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien
>>>>>>>>>>>>>>>>>     <victor at inliniac.net <mailto:victor at inliniac.net>>
>>>>>>>>>>>>>>>>>     >>> wrote:
>>>>>>>>>>>>>>>>>     >>> > (moving to oisf-devel)
>>>>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>>>>     >>> > On 04/08/2013 06:17 AM, Anoop Saldanha wrote:
>>>>>>>>>>>>>>>>>     >>> >>>> I recollect we introduced path and query double decoding
>>>>>>>>>>>>>>>>>     through
>>>>>>>>>>>>>>>>>     >>> >>>> configurable params, and also we had this thing with query
>>>>>>>>>>>>>>>>>     >>> >>>> decoding(single level).  Can you explain a bit what the
>>>>>>>>>>>>>>>>>     status was
>>>>>>>>>>>>>>>>>     >>> >>>> previously.  Seeing related failed uts.
>>>>>>>>>>>>>>>>>     >>> >>>>
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>> We run the path normalization on the query through our
>>>>>>>>>>>>>>>>>     >>> >>> HTPCallbackRequestUriNormalizeQuery callback. Previously we
>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>>     >>> >>> htp_decode_path_inplace to normalize the query (e.g. for
>>>>>>>>>>>>>>>>>     >>> >>> uridecoding).
>>>>>>>>>>>>>>>>>     >>> >>> However, this was causing issues (remember that pcre "bug"
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>     >>> >>> discussed
>>>>>>>>>>>>>>>>>     >>> >>> a while back, where http:// turned into http:/).
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>> In libhtp I copied htp_decode_path_inplace to
>>>>>>>>>>>>>>>>>     >>> >>> htp_decode_query_inplace
>>>>>>>>>>>>>>>>>     >>> >>> and also copied the config params and cfg funcs:
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>> Hack of the 1st order, which is wrong in many ways. But it
>>>>>>>>>>>>>>>>>     basically
>>>>>>>>>>>>>>>>>     >>> >>> allowed me to make sure we don't normalize the query as if
>>>>>>>>>>>>>>>>>     it's path,
>>>>>>>>>>>>>>>>>     >>> >>> esp with turning ftp:// into ftp:/ and such.
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>> For 0.5 integration I think we need a proper solution. The
>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>     >>> >>> reason I
>>>>>>>>>>>>>>>>>     >>> >>> pushed my hack like this was that I knew in 0.5 we would
>>>>>>>>>>>>>>>>>     make things
>>>>>>>>>>>>>>>>>     >>> >>> right.
>>>>>>>>>>>>>>>>>     >>> >>>
>>>>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>>>>     >>> >> I think if we still want to double decode, we still require
>>>>>>>>>>>>>>>>>     all of
>>>>>>>>>>>>>>>>>     >>> >> these above things from our bundled htp.
>>>>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>>>>     >>> >> -----
>>>>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>>>>     >>> >> In 0.5.x, tx->request_uri_normalized has been removed, and
>>>>>>>>>>>>>>>>>     we'd now
>>>>>>>>>>>>>>>>>     >>> >> have to use the REQUEST_URI hook.  We'll have to carry out
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>     >>> >> reconstruction ourselves, and store it ourselves in our
>>>>>>>>>>>>>>>>> HTPState.
>>>>>>>>>>>>>>>>>     >>> >>
>>>>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>>>>     >>> What are your thoughts on this?
>>>>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>>>>     >>> > IIRC there is some function in libhtp that does just the
>>>>>>>>>>>>>>>>>     decoding of
>>>>>>>>>>>>>>>>>     >>> > uriencoding and unicode. We should probably just use that on
>>>>>>>>>>>>>>>>>     the query
>>>>>>>>>>>>>>>>>     >>> > and do the full normalization on the path.
>>>>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>>>>     >>> > As a side thought: I think it would be nice to store path and
>>>>>>>>>>>>>>>>>     query
>>>>>>>>>>>>>>>>>     >>> > separately so that we can add http_path and http_query
>>>>>>>>>>>>>>>>>     keywords later
>>>>>>>>>>>>>>>>>     >>> > on.
>>>>>>>>>>>>>>>>>     >>> >
>>>>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>>>>     >>> We'd pretty much extract it directly from parsed_uri.  Will
>>>>>>>>>>>>>>>>> have to
>>>>>>>>>>>>>>>>>     >>> check if we need the extract double decode phase we have
>>>>>>>>>>>>>>>>>     currently in
>>>>>>>>>>>>>>>>>     >>> our bundled htp, in which case we'd need to store them
>>>>>>>>>>>>>>>>> separately.
>>>>>>>>>>>>>>>>>     >>>
>>>>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>>>>     >> Yes, all the normalized components are in tx->parsed_uri.  This
>>>>>>>>>>>>>>>>>     is what is
>>>>>>>>>>>>>>>>>     >> used in ironbee to expose all the various parts like
>>>>>>>>>>>>>>>>>     tx->parsed_uri->path
>>>>>>>>>>>>>>>>>     >> and tx->parsed_uri->query.
>>>>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>>>>     >> Also note that the hostname should now be obtained from
>>>>>>>>>>>>>>>>>     >> tx->request_hostname in 0.5.
>>>>>>>>>>>>>>>>>     >>
>>>>>>>>>>>>>>>>>     >> -B
>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>     > FYI, for an example using libhtp 0.5 see ironbee code.  This was
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>     > recently updated for 0.5.
>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>     > https://github.com/ironbee/ironbee/blob/0.7.x/modules/modhtp.c
>>>>>>>>>>>>>>>>>     >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Will have a look.  Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Previously we would use tx->connp->conn->transactions to access txs
>>>>>>>>>>>>>>>>>     in the state.  Now that htp_connp_t is an opaque pointer how do I
>>>>>>>>>>>>>>>>>     access the txs? Tried locating helper functions to retrieve it, but
>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>     didn't find any.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's an oversight that there isn't a helper function to retrieve
>>>>>>>>>>>>>>>>> transactions on a connections. I will add one tomorrow.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Having said that, what is your use case that you require to retrieve
>>>>>>>>>>>>>>>>> transactions? I thought your code was driven by the callbacks, which >
>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>> come with a tx instance (via connp)? For my education, can you explain
>>>>>>>>>>>>>>>>> how you process connection data?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One of the things that we don't do out of the callbacks is logging the
>>>>>>>>>>>>>>>> requests. This is one of the things we need access to the TX store for.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And to add to it, since we already have the txs stored in a list
>>>>>>>>>>>>>>> inside libhtp, re-buffering the txs would come as a redundant task,
>>>>>>>>>>>>>>> from where I see it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Anoop Saldanha
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Suricata IDS Devel mailing list: oisf-devel at openinfosecfoundation.org
>>>>>>>>>>>>>>> Site: http://suricata-ids.org | Participate:
>>>>>>>>>>>>>>> http://suricata-ids.org/participate/
>>>>>>>>>>>>>>> List: https://lists.openinfosecfoundation.org/mailman/listinfo/oisf-devel
>>>>>>>>>>>>>>> Redmine: https://redmine.openinfosecfoundation.org/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ivan Ristić
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Anoop Saldanha
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Anoop Saldanha
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ivan Ristić
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Anoop Saldanha
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ---------------------------------------------
>>>>>>>> Victor Julien
>>>>>>>> http://www.inliniac.net/
>>>>>>>> PGP: http://www.inliniac.net/victorjulien.asc
>>>>>>>> ---------------------------------------------
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -------------------------------
>>>>>>> Anoop Saldanha
>>>>>>> http://www.poona.me
>>>>>>> -------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -------------------------------
>>>>>> Anoop Saldanha
>>>>>> http://www.poona.me
>>>>>> -------------------------------
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -------------------------------
>>>>> Anoop Saldanha
>>>>> http://www.poona.me
>>>>> -------------------------------
>>>>
>>>>
>>>>
>>>> --
>>>> Ivan Ristić
>>>
>>>
>>>
>>> --
>>> -------------------------------
>>> Anoop Saldanha
>>> http://www.poona.me
>>> -------------------------------
>>
>>
>>
>> --
>> -------------------------------
>> Anoop Saldanha
>> http://www.poona.me
>> -------------------------------
>
>
>
> --
> -------------------------------
> Anoop Saldanha
> http://www.poona.me
> -------------------------------



-- 
Ivan Ristić



More information about the Oisf-devel mailing list