[Oisf-devel] libhtp - Normalization of query string
Anoop Saldanha
anoopsaldanha at gmail.com
Fri Jun 21 16:36:13 UTC 2013
On Fri, Jun 21, 2013 at 6:14 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
> On 19/06/2013 11:35, Anoop Saldanha wrote:
>> On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>> While producing the normalized uri, what is the right way to
>>>>>> generate the normalized query string? Can see 2 solutions -
>>>>>>
>>>>>> 1. Duplicate this code section from htp_unparse_uri_noencode( ) -
>>>>>>
>>>>>> if (uri->query != NULL) {
>>>>>> bstr *query = bstr_dup(uri->query);
>>>>>> htp_uriencoding_normalize_inplace(query);
>>>>>> bstr_add_c_noex(r, "?");
>>>>>> bstr_add_noex(r, query);
>>>>>> bstr_free(query);
>>>>>> }
>>>>>
>>>>> I think this one is a better approach, although it may depend on
>>>>> exactly how you define normalization.
>>>>
>>>> With htp_uriencoding_normalize_inplace( ) if it sees a %2d it would
>>>> translate it as a '-'(hypen) using x2c, and then checks if it's a
>>>> reserved character and post confirmation leaves it undecoded. Is this
>>>> the right behaviour?
>>>
>>> It depends. It's ambiguous in the spec, and some argue one way, some
>>> another. Unfortunately, I didn't document my reasoning and so I will
>>> need to go back and double-check.
>>>
>>
>> okay.
>>
>> As an example, I have uris with query strings where %2d is not decoded
>> if I use htp_uriencoding_normalize_inplace(). We are also using this
>> function to decode username, password, fragment and hostname, so will
>> have to check if we face the same issue with these.
>>
>>>
>>>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>>>> it's private and duplication would be a nuisance with all the
>>>> reference to cfg.
>>>
>>> I don't think you can avoid the reference to cfg, because there are
>>> many settings that control exactly how the decoding is done.
>>
>> Right, which should also count as the reason why we can't use
>> htp_uriencoding_normalize_inplace() for query decoding.
>>
>>> There
>>> isn't any one true way. I could create a public function removing the
>>> reference to tx -- would you like that?
>>
>> Yes, that would be helpful.
>>
>> Before you push the commit for this, can I have a look at it to make
>> sure that's what I want?
>
> How about:
>
> htp_urldecode_inplace_ex(
> htp_decoder_cfg_t *cfg,
> bstr *input,
> uint64_t flags)?
>
This should be okay. The flags is to specify whether it's for path or not?
>
>>>> Btw the cfg associated with HTP_DECODER_URL_PATH applicable to both
>>>> the path and the query part of the uri?
>>>
>>> HTP_DECODER_URL_PATH will be used for the path only,
>>> HTP_DECODER_URLENCODED for parameters, either in the query string or
>>> in the body. (I've just found one place when the incorrect cfg is
>>> used. Will fix it now.)
>>>
>>
>> FYI, htp_uriencoding_normalize_inplace() has reference to
>> HTP_DEOCER_URL_PATH cfg, and query decoding code calls
>> htp_uriencoding_normalize_inplace().
>
> You probably mean htp_decode_urlencoded_inplace()? I fixed that the
> other day.
>
> https://github.com/ironbee/libhtp/commit/c88d0a59bb00fc343747872b2be73692107f2b60
>
Right. htp_decode_urlencoded_inplace(). Typed it wrong.
>
>>>
>>>>>
>>>>>
>>>>>> 2. Register htp_config_register_urlencoded_parser( ), and then
>>>>>> use the below code -
>>>>>>
>>>>>> if (uri->query != NULL) {
>>>>>> bstr_add_c_noex(r, "?");
>>>>>> size_t tsize = htp_table_size(tx->request_params);
>>>>>> size_t i;
>>>>>> for (i = 0; i < tsize; i++) {
>>>>>> htp_param_t *p =
>>>>>> htp_table_get_index(tx->request_params, i, NULL);
>>>>>> if (p == NULL || p->source != HTP_SOURCE_QUERY_STRING)
>>>>>> continue;
>>>>>> bstr_add_noex(r, p->name);
>>>>>> if (bstr_len(p->value) != 0) {
>>>>>> bstr_add_c_noex(r, "=");
>>>>>> bstr_add_noex(r, p->value);
>>>>>> }
>>>>>> if (i != (tsize - 1))
>>>>>> bstr_add_c_noex(r, "&");
>>>>>> }
>>>>>>
>>>>>> Which of these 2 is the right solution?
>>>>>>
>>
>>
>
>
> --
> Ivan
--
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------
More information about the Oisf-devel
mailing list