[Oisf-devel] libhtp - Normalization of query string

Ivan Ristić ivan.ristic at gmail.com
Fri Jun 21 17:04:18 UTC 2013


On 21/06/2013 17:36, Anoop Saldanha wrote:
> On Fri, Jun 21, 2013 at 6:14 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>> On 19/06/2013 11:35, Anoop Saldanha wrote:
>>> On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>> While producing the normalized URI, what is the right way to
>>>>>>> generate the normalized query string? I can see two solutions:
>>>>>>>
>>>>>>>     1. Duplicate this code section from htp_unparse_uri_noencode():
>>>>>>>
>>>>>>>         if (uri->query != NULL) {
>>>>>>>             bstr *query = bstr_dup(uri->query);
>>>>>>>             htp_uriencoding_normalize_inplace(query);
>>>>>>>             bstr_add_c_noex(r, "?");
>>>>>>>             bstr_add_noex(r, query);
>>>>>>>             bstr_free(query);
>>>>>>>         }
>>>>>>
>>>>>> I think this one is a better approach, although it may depend on
>>>>>> exactly how you define normalization.
>>>>>
>>>>> With htp_uriencoding_normalize_inplace(), if it sees %2d it
>>>>> translates it to '-' (hyphen) using x2c(), then checks whether it is a
>>>>> reserved character and, if so, leaves it undecoded. Is this
>>>>> the right behaviour?
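For reference, RFC 3986 (section 2.3) lists '-' among the unreserved characters, and its normalization guidance (section 6.2.2.2) is to decode percent-encoded unreserved characters and keep everything else encoded, with uppercase hex digits. The plain C sketch below implements only that rule. It is not libhtp's code: the function names are hypothetical, and it works on a NUL-terminated string rather than a bstr. Under this rule %2d decodes to '-', while %2f ('/') stays encoded as %2F.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, similar in spirit to libhtp's x2c(): converts two
 * hex digits to a byte. The caller must have checked both with isxdigit(). */
static unsigned char hex_pair_to_byte(const char *p) {
    unsigned char out = 0;
    for (int i = 0; i < 2; i++) {
        unsigned char c = (unsigned char) p[i];
        unsigned char v = isdigit(c) ? (unsigned char) (c - '0')
                                     : (unsigned char) (toupper(c) - 'A' + 10);
        out = (unsigned char) ((out << 4) | v);
    }
    return out;
}

/* RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical normalizer: decodes %XX only when it encodes an unreserved
 * character; other escapes stay encoded but get uppercase hex digits.
 * Truncated or invalid escapes are copied through unchanged.
 * Returns the new length; the string stays NUL-terminated. */
static size_t normalize_percent_encoding(char *s) {
    size_t r = 0, w = 0;
    while (s[r] != '\0') {
        if (s[r] == '%' && isxdigit((unsigned char) s[r + 1])
                && isxdigit((unsigned char) s[r + 2])) {
            unsigned char c = hex_pair_to_byte(&s[r + 1]);
            if (is_unreserved(c)) {
                s[w++] = (char) c;          /* e.g. %2d -> '-' */
            } else {
                s[w++] = '%';               /* e.g. %2f -> %2F */
                s[w++] = (char) toupper((unsigned char) s[r + 1]);
                s[w++] = (char) toupper((unsigned char) s[r + 2]);
            }
            r += 3;
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

Because the write index never passes the read index, the rewrite can safely happen in place, which is the same property the *_inplace functions discussed in this thread rely on.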
>>>>
>>>> It depends. It's ambiguous in the spec, and some argue one way, some
>>>> another. Unfortunately, I didn't document my reasoning and so I will
>>>> need to go back and double-check.
>>>>
>>>
>>> okay.
>>>
>>> As an example, I have URIs with query strings where %2d is not decoded
>>> if I use htp_uriencoding_normalize_inplace(). We are also using this
>>> function to decode the username, password, fragment and hostname, so we
>>> will have to check whether we face the same issue with those.
>>>
>>>>
>>>>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>>>>> it's private, and duplicating it would be a nuisance with all the
>>>>> references to cfg.
>>>>
>>>> I don't think you can avoid the reference to cfg, because there are
>>>> many settings that control exactly how the decoding is done.
>>>
>>> Right, which is also the reason why we can't use
>>> htp_uriencoding_normalize_inplace() for query decoding.
>>>
>>>> There
>>>> isn't any one true way. I could create a public function removing the
>>>> reference to tx -- would you like that?
>>>
>>> Yes, that would be helpful.
>>>
>>> Before you push the commit for this, can I have a look at it to make
>>> sure that's what I want?
>>
>> How about:
>>
>>     htp_urldecode_inplace_ex(
>>         htp_decoder_cfg_t *cfg,
>>         bstr *input,
>>         uint64_t flags)?
>>
> 
> This should be okay. Are the flags meant to specify whether it's for a path or not?

No, they tell you what the string contained: what type of encoding,
and so on. Same as the other flags.
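As a hedged illustration of the shape being discussed (a decoder that takes a configuration, decodes in place, and reports through flags what it found in the string), here is a standalone C sketch. Every name in it (sketch_decoder_cfg, its two fields, sketch_urldecode_inplace_ex and the SKETCH_URLEN_* flags) is hypothetical and is not libhtp's API. It also assumes flags is an output parameter (uint64_t *), whereas the signature proposed above takes it by value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output flags: describe what the decoder encountered. */
#define SKETCH_URLEN_ENCODED      0x01u  /* at least one valid %XX escape */
#define SKETCH_URLEN_INVALID      0x02u  /* '%' not followed by two hex digits */
#define SKETCH_URLEN_ENCODED_NUL  0x04u  /* %00 was decoded */

/* Hypothetical settings standing in for the cfg argument. */
typedef struct {
    int plus_to_space;        /* decode '+' as ' ' (form-style query strings) */
    int keep_invalid_percent; /* 1: keep a bad '%' as-is; 0: drop the '%' */
} sketch_decoder_cfg;

static int hexval(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes data[0..len) in place (length-based, so embedded NULs are fine),
 * returns the new length and ORs a description of the input into *flags. */
static size_t sketch_urldecode_inplace_ex(const sketch_decoder_cfg *cfg,
                                          unsigned char *data, size_t len,
                                          uint64_t *flags) {
    size_t r = 0, w = 0;
    while (r < len) {
        unsigned char c = data[r];
        if (c == '%') {
            int hi = (r + 1 < len) ? hexval(data[r + 1]) : -1;
            int lo = (r + 2 < len) ? hexval(data[r + 2]) : -1;
            if (hi >= 0 && lo >= 0) {
                unsigned char d = (unsigned char) ((hi << 4) | lo);
                *flags |= SKETCH_URLEN_ENCODED;
                if (d == 0) *flags |= SKETCH_URLEN_ENCODED_NUL;
                data[w++] = d;
                r += 3;
                continue;
            }
            /* Invalid or truncated escape: record it, then apply the policy. */
            *flags |= SKETCH_URLEN_INVALID;
            if (cfg->keep_invalid_percent) data[w++] = '%';
            r++;
            continue;
        }
        data[w++] = (cfg->plus_to_space && c == '+') ? ' ' : c;
        r++;
    }
    return w;
}
```

Keeping the policy in cfg and the findings in flags lets the caller decide afterwards whether, say, an encoded NUL or an invalid escape should raise an alert, without the decoder itself having to know about the transaction.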

-- 
Ivan


