[Oisf-devel] libhtp - Normalization of query string
Anoop Saldanha
anoopsaldanha at gmail.com
Fri Jun 21 17:44:31 UTC 2013
On Fri, Jun 21, 2013 at 10:34 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
> On 21/06/2013 17:36, Anoop Saldanha wrote:
>> On Fri, Jun 21, 2013 at 6:14 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>>> On 19/06/2013 11:35, Anoop Saldanha wrote:
>>>> On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>> While producing the normalized uri, what is the right way to
>>>>>>>> generate the normalized query string? I can see 2 solutions -
>>>>>>>>
>>>>>>>> 1. Duplicate this code section from htp_unparse_uri_noencode( ) -
>>>>>>>>
>>>>>>>>     if (uri->query != NULL) {
>>>>>>>>         bstr *query = bstr_dup(uri->query);
>>>>>>>>         htp_uriencoding_normalize_inplace(query);
>>>>>>>>         bstr_add_c_noex(r, "?");
>>>>>>>>         bstr_add_noex(r, query);
>>>>>>>>         bstr_free(query);
>>>>>>>>     }
>>>>>>>
>>>>>>> I think this one is a better approach, although it may depend on
>>>>>>> exactly how you define normalization.
>>>>>>
>>>>>> With htp_uriencoding_normalize_inplace(), if it sees a %2d it
>>>>>> translates it to a '-' (hyphen) using x2c, then checks whether that
>>>>>> is a reserved character and, having confirmed it, leaves the %2d
>>>>>> undecoded. Is this the right behaviour?
>>>>>
>>>>> It depends. It's ambiguous in the spec, and some argue one way, some
>>>>> another. Unfortunately, I didn't document my reasoning and so I will
>>>>> need to go back and double-check.
>>>>>
>>>>
>>>> okay.
>>>>
>>>> As an example, I have uris with query strings where %2d is not decoded
>>>> if I use htp_uriencoding_normalize_inplace(). We are also using this
>>>> function to decode username, password, fragment and hostname, so will
>>>> have to check if we face the same issue with these.
>>>>
>>>>>
>>>>>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>>>>>> it's private and duplication would be a nuisance with all the
>>>>>> reference to cfg.
>>>>>
>>>>> I don't think you can avoid the reference to cfg, because there are
>>>>> many settings that control exactly how the decoding is done.
>>>>
>>>> Right, which should also count as the reason why we can't use
>>>> htp_uriencoding_normalize_inplace() for query decoding.
>>>>
>>>>> There
>>>>> isn't any one true way. I could create a public function removing the
>>>>> reference to tx -- would you like that?
>>>>
>>>> Yes, that would be helpful.
>>>>
>>>> Before you push the commit for this, can I have a look at it to make
>>>> sure that's what I want?
>>>
>>> How about:
>>>
>>> htp_urldecode_inplace_ex(
>>>     htp_decoder_cfg_t *cfg,
>>>     bstr *input,
>>>     uint64_t flags)?
>>>
>>
>> This should be okay. Is the flags argument there to specify whether it's for a path or not?
>
> No, to tell you what was contained in the string. What type of encoding,
> and so on. Same as other flags.
>
So you meant a "uint64_t *flags". Sounds good.
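
To make the out-parameter idea concrete, here is a minimal sketch that is
independent of libhtp. Everything in it is an assumption for illustration:
the function name sketch_urldecode_inplace, the SKETCH_FLAG_* bits, and the
choice to leave an invalid "%" sequence untouched are all hypothetical. It
works on a raw buffer rather than a bstr and takes no cfg. It only shows a
decoder reporting what it found in the string through a "uint64_t *flags".

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only; these are not libhtp's. */
#define SKETCH_FLAG_ENCODED     0x01u /* at least one valid %XX was decoded */
#define SKETCH_FLAG_INVALID_ENC 0x02u /* a '%' not followed by two hex digits */
#define SKETCH_FLAG_ENCODED_NUL 0x04u /* a %00 sequence was decoded */

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int sketch_hex_val(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Percent-decodes buf[0..len) in place and returns the new length.
 * What was seen in the input is OR-ed into *flags, so the caller
 * learns about the encoding instead of describing it.
 */
size_t sketch_urldecode_inplace(unsigned char *buf, size_t len, uint64_t *flags) {
    size_t r = 0, w = 0;

    while (r < len) {
        if (buf[r] == '%') {
            int hi = (r + 1 < len) ? sketch_hex_val(buf[r + 1]) : -1;
            int lo = (r + 2 < len) ? sketch_hex_val(buf[r + 2]) : -1;

            if (hi >= 0 && lo >= 0) {
                unsigned char c = (unsigned char)((hi << 4) | lo);
                *flags |= SKETCH_FLAG_ENCODED;
                if (c == 0) *flags |= SKETCH_FLAG_ENCODED_NUL;
                buf[w++] = c;
                r += 3;
                continue;
            }
            /* Assumed policy: flag the invalid sequence, keep it as-is. */
            *flags |= SKETCH_FLAG_INVALID_ENC;
        }
        buf[w++] = buf[r++];
    }
    return w;
}
```

A caller would zero a uint64_t, pass its address, and inspect the bits
afterwards. How invalid sequences, %00 and so on are actually treated
would be driven by the settings in cfg, which this sketch leaves out.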
--
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------