[Oisf-devel] libhtp - Normalization of query string

Anoop Saldanha anoopsaldanha at gmail.com
Sat Jun 29 14:19:24 UTC 2013


On Thu, Jun 27, 2013 at 2:05 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
> On 21/06/2013 18:44, Anoop Saldanha wrote:
>> On Fri, Jun 21, 2013 at 10:34 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>>> On 21/06/2013 17:36, Anoop Saldanha wrote:
>>>> On Fri, Jun 21, 2013 at 6:14 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>>>>> On 19/06/2013 11:35, Anoop Saldanha wrote:
>>>>>> On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>> While producing the normalized uri, what is the right way to
>>>>>>>>>> generate the normalized query string? I can see 2 solutions -
>>>>>>>>>>
>>>>>>>>>>     1. Duplicate this code section from htp_unparse_uri_noencode( ) -
>>>>>>>>>>
>>>>>>>>>>         if (uri->query != NULL) {
>>>>>>>>>>             bstr *query = bstr_dup(uri->query);
>>>>>>>>>>             htp_uriencoding_normalize_inplace(query);
>>>>>>>>>>             bstr_add_c_noex(r, "?");
>>>>>>>>>>             bstr_add_noex(r, query);
>>>>>>>>>>             bstr_free(query);
>>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>> I think this one is a better approach, although it may depend on
>>>>>>>>> exactly how you define normalization.
>>>>>>>>
>>>>>>>> With htp_uriencoding_normalize_inplace(), if it sees a %2d it
>>>>>>>> translates it to a '-' (hyphen) using x2c, then checks whether it's a
>>>>>>>> reserved character and, having confirmed that, leaves it undecoded.
>>>>>>>> Is this the right behaviour?
>>>>>>>
>>>>>>> It depends. It's ambiguous in the spec, and some argue one way, some
>>>>>>> another. Unfortunately, I didn't document my reasoning and so I will
>>>>>>> need to go back and double-check.
>>>>>>>
>>>>>>
>>>>>> okay.
>>>>>>
>>>>>> As an example, I have uris with query strings where %2d is not decoded
>>>>>> if I use htp_uriencoding_normalize_inplace().  We are also using this
>>>>>> function to decode username, password, fragment and hostname, so will
>>>>>> have to check if we face the same issue with these.
>>>>>>
>>>>>>>
>>>>>>>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>>>>>>>> it's private and duplication would be a nuisance with all the
>>>>>>>> references to cfg.
>>>>>>>
>>>>>>> I don't think you can avoid the reference to cfg, because there are
>>>>>>> many settings that control exactly how the decoding is done.
>>>>>>
>>>>>> Right, which should also count as the reason why we can't use
>>>>>> htp_uriencoding_normalize_inplace() for query decoding.
>>>>>>
>>>>>>> There
>>>>>>> isn't any one true way. I could create a public function removing the
>>>>>>> reference to tx -- would you like that?
>>>>>>
>>>>>> Yes, that would be helpful.
>>>>>>
>>>>>> Before you push the commit for this, can I have a look at it to make
>>>>>> sure that's what I want?
>>>>>
>>>>> How about:
>>>>>
>>>>>     htp_urldecode_inplace_ex(
>>>>>         htp_decoder_cfg_t *cfg,
>>>>>         bstr *input,
>>>>>         uint64_t flags)?
>>>>>
>>>>
>>>> This should be okay.  Is the flags argument there to specify whether
>>>> it's for the path or not?
>>>
>>> No, to tell you what was contained in the string. What type of encoding,
>>> and so on. Same as other flags.
>>>
>>
>> So you meant a "uint64_t *flags".  Sounds good.
>
> I've added a public:
>
> htp_status_t htp_urldecode_inplace(htp_cfg_t *cfg,
>     enum htp_decoder_ctx_t ctx, bstr *input, uint64_t *flags);
>
> to 0.5.x. Please give it a go.
>
> Another change is that now everything is decoded, even the reserved
> characters. It's not clear if this is the right thing to do, and I
> suspect that there isn't any one right thing to do. So see if it works
> for you.
>
> I will focus on the normalization process in the next release.
>

Works as described.   Thanks.
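
For anyone following the thread, the two decoding policies discussed above (decode everything versus leave some characters percent-encoded) can be sketched roughly as below. This is only an illustration, not libhtp code: sketch_urldecode_inplace(), hex_val() and is_unreserved() are hypothetical names, it works on a plain NUL-terminated C string rather than a bstr, it ignores cfg and flags entirely, and the "selective" mode assumes the RFC 3986 unreserved set, which may not match what htp_uriencoding_normalize_inplace() actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libhtp): value of one hex digit.
 * The caller must have checked isxdigit() first. */
static int hex_val(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Assumption: "safe to decode" means the RFC 3986 unreserved set,
 * i.e. ALPHA / DIGIT / "-" / "." / "_" / "~". */
static int is_unreserved(unsigned char c) {
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical sketch (not libhtp): percent-decode s in place.
 *   decode_all != 0 : decode every valid %XX sequence
 *   decode_all == 0 : decode only unreserved characters and leave
 *                     everything else percent-encoded
 * Invalid sequences are copied through unchanged, and %00 is never
 * decoded so that the result stays a valid C string.
 * Returns the new length of s. */
static size_t sketch_urldecode_inplace(char *s, int decode_all) {
    assert(s != NULL);
    size_t len = strlen(s);
    size_t r = 0; /* read position */
    size_t w = 0; /* write position, never ahead of r */

    while (r < len) {
        if (s[r] == '%' && r + 2 < len &&
            isxdigit((unsigned char)s[r + 1]) &&
            isxdigit((unsigned char)s[r + 2])) {
            unsigned char c = (unsigned char)(
                hex_val((unsigned char)s[r + 1]) * 16 +
                hex_val((unsigned char)s[r + 2]));
            if (c != 0 && (decode_all || is_unreserved(c))) {
                s[w++] = (char)c;
                r += 3;
                continue;
            }
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

With the input "q=a%2db%26c", the decode-everything mode gives "q=a-b&c", while the selective mode gives "q=a-b%26c", since '&' is not in the unreserved set. Which of the two is "right" for a normalized query string is exactly the open question in this thread.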

-- 
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------


