[Oisf-devel] libhtp - Normalization of query string
Anoop Saldanha
anoopsaldanha at gmail.com
Sat Jun 29 14:19:24 UTC 2013
On Thu, Jun 27, 2013 at 2:05 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
> On 21/06/2013 18:44, Anoop Saldanha wrote:
>> On Fri, Jun 21, 2013 at 10:34 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>>> On 21/06/2013 17:36, Anoop Saldanha wrote:
>>>> On Fri, Jun 21, 2013 at 6:14 PM, Ivan Ristić <ivan.ristic at gmail.com> wrote:
>>>>> On 19/06/2013 11:35, Anoop Saldanha wrote:
>>>>>> On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>>>>>>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>>>>>>>> While producing the normalized uri, what is the right way to
>>>>>>>>>> generate the normalized query string? Can see 2 solutions -
>>>>>>>>>>
>>>>>>>>>> 1. Duplicate this code section from htp_unparse_uri_noencode( ) -
>>>>>>>>>>
>>>>>>>>>> if (uri->query != NULL) {
>>>>>>>>>>     bstr *query = bstr_dup(uri->query);
>>>>>>>>>>     htp_uriencoding_normalize_inplace(query);
>>>>>>>>>>     bstr_add_c_noex(r, "?");
>>>>>>>>>>     bstr_add_noex(r, query);
>>>>>>>>>>     bstr_free(query);
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I think this one is a better approach, although it may depend on
>>>>>>>>> exactly how you define normalization.
>>>>>>>>
>>>>>>>> With htp_uriencoding_normalize_inplace(), if it sees a %2d it
>>>>>>>> translates it to a '-' (hyphen) using x2c, then checks whether it's
>>>>>>>> a reserved character and, if so, leaves it undecoded. Is this
>>>>>>>> the right behaviour?
>>>>>>>
>>>>>>> It depends. It's ambiguous in the spec, and some argue one way, some
>>>>>>> another. Unfortunately, I didn't document my reasoning and so I will
>>>>>>> need to go back and double-check.
>>>>>>>
>>>>>>
>>>>>> okay.
>>>>>>
>>>>>> As an example, I have uris with query strings where %2d is not decoded
>>>>>> if I use htp_uriencoding_normalize_inplace(). We are also using this
>>>>>> function to decode username, password, fragment and hostname, so will
>>>>>> have to check if we face the same issue with these.
>>>>>>
>>>>>>>
>>>>>>>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>>>>>>>> it's private, and duplicating it would be a nuisance with all the
>>>>>>>> references to cfg.
>>>>>>>
>>>>>>> I don't think you can avoid the reference to cfg, because there are
>>>>>>> many settings that control exactly how the decoding is done.
>>>>>>
>>>>>> Right, which should also count as the reason why we can't use
>>>>>> htp_uriencoding_normalize_inplace() for query decoding.
>>>>>>
>>>>>>> There
>>>>>>> isn't any one true way. I could create a public function removing the
>>>>>>> reference to tx -- would you like that?
>>>>>>
>>>>>> Yes, that would be helpful.
>>>>>>
>>>>>> Before you push the commit for this, can I have a look at it to make
>>>>>> sure that's what I want?
>>>>>
>>>>> How about:
>>>>>
>>>>> htp_urldecode_inplace_ex(
>>>>>     htp_decoder_cfg_t *cfg,
>>>>>     bstr *input,
>>>>>     uint64_t flags)?
>>>>>
>>>>
>>>> This should be okay. Are the flags there to specify whether it's for the path or not?
>>>
>>> No, to tell you what was contained in the string. What type of encoding,
>>> and so on. Same as other flags.
>>>
>>
>> So you meant a "uint64_t *flags". Sounds good.
>
> I've added a public:
>
> htp_status_t htp_urldecode_inplace(htp_cfg_t *cfg,
>     enum htp_decoder_ctx_t ctx, bstr *input, uint64_t *flags);
>
> to 0.5.x. Please give it a go.
>
> Another change is that now everything is decoded, even the reserved
> characters. It's not clear if this is the right thing to do, and I
> suspect that there isn't any one right thing to do. So see if it works
> for you.
>
> I will focus on the normalization process in the next release.
>
Works as described. Thanks.
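
For anyone following the thread, the two behaviours discussed above
(leaving reserved characters encoded versus decoding everything) can be
sketched in standalone C. This is only an illustration, not libhtp's
code: pct_rewrite_inplace(), hex2byte() and is_unreserved() are
hypothetical names, it works on plain NUL-terminated strings rather than
bstr, and it assumes the RFC 3986 unreserved set, which may not match how
libhtp classifies characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn two hex digits (already validated) into a byte. */
static unsigned char hex2byte(const char *p) {
    unsigned char hi = (unsigned char) p[0];
    unsigned char lo = (unsigned char) p[1];
    hi = (unsigned char) (isdigit(hi) ? hi - '0' : toupper(hi) - 'A' + 10);
    lo = (unsigned char) (isdigit(lo) ? lo - '0' : toupper(lo) - 'A' + 10);
    return (unsigned char) ((hi << 4) | lo);
}

/* Assumption: "unreserved" means the RFC 3986 set (ALPHA, DIGIT, - . _ ~). */
static int is_unreserved(unsigned char c) {
    return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Rewrite percent-escapes in place and return the new length.
 *   decode_all == 0: decode only unreserved characters; keep every other
 *                    escape, but uppercase its hex digits.
 *   decode_all != 0: decode every valid %XX escape.
 * Invalid or truncated escapes are copied through unchanged.
 * Note: with decode_all set, a decoded %00 leaves a NUL inside the buffer.
 */
static size_t pct_rewrite_inplace(char *s, int decode_all) {
    size_t r = 0, w = 0, len = strlen(s);

    while (r < len) {
        if (s[r] == '%' && r + 2 < len &&
            isxdigit((unsigned char) s[r + 1]) &&
            isxdigit((unsigned char) s[r + 2])) {
            unsigned char c = hex2byte(&s[r + 1]);

            if (decode_all || is_unreserved(c)) {
                s[w++] = (char) c;
            } else {
                s[w++] = '%';
                s[w++] = (char) toupper((unsigned char) s[r + 1]);
                s[w++] = (char) toupper((unsigned char) s[r + 2]);
            }
            r += 3;
        } else {
            s[w++] = s[r++];
        }
    }

    s[w] = '\0';
    return w;
}
```

With the input "q=a%2fb%41", calling it with decode_all set to 0 gives
"q=a%2FbA", and with decode_all set to 1 it gives "q=a/bA". The second
form matches the "everything is decoded" behaviour described above, and
it is lossy: a decoded '/' or '&' can no longer be told apart from a
literal one.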
--
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------