[Oisf-devel] libhtp - Normalization of query string

Anoop Saldanha anoopsaldanha at gmail.com
Wed Jun 19 10:35:43 UTC 2013


On Wed, Jun 19, 2013 at 3:45 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
> On Tue, Jun 18, 2013 at 2:12 PM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>> On Mon, Jun 17, 2013 at 6:40 PM, Ivan Ristic <ivan.ristic at gmail.com> wrote:
>>> On Mon, Jun 17, 2013 at 9:18 AM, Anoop Saldanha <anoopsaldanha at gmail.com> wrote:
>>>> While producing the normalized uri, what is the right way to
>>>> generate the normalized query string? Can see 2 solutions -
>>>>
>>>>     1. Duplicate this code section from htp_unparse_uri_noencode( ) -
>>>>
>>>>         if (uri->query != NULL) {
>>>>             bstr *query = bstr_dup(uri->query);
>>>>             htp_uriencoding_normalize_inplace(query);
>>>>             bstr_add_c_noex(r, "?");
>>>>             bstr_add_noex(r, query);
>>>>             bstr_free(query);
>>>>         }
>>>
>>> I think this one is a better approach, although it may depend on
>>> exactly how you define normalization.
>>
>> With htp_uriencoding_normalize_inplace( ) if it sees a %2d it would
>> translate it as a '-'(hypen) using x2c, and then checks if it's a
>> reserved character and post confirmation leaves it undecoded.  Is this
>> the right behaviour?
>
> It depends. It's ambiguous in the spec, and some argue one way, some
> another. Unfortunately, I didn't document my reasoning and so I will
> need to go back and double-check.
>

okay.

As an example, I have uris with query strings where %2d is not decoded
if I use htp_uriencoding_normalize_inplace().  We are also using this
function to decode username, password, fragment and hostname, so will
have to check if we face the same issue with these.

>
>> I would have preferred to use htp_decode_urlencoded_inplace(), but
>> it's private and duplication would be a nuisance with all the
>> reference to cfg.
>
> I don't think you can avoid the reference to cfg, because there are
> many settings that control exactly how the decoding is done.

Right, which should also count as the reason why we can't use
htp_uriencoding_normalize_inplace() for query decoding.

> There
> isn't any one true way. I could create a public function removing the
> reference to tx -- would you like that?

Yes, that would be helpful.

Before you push the commit for this, can I have a look at it to make
sure that's what I want?

>
>
>> Btw the cfg associated with HTP_DECODER_URL_PATH applicable to both
>> the path and the query part of the uri?
>
> HTP_DECODER_URL_PATH will be used for the path only,
> HTP_DECODER_URLENCODED for parameters, either in the query string or
> in the body. (I've just found one place when the incorrect cfg is
> used. Will fix it now.)
>

FYI, htp_uriencoding_normalize_inplace() has reference to
HTP_DEOCER_URL_PATH cfg, and query decoding code calls
htp_uriencoding_normalize_inplace().

>
>>>
>>>
>>>>     2. Register htp_config_register_urlencoded_parser( ), and then
>>>>           use the below code -
>>>>
>>>>         if (uri->query != NULL) {
>>>>             bstr_add_c_noex(r, "?");
>>>>             size_t tsize = htp_table_size(tx->request_params);
>>>>             size_t i;
>>>>             for (i = 0; i < tsize; i++) {
>>>>                 htp_param_t *p =
>>>>                     htp_table_get_index(tx->request_params, i, NULL);
>>>>                 if (p == NULL || p->source != HTP_SOURCE_QUERY_STRING)
>>>>                     continue;
>>>>                 bstr_add_noex(r, p->name);
>>>>                 if (bstr_len(p->value) != 0) {
>>>>                     bstr_add_c_noex(r, "=");
>>>>                     bstr_add_noex(r, p->value);
>>>>                 }
>>>>                 if (i != (tsize - 1))
>>>>                     bstr_add_c_noex(r, "&");
>>>>         }
>>>>
>>>>     Which of these 2 is the right solution?
>>>>


-- 
-------------------------------
Anoop Saldanha
http://www.poona.me
-------------------------------



More information about the Oisf-devel mailing list