<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote">On Mon, Apr 8, 2013 at 8:47 AM, Anoop Saldanha <span dir="ltr"><<a href="mailto:anoopsaldanha@gmail.com" target="_blank">anoopsaldanha@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Mon, Apr 8, 2013 at 3:42 PM, Victor Julien <<a href="mailto:victor@inliniac.net">victor@inliniac.net</a>> wrote:<br>
> (moving to oisf-devel)<br>
><br>
> On 04/08/2013 06:17 AM, Anoop Saldanha wrote:<br>
>>>> I recall we introduced path and query double decoding through<br>
>>>> configurable params, and we also had single-level query<br>
>>>> decoding. Can you explain a bit what the status was<br>
>>>> previously? I'm seeing related failed unit tests.<br>
>>>><br>
>>><br>
>>> We run the path normalization on the query through our<br>
>>> HTPCallbackRequestUriNormalizeQuery callback. Previously we used<br>
>>> htp_decode_path_inplace to normalize the query (e.g. for URI decoding).<br>
>>> However, this was causing issues (remember that pcre "bug" we discussed<br>
>>> a while back, where http:// turned into http:/).<br>
>>><br>
>>> In libhtp I copied htp_decode_path_inplace to htp_decode_query_inplace<br>
>>> and also copied the config params and cfg funcs:<br>
>>> <a href="https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3" target="_blank">https://github.com/inliniac/suricata/commit/d41c762689a08e6814dc93e8bfebeceab97175c3</a><br>
>>><br>
>>> A hack of the first order, and wrong in many ways. But it basically<br>
>>> allowed me to make sure we don't normalize the query as if it were a path,<br>
>>> especially turning ftp:// into ftp:/ and such.<br>
>>><br>
>>> For 0.5 integration I think we need a proper solution. The only reason I<br>
>>> pushed my hack like this was that I knew in 0.5 we would make things right.<br>
>>><br>
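A minimal C sketch of the problem described above, under stated assumptions. The thread only says that running path normalization over the query turned http:// into http:/, so we assume that comes from a separator-compression step. `collapse_slashes_inplace` and `normalize_path_keep_query` are hypothetical helpers, not libhtp functions. The second one applies the path-only step before the first '?' and leaves the query bytes alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a path-only normalization step: collapse runs
 * of '/' into a single '/'. Not libhtp's API; it only models the effect
 * described in the thread (http:// -> http:/). Works in place. */
static void collapse_slashes_inplace(char *s)
{
    assert(s != NULL);
    char *dst = s;
    for (const char *src = s; *src != '\0'; src++) {
        /* Drop this '/' if the last byte we kept was also '/'. */
        if (*src == '/' && dst > s && dst[-1] == '/')
            continue;
        *dst++ = *src;
    }
    *dst = '\0';
}

/* Apply the path-only step to the part before the first '?' and keep the
 * query (including any "//" inside it) untouched. */
static void normalize_path_keep_query(char *uri)
{
    assert(uri != NULL);
    char *q = strchr(uri, '?');
    if (q == NULL) {
        collapse_slashes_inplace(uri);
        return;
    }
    size_t qlen = strlen(q);          /* length of "?query" */
    *q = '\0';                        /* temporarily end the path at '?' */
    collapse_slashes_inplace(uri);
    size_t plen = strlen(uri);        /* new (possibly shorter) path length */
    q[0] = '?';                       /* restore the separator */
    memmove(uri + plen, q, qlen + 1); /* slide "?query\0" down after path */
}
```

Running the path-only step over the whole URI is what mangles a URL embedded in the query; restricting it to the path avoids that.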
>><br>
>> I think if we still want to double decode, we still need all of<br>
>> the above from our bundled htp.<br>
>><br>
>> -----<br>
>><br>
>> In 0.5.x, tx->request_uri_normalized has been removed, so we'd now<br>
>> have to use the REQUEST_URI hook. We'll have to reconstruct the<br>
>> normalized URI ourselves and store it in our HTPState.<br>
>><br>
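One way to picture the reconstruction step, as a sketch only: `struct uri_store` and `reconstruct_uri` are hypothetical and do not reflect the real HTPState or the libhtp 0.5 types (libhtp uses bstr, not NUL-terminated C strings). The idea is to join the separately normalized path and query back into one buffer kept in our own state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-transaction storage for the rebuilt URI; this is not
 * Suricata's actual HTPState layout. */
struct uri_store {
    char normalized[256];
};

/* Rebuild "path" or "path?query" from separately normalized components.
 * Returns 0 on success, -1 if the result would not fit in out. */
static int reconstruct_uri(const char *path, const char *query,
                           char *out, size_t outlen)
{
    int n;

    assert(path != NULL && out != NULL && outlen > 0);
    if (query != NULL && query[0] != '\0')
        n = snprintf(out, outlen, "%s?%s", path, query);
    else
        n = snprintf(out, outlen, "%s", path);
    /* snprintf returns the length it would have written; a value >= outlen
     * means the output was truncated. */
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Returning an error on truncation, rather than silently storing a cut-off URI, keeps later content matching from running against a partial buffer.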
<br>
</div></div>What are your thoughts on this?<br>
<div class="im"><br>
><br>
> IIRC there is a function in libhtp that only handles URI (percent)<br>
> decoding and unicode decoding. We should probably just use that on the query<br>
> and do the full normalization on the path.<br>
><br>
> As a side thought: I think it would be nice to store path and query<br>
> separately so that we can add http_path and http_query keywords later on.<br>
><br>
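A sketch of keeping path and query in separate buffers, which is presumably what later http_path / http_query keywords would match against. Everything here is hypothetical (`struct uri_parts`, `split_uri`, the fixed-size buffers). Later in this thread it is noted that libhtp 0.5 already exposes the split components in tx->parsed_uri, so this only illustrates the storage layout.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical separate storage for the two components. */
struct uri_parts {
    char path[128];
    char query[128];
};

/* Copy n bytes of src into dst (capacity dstlen) and NUL-terminate.
 * Returns -1 if it does not fit. */
static int copy_part(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "path?query" at the first '?'. The stored query excludes the '?';
 * a URI without '?' gets an empty query. */
static int split_uri(const char *uri, struct uri_parts *out)
{
    assert(uri != NULL && out != NULL);
    const char *q = strchr(uri, '?');
    size_t plen = q ? (size_t)(q - uri) : strlen(uri);

    if (copy_part(out->path, sizeof out->path, uri, plen) != 0)
        return -1;
    if (q == NULL) {
        out->query[0] = '\0';
        return 0;
    }
    return copy_part(out->query, sizeof out->query, q + 1, strlen(q + 1));
}
```

With the two parts stored apart, the path can get full normalization and the query only URI/unicode decoding, as suggested above.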
<br>
</div>We'd pretty much extract them directly from parsed_uri. We'll have to<br>
check whether we need the extra double-decode phase that our bundled htp<br>
currently has, in which case we'd need to store them separately.<br>
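To make the double-decode question concrete, assume that "double decoding" means running the same percent-decoding pass a second time over its own output. The sketch below shows how the second pass changes the result: "%2541" decodes to "%41" after one pass and to "A" after two. `percent_decode_inplace` and `double_decode_inplace` are hypothetical names, not the bundled htp functions, and only %XX escapes are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of a hex digit; the caller guarantees isxdigit(c). */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* One pass of %XX decoding, in place. Invalid escapes are copied as-is.
 * %uXXXX and '+' are deliberately not handled. Note: a decoded %00 ends
 * the C string early; real code would work on length-counted buffers. */
static void percent_decode_inplace(char *s)
{
    assert(s != NULL);
    char *dst = s;
    for (const char *src = s; *src != '\0';) {
        if (src[0] == '%' && isxdigit((unsigned char)src[1]) &&
            isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hexval((unsigned char)src[1]) * 16 +
                            hexval((unsigned char)src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

/* Assumed meaning of "double decoding": the same pass applied twice. */
static void double_decode_inplace(char *s)
{
    percent_decode_inplace(s);
    percent_decode_inplace(s);
}
```

Because one and two passes give different bytes, keeping both forms would require storing them separately, as noted above.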
</blockquote><div><br></div><div style>Yes, all the normalized components are in tx->parsed_uri. This is what IronBee uses to expose the various parts, such as tx->parsed_uri->path and tx->parsed_uri->query.</div>
<div style><br></div><div style>Also note that the hostname should now be obtained from tx->request_hostname in 0.5.</div><div style><br></div><div style>-B</div></div></div></div>