[analog-help] Escaping Quotes in Logs

Roberto Hoyle roberto.j.hoyle at Dartmouth.EDU
Wed Feb 27 09:42:20 PST 2008


On Feb 26, 2008, at 8:10 PM, Jeremy Wadsack wrote:

> The format for these parameters is not typical for a web log. Usually
> the query string is URL-escaped. In that case quote characters are
> converted to their hex equivalent. In your example I would expect
> something more like this:
>
> F%2E+Scott+Fitzgerald%27s+evolving+American+dream:+the+
> %22pursuit+of+happiness%22+in+Gatsby%2C+Tender+is+the+night
>
> In this case Analog can parse the file just fine. For your files you
> will probably need to pre-process the lines to convert them to
> something Analog can support.
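For illustration, here is a minimal Python sketch of the URL-escaping described above, using the standard library's `urllib.parse.quote_plus`. This is an assumption for demonstration only. It is not what Analog or Apache does internally, and the title string is simply the decoded form of the quoted example.

```python
from urllib.parse import quote_plus, unquote_plus

# Decoded form of the example title from the quoted message.
title = ("F. Scott Fitzgerald's evolving American dream: the "
         '"pursuit of happiness" in Gatsby, Tender is the night')

# quote_plus percent-encodes the quotes (" -> %22, ' -> %27), commas and
# colons, turns spaces into '+', and leaves letters, digits and "_.-~" alone.
encoded = quote_plus(title)
print(encoded)

# Decoding reverses the transformation exactly.
assert unquote_plus(encoded) == title
```

Encoders differ in which characters they treat as safe. Python's leaves "." as-is and writes ":" as %3A, while the quoted example does the opposite. Both forms decode to the same text, and neither leaves a bare quote character in the log line.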

From the Apache 2.2 documentation:

Some Notes

For security reasons, starting with version 2.0.46, non-printable and  
other special characters in %r, %i and %o are escaped using \xhh  
sequences, where hh stands for the hexadecimal representation of the  
raw byte. Exceptions from this rule are " and \, which are escaped by  
prepending a backslash, and all whitespace characters, which are  
written in their C-style notation (\n, \t, etc). In versions prior to  
2.0.46, no escaping was performed on these strings so you had to be  
quite careful when dealing with raw log files.

So, for Apache, at least, what I'm seeing is expected behavior.
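A pre-processing step could reverse Apache's escaping before the log reaches Analog. The following is a minimal Python sketch. It assumes each field (for example the %r request line) has already been cut out of the log line as bytes, which is not shown. The helper name `unescape_apache` is hypothetical, and the exact set of C-style whitespace escapes handled (\b, \f, \n, \r, \t, \v) is an assumption extrapolated from the note's "\n, \t, etc".

```python
import re

# Single-character escapes: the backslash-prefixed " and \ named in the
# Apache note, plus C-style whitespace (this exact list is an assumption).
_SIMPLE_ESCAPES = {
    b'"': b'"',
    b"\\": b"\\",
    b"b": b"\b",
    b"f": b"\f",
    b"n": b"\n",
    b"r": b"\r",
    b"t": b"\t",
    b"v": b"\v",
}

# A backslash followed by either "x" plus two hex digits, or any single byte.
_ESCAPE_RE = re.compile(rb"\\(x[0-9A-Fa-f]{2}|.)", re.DOTALL)


def unescape_apache(field: bytes) -> bytes:
    """Undo the escaping Apache 2.0.46+ applies to %r, %i and %o fields.

    Hypothetical helper: expects one already-extracted field as bytes.
    """

    def replace(match):
        token = match.group(1)
        if len(token) == 3 and token[:1] == b"x":
            # \xhh -> the raw byte whose hex value is hh
            return bytes([int(token[1:], 16)])
        # Unknown escapes are left untouched rather than guessed at.
        return _SIMPLE_ESCAPES.get(token, b"\\" + token)

    return _ESCAPE_RE.sub(replace, field)


raw = rb'the \"pursuit of happiness\" in Gatsby'
print(unescape_apache(raw).decode("ascii"))  # the "pursuit of happiness" in Gatsby
```

Working on bytes rather than text matters because \xhh stands for a raw byte, which may be one half of a multi-byte UTF-8 character. The decoded value could then be URL-escaped, which is the kind of pre-processing Jeremy suggests, before the line is handed to Analog.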

