[analog-help] How to exclude Crawlers and Robots

Aengus analog07 at eircom.net
Tue Jan 16 03:44:55 PST 2007


On Tuesday, January 16, 2007 5:55 AM [EDT],
Andreas Kuhn <AKuhn at gmx.de> wrote:

>> Do crawlers and robots have any influence on the Request Report? If
>> so, how can I exclude the PIs that crawlers and robots produce?

You can use HOSTEXCLUDE or BROWEXCLUDE (http://analog.cx/docs/include.html) 
to exclude any robots/spiders that you identify. (There's an up-to-date list 
of browser strings used by known robots at 
http://www.wadsack.com/robot-list.html)
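
As a rough sketch, the exclusions might look something like this in your 
Analog configuration file. The patterns are only illustrations that I've 
picked for the example, not a vetted robot list, so substitute whatever 
actually shows up in your own logs. BROWEXCLUDE can only work if your log 
format records the browser string, and a host name pattern only helps if 
your logs contain resolved host names rather than bare IP addresses.

    # Illustrative patterns only; replace them with the robots you identify
    BROWEXCLUDE *Googlebot*
    BROWEXCLUDE *Slurp*
    # Assumes the logs contain resolved host names
    HOSTEXCLUDE *.googlebot.com

Running Analog once with these lines and once without should give you a 
feel for how much of your traffic the excluded robots account for.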

>> My problem is that I am using several reporting tools. Comparing the
>> figures, the Analog figures are about 50% higher than the others. Now
>> my question is whether the crawlers are producing this difference.

They might be, but there are many reasons why different reporting methods 
return different answers. Analog reports on the data in your web server's log 
files, and you can be quite sure that it counts what is in those files 
accurately. But its reports depend on the parameters that it is told to use 
(include/exclude certain hosts, ignore image requests, what counts as a page, 
etc.). If you use a different method that uses different parameters, you'll 
get a different result.
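
To give an idea of the kind of parameters I mean, here is a hypothetical 
fragment of an Analog configuration file. The file patterns are made up for 
the example and are not recommendations; the point is only that each line 
changes the totals that end up in the report.

    # Hypothetical settings, for illustration only
    # Ignore requests for images entirely
    FILEEXCLUDE *.gif
    FILEEXCLUDE *.jpg
    # Count .php URLs as pages, in addition to the default page types
    PAGEINCLUDE *.php

Another tool that counts images, or that has a different idea of what a 
page is, will report different numbers from exactly the same log file.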

If you don't understand the parameters that your different reporting 
methods are using, it's a waste of time comparing them. You can compare this 
month's Analog results to last month's and learn something useful from the 
comparison, but you won't learn anything useful by comparing an Analog report 
to some other report unless you understand the assumptions that both reports 
are based on.

Aengus 


