[analog-help] How to exclude Crawlers and Robots
Aengus
analog07 at eircom.net
Tue Jan 16 03:44:55 PST 2007
On Tuesday, January 16, 2007 5:55 AM [EDT],
Andreas Kuhn <AKuhn at gmx.de> wrote:
>> Do crawlers and robots have any influence on the request report? If
>> so, how can I exclude the PIs that crawlers and robots produce?
You can use HOSTEXCLUDE or BROWEXCLUDE (http://analog.cx/docs/include.html)
to exclude any robots/spiders that you identify. (There's an up-to-date list
of browser strings used by known robots at
http://www.wadsack.com/robot-list.html)
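As a rough sketch, the configuration lines might look something like the
following. The host and browser patterns are only illustrative, not a vetted
list; substitute whatever you identify in your own logs or in the list above.

```
# Exclude requests by host name (illustrative pattern; it can only match
# if host names, rather than bare IP addresses, are available)
HOSTEXCLUDE *.googlebot.com
# Exclude requests by browser (user-agent) string (illustrative patterns)
BROWEXCLUDE *Googlebot*
BROWEXCLUDE *Slurp*
```

Note that BROWEXCLUDE can only work if your log format records the browser
string.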
>> My problem is that I am using several reporting tools. Comparing the
>> figures, the Analog figures are about 50% higher than the others. Now
>> my question is whether the crawlers are producing this difference.
They might be, but there are many reasons why different reporting methods
return different answers. Analog reports on the data in your web server's log
files, and you can be quite sure that it is extremely accurate. But its
reports depend on the parameters that it is told to use (include/exclude
certain hosts, ignore image requests, what counts as a page, etc). If you
use a different method that uses different parameters, you'll get a
different result.
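For instance, choices like these are made with configuration commands along
the following lines. This is only a sketch with example patterns, to show the
kind of settings that change the totals:

```
# Ignore image requests (example extensions)
FILEEXCLUDE *.gif
FILEEXCLUDE *.jpg
# Decide what counts as a page (example: also count .php files as pages)
PAGEINCLUDE *.php
```

Another tool with a different notion of "page", or with images left in, will
report different figures from the same log.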
If you don't understand the parameters that your different reporting
methods are using, it's a waste of time comparing them. You can compare this
month's Analog results to last month's and learn something useful from the
comparison, but you won't learn anything useful by comparing an Analog report
to some other report unless you understand the assumptions that both reports
are based on.
Aengus