[analog-help] Re: How to exclude Crawlers and Robots
Aengus
analog07 at eircom.net
Tue Jan 16 14:40:09 PST 2007
Andreas Kuhn <akuhn at gmx.de> wrote:
>
> @Aengus: Pageinclude is what I am still using as you can see in a
> further answer. But I am not sure whether I need to use cols or change
> the settings for cols. I think the standard settings should be correct
> for me?
The Request Report only shows "Requests", because if the item in
question is a Page, then Page Requests = Requests, and if the item is
not a Page, then Page Requests = 0, so the "Page Requests" column is
redundant.
Other reports, such as the Daily Summary, show the Requests column and
the Page Requests column by default (and the bar chart is based on Page
Requests).
The General Summary lists both Requests and Page Requests.
The Browser Report shows Requests and Page Requests by default (a
browser that only browses pages is probably a spider, and the Requests
and Page Requests column will be the same).
The Organization Report and Host Report only show the Request column by
default.
If you exclude everything except pages, then every Request will be a
Page Request, so you won't get any additional information by changing
your COLS settings.
But rather than excluding all the other files, sometimes you're better
off using
REQINCLUDE pages
to analyse everything, but only list items that you have identified as
Pages in the Request Report. (Make sure you use the PAGEINCLUDE command
to tell Analog to treat .pdf files as Pages).
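A minimal sketch of the two commands together in the Analog configuration
file, assuming .pdf is the only extra file type you want counted as a Page
(other extensions would presumably need their own PAGEINCLUDE lines):
# count .pdf files as Pages, in addition to the default page types
PAGEINCLUDE *.pdf
# list only Pages in the Request Report; everything else is still analysed
REQINCLUDE pages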
> I thought/ hoped, that identified Robots/ Crawler would be excluded
> automatically from the "Request Reports". Don't ask me how I got
> this. ;-))
No, ROBOTINCLUDE tells Analog what Browser strings to consider
Robots/spiders. If you want to exclude all requests from a particular
User Agent string, you need to use BROWEXCLUDE. (Just do a
search/replace on the ROBOTEXCLUDE list I pointed to earlier).
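For illustration only, assuming the list is made of lines like the
ROBOTINCLUDE examples below (the Googlebot and Slurp patterns are
placeholders, not copied from that list), the search/replace just swaps
the command name:
# before: matching browsers are classed as robots, but their requests still count
ROBOTINCLUDE *Googlebot*
ROBOTINCLUDE *Slurp*
# after: requests from matching browsers are excluded from the analysis
BROWEXCLUDE *Googlebot*
BROWEXCLUDE *Slurp*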
Aengus