[analog-help] Re: How to exclude Crawlers and Robots
Andreas Kuhn
akuhn at gmx.de
Wed Jan 17 03:24:48 PST 2007
Aengus <analog07 at ...> writes:
>
> Andreas Kuhn <akuhn at ...> wrote:
> >
> > @Aengus: PAGEINCLUDE is what I am still using, as you can see in a
> > further answer. But I am not sure whether I need to use COLS or change
> > the settings for COLS. I think the standard settings should be correct
> > for me?
>
> The Request Report only shows "Requests", because if the item in
> question is a Page, then Page Requests = Requests, and if the item is
> not a Page, then Page Requests = 0, so the "Page Requests" column is
> redundant.
>
> Other reports, such as the Daily Summary, show the Requests column and
> the Page Requests column by default (and the bar chart is based on Page
> Requests).
>
> The General Summary lists both Requests and Page Requests.
>
> The Browser Report shows Requests and Page Requests by default (a
> browser that only browses pages is probably a spider, and the Requests
> and Page Requests columns will be the same).
>
> The Organization Report and Host Report only show the Requests column by
> default.
>
> If you exclude everything except pages, then every Request will be a
> Page Request, so you won't get any additional information by changing
> your COLS settings.
>
> But rather than excluding all the other files, sometimes you're better
> off using
> REQINCLUDE pages
> to analyse everything, but only list items that you have identified as
> Pages in the Request Report. (Make sure you use the PAGEINCLUDE command
> to tell Analog to treat .pdf files as Pages.)
>
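
The suggestion above could be sketched as the following configuration
fragment. It is only an illustration: it assumes each command goes on its own
line, and the *.pdf pattern is taken from the remark about PDF files rather
than from a real configuration.

```
# Count PDF files as pages (assumed pattern, adjust to the site)
PAGEINCLUDE *.pdf
# Analyse all requests, but list only pages in the Request Report
REQINCLUDE pages
```
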
> > I thought/hoped that identified robots/crawlers would be excluded
> > automatically from the "Request Report". Don't ask me how I got
> > this idea. :)
>
> No, ROBOTINCLUDE tells Analog what Browser strings to consider
> Robots/spiders. If you want to exclude all requests from a particular
> User Agent string, you need to use BROWEXCLUDE. (Just do a
> search/replace on the ROBOTEXCLUDE list I pointed to earlier).
>
> Aengus
>
Thanks for the explanations. This gives me some hope, as I wasn't that far
away from being right. ;-)
I am trying to use the BROWEXCLUDE command (BROWEXCLUDE *crawler*; BROWEXCLUDE
*bot*; BROWEXCLUDE ICCRawler*; ...) but I see no differences in my reports.
The crawlers are still listed in the Browser Report and the OS Report.
I first placed the commands in the section where I include and exclude the
pages, then I placed them below the reports (REQUEST ON; REQFLOOR 1R;
BROWEXCLUDE...) (BROWREP ON; BROWEXCLUDE...) and so on.
What am I doing wrong?
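
For reference, a minimal sketch of how such exclusions might be laid out,
assuming the command is spelled BROWEXCLUDE as in the quoted advice and that
each command sits on its own line rather than being separated by semicolons.
The patterns are just the ones mentioned in this message, not a vetted robot
list.

```
# Exclude requests whose user agent string matches these patterns.
# Assumption: one command per line, with * as the wildcard.
BROWEXCLUDE *crawler*
BROWEXCLUDE *bot*
BROWEXCLUDE ICCRawler*
```

If the matching turns out to be case-sensitive, user agents that write
"Crawler" or "Bot" would need their own patterns; that is worth checking
against the documentation and against the strings actually listed in the
Browser Report.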
The docs explain the commands and sometimes give the complete
string, but not where to put them in the config file.
As I am still learning this, I have some trouble understanding how to
use the commands correctly.
Thank you all again for your help and your patience.
Greetz
Andreas