[analog-help] Identifying Known Spiders?
Jeremy Wadsack
jeremy at 7simplemachines.com
Thu Jul 3 08:37:43 PDT 2008
The robots list from which that page was built no longer exists. The group that was maintaining it decided that it no longer made sense to maintain a database of "known robots", since anyone can write a robot. However, a quick look turns up a new user-submitted list at http://www.robotstxt.org/db.html and a "wild caught" list at http://www.botsvsbrowsers.com/category/1/index.html. If I have time this weekend, I may update the scripts to pull from one of those sources.
I have to say, though, that the more I think about it, the more I'm of the mindset that anything that is *not a known web browser* is most likely a bot. Maybe inverting the logic would make sense at this point.
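A minimal sketch of that inverted check, written as a standalone Python function rather than anything Analog provides. The browser patterns and robot hints below are hypothetical, illustrative choices, not a maintained list: a User-Agent is treated as a bot unless it matches a recognised browser, and self-identified robots are caught first even when they borrow a browser-style prefix.

```python
import re

# Hypothetical whitelist of markers seen in mainstream browser User-Agents.
# Illustrative assumption only; a real list would need ongoing maintenance.
KNOWN_BROWSER_PATTERNS = [
    re.compile(r"\bMSIE \d"),
    re.compile(r"\bFirefox/\d"),
    re.compile(r"\bSafari/\d"),
    re.compile(r"\bOpera[/ ]\d"),
]

# Words that mark a self-identified robot even if the UA mimics a browser.
ROBOT_HINTS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Inverted logic: anything that is not a recognised browser counts as a bot."""
    if not user_agent or user_agent == "-":
        return True  # empty or missing UA strings rarely come from real browsers
    if ROBOT_HINTS.search(user_agent):
        return True  # explicitly self-identified robot
    return not any(p.search(user_agent) for p in KNOWN_BROWSER_PATTERNS)
```

The trade-off is that an unfamiliar but legitimate browser gets counted as a bot, which is arguably the safer error when the goal is to keep robots out of visitor statistics.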
--
Jeremy Wadsack
Seven Simple Machines
Main: (206) 545-4850
Direct: (206) 812-6829
-----Original Message-----
From: analog-help-bounces at lists.meer.net [mailto:analog-help-bounces at lists.meer.net] On Behalf Of Aengus
Sent: Thursday, July 03, 2008 4:30 AM
To: Support for analog web log analyzer
Subject: Re: [analog-help] Identifying Known Spiders?
On 7/3/2008 3:48 AM, Michael Crawford wrote:
> I'd like to know the success of my efforts to submit a new site to all
> the search engines; some spiders won't visit a site until it's been
> online for a while, and some will only visit the home page.
>
> I can see some of the spiders in the BROWSERREP and BROWSERSUM, but
> it's missing some; it's definitely missing Googlebot and Yahoo
> Slurp.
>
> Also the BROWSERREP shows all the browsers used by my human visitors;
> it will get hard to spot spiders when my traffic picks up.
>
> Is there a report specifically for known spiders?
No, the only special treatment for spiders in Analog is the ROBOTINCLUDE
command, which tells Analog to count the requests with the specified
User-Agents as Search Engines in the OS Report.
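A minimal sketch of how ROBOTINCLUDE-style patterns behave, assuming Analog's wildcard syntax in which `*` matches any run of characters and matching is case-sensitive. The pattern list and helper names are hypothetical, and this is an approximation in Python, not Analog's own matcher. It also shows why a leading `*` matters: `Googlebot*` does not match a User-Agent that begins with `Mozilla/5.0 (compatible; ...`.

```python
import re

# Hypothetical ROBOTINCLUDE patterns; "*" stands for any run of characters.
ROBOT_PATTERNS = ["*Googlebot*", "*Slurp*", "msnbot*"]


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a '*'-only wildcard into a case-sensitive regex.

    Assumption: approximates Analog's wildcard matching; every character
    other than '*' is treated literally.
    """
    parts = (re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(".*".join(parts), re.DOTALL)


def is_robot(user_agent: str, patterns=ROBOT_PATTERNS) -> bool:
    """True if the whole User-Agent matches any wildcard pattern."""
    return any(wildcard_to_regex(p).fullmatch(user_agent) for p in patterns)


def config_lines(patterns=ROBOT_PATTERNS) -> str:
    """Emit one ROBOTINCLUDE line per pattern for pasting into analog.cfg."""
    return "\n".join(f"ROBOTINCLUDE {p}" for p in patterns)
```

Printing `config_lines()` gives lines you could paste into a configuration file, after checking the syntax against the Analog documentation.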
There used to be a list of Spider User-Agents at
http://www.wadsack.com/robot-list.html but it seems to be empty at the
moment. There's a list from May 2007 at
http://www2.owen.vanderbilt.edu/mike.shor/diversions/analog/RobotInclude.txt
You might want to run a report with FILEINCLUDE /robots.txt, which should
give you a good indication of which search engines are hitting your site,
since well-behaved spiders fetch robots.txt before crawling.
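For comparison outside Analog, here is a minimal Python sketch that tallies which User-Agents requested /robots.txt. It assumes the Apache/NCSA "combined" log format (other formats need a different regex), and the function name is hypothetical; it only approximates what a FILEINCLUDE /robots.txt run would show in the browser reports.

```python
import re
from collections import Counter

# Assumes the Apache/NCSA "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(lines):
    """Count User-Agents that fetched /robots.txt, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("path") == "/robots.txt":
            counts[m.group("agent")] += 1
    return counts
```

Calling `robots_txt_agents(open("access_log"))` and printing `most_common()` lists the spiders in order of how often they checked robots.txt.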
Aengus