Spiders and Bots [message #165940] |
Sun, 28 August 2011 22:32 |
The Witcher
Messages: 675 Registered: May 2009 Location: USA
Senior Member |
I'm interested in the new Spider Manager, but there are a few things I don't quite understand (surprise, surprise).
"Useragent:
Spider's useragent string (partial matches are accepted)."????
I found this reference for user agent strings, so I assume these would be the user agent strings for:
Bing: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Google: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
NerdByNature: Mozilla/5.0 (compatible; NerdByNature.Bot; http://www.nerdbynature.net/bot) Etc.
I copied those out of my server's stats/requests page, so are these examples of the strings that need to be input? I'm not seeing anything associated with spiders or bots anywhere else except in these! Is this correct?
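To make the "partial matches are accepted" wording concrete, here is a minimal sketch of what such matching could look like. It assumes a case-insensitive substring test, which is only a guess at the behavior; the function name `matches_spider` is hypothetical and not part of the forum software.

```python
def matches_spider(user_agent: str, pattern: str) -> bool:
    """Return True if pattern occurs anywhere in user_agent.

    Assumption: a partial match is a case-insensitive substring test.
    """
    return pattern.lower() in user_agent.lower()


# Full string copied from the server stats page quoted above.
BING_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

print(matches_spider(BING_UA, "bingbot"))    # True
print(matches_spider(BING_UA, "Googlebot"))  # False
```

If matching does work this way, a short distinctive token such as `bingbot` or `Googlebot` would be enough, and it would keep matching if the version number in the full string changes.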
"IP Addresses:
Comma separated list of IP Addresses used by the spider."????
I understand IP addresses well enough; what I don't understand is how to associate an IP with the bots actually crawling the site, other than checking them one at a time.
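One way to avoid checking addresses one at a time is to group the access log by user agent. The sketch below assumes the server writes the common "combined" log format (client IP first, user agent as the last quoted field). The sample lines use reserved documentation addresses rather than real crawler IPs, and `ips_by_agent` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumption: combined log format, IP first and user agent in the last quotes.
LOG_LINE = re.compile(r'^(\S+) .* "([^"]*)"$')

BING_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
GOOGLE_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Placeholder log lines; the IPs are documentation addresses, not real bots.
SAMPLE_LOG = [
    f'192.0.2.10 - - [28/Aug/2011:22:30:01 +0000] "GET / HTTP/1.1" 200 512 "-" "{BING_UA}"',
    f'192.0.2.11 - - [28/Aug/2011:22:30:05 +0000] "GET /a HTTP/1.1" 404 0 "-" "{BING_UA}"',
    f'198.51.100.7 - - [28/Aug/2011:22:31:00 +0000] "GET / HTTP/1.1" 200 512 "-" "{GOOGLE_UA}"',
]


def ips_by_agent(lines):
    """Map each user agent string to the set of client IPs that sent it."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            ip, agent = match.groups()
            seen[agent].add(ip)
    return seen


for agent, ips in ips_by_agent(SAMPLE_LOG).items():
    print(agent, "->", ", ".join(sorted(ips)))
```

The resulting list of IPs per agent could then be pasted into the "IP Addresses" field as a comma-separated list. Bear in mind that a user agent string can be spoofed, so this grouping shows only what each client claims to be.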
In the past I've always used a "robots.txt" file, so this is a new development for me. However, in the last few weeks one particular IP has shown up repeatedly (hundreds of times) in my log as replying, browsing, or causing errors, apparently requesting access to files or functions that do not exist or are not enabled.
ISP information lists this as "JPNIC", using a range of IPs from 119.63.192.0 to 119.63.199.255; so far I have copied perhaps 30 or so specific IPs from within this range.
So obviously I can input those 30 IP addresses separated by commas, but is there a way to input the entire range used by this or any other bot/spider without inserting a hundred or more IPs from within that range?
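For reference, a range like this can be written compactly in CIDR notation. The sketch below uses Python's standard `ipaddress` module to collapse the start and end addresses into network blocks. Whether the Spider Manager's IP field accepts CIDR notation or a partial prefix is not something this post confirms, so treat that part as an open question.

```python
import ipaddress

# Range reported for the crawler in the post above.
first = ipaddress.IPv4Address("119.63.192.0")
last = ipaddress.IPv4Address("119.63.199.255")

# Collapse the start/end pair into the smallest set of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))
print([str(block) for block in blocks])  # ['119.63.192.0/21']

# Check whether one logged address falls inside the range.
# This particular address is an arbitrary example from within the range.
candidate = ipaddress.ip_address("119.63.196.10")
print(any(candidate in block for block in blocks))  # True
```

The whole range collapses to the single block 119.63.192.0/21, which covers 2048 addresses, so one entry would stand in for all of them wherever CIDR notation is supported.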
"I'm a Witcher, I solve human problems; not always using a sword!"