

Search engine robots that visit your web site

Contents of this page

Search engine robots and others
Link Checkers, Link monitors and bookmark managers
FTP clients and download managers
Research projects
Software packages
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Other useful sites
...And finally, some fakers
Awards for this page

Search engines and other sites send robots to read and index your pages. This page reverses that process and indexes the robots. This information has been gleaned by looking at our server logs. You can also read a detailed description of how we hunt spiders.

Whenever a page is read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent. You can see this in our analysis of a server log sample.
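As a rough illustration of pulling those details out of a log line, here is a minimal sketch. It assumes the server writes the common Apache "combined" log format; the sample line, the address and the agent name "ExampleBot" are invented for the example and do not come from our logs.

```python
import re

# Assumed layout: Apache "combined" log format. Other servers may differ.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Invented sample line, for illustration only.
sample = ('192.0.2.10 - - [30/Jan/2006:10:15:32 +0000] '
          '"GET /index.html HTTP/1.0" 200 5120 '
          '"http://example.com/links.html" "ExampleBot/1.0"')

entry = parse_line(sample)
print(entry["time"], entry["ip"], entry["referrer"], entry["agent"])
```

The named groups give you the time, IP address, referrer and user agent as separate fields, which is all the later detective work needs.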

Unlike many pages that list web robots, this page actually tries to go visit the robots themselves. Where possible, links are provided to the robots' home pages, and descriptions are given of what they're up to. This page is updated regularly as more information is found (the last update was on 30-Jan-2006).

Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sorts the men from the robots.

Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just access the HEAD of the file, rather than doing a GET on the whole file.
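The two tell-tale signs just mentioned (fetching robots.txt, and using HEAD rather than GET) are easy to look for once the log has been split into fields. The sketch below assumes each log entry is already a dictionary with "ip", "method" and "path" keys; those key names and the sample entries are made up for the example.

```python
from collections import defaultdict

def robot_hints(entries):
    """Map each IP address to the set of robot-like behaviours seen from it."""
    hints = defaultdict(set)
    for e in entries:
        if e["path"] == "/robots.txt":
            hints[e["ip"]].add("read robots.txt")
        if e["method"] == "HEAD":
            hints[e["ip"]].add("HEAD request")
    return dict(hints)

# Invented entries, for illustration only.
entries = [
    {"ip": "192.0.2.10", "method": "GET", "path": "/robots.txt"},
    {"ip": "192.0.2.10", "method": "GET", "path": "/index.html"},
    {"ip": "192.0.2.20", "method": "HEAD", "path": "/download.html"},
    {"ip": "192.0.2.30", "method": "GET", "path": "/index.html"},
]

hints = robot_hints(entries)
for ip, seen in sorted(hints.items()):
    print(ip, sorted(seen))
```

These are hints rather than proof: an address that never shows either behaviour may still be a robot, so the pattern of pages read remains worth a look.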

You can also visit our page describing the engines in some detail.

This page is regularly converted from this text file by the author's own text to HTML converter AscToHTM. The last update was on 30-Jan-2006. This software is available as shareware (cost $30).

Search engine robots and others

The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.

Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites.

Wherever <nn> appears, it indicates that a number of different digits may be used.
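If you want to match log entries against identifiers written this way, the placeholder can be turned into a regular expression. This is only a sketch: the helper name is hypothetical, and it assumes every <n>, <nn> and so on stands for one or more digits.

```python
import re

def identifier_pattern(template):
    """Turn a table entry such as 'ferret<nn>' into a compiled regex.

    Assumption: each <n...> placeholder stands for one or more digits.
    """
    literal_parts = re.split(r"<n+>", template)
    body = r"\d+".join(re.escape(part) for part in literal_parts)
    return re.compile(body + r"\Z")

ferret = identifier_pattern("ferret<nn>")
spider = identifier_pattern("bos-spider<n>")

for name in ("ferret12", "ferret", "bos-spider3"):
    print(name, bool(ferret.match(name)), bool(spider.match(name)))
```

Escaping the literal parts matters because some identifiers contain characters, such as "-" or ".", that have a special meaning in regular expressions.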

Home page/search engine Robot identifier IP address(es) AbachoBOT abcdatos_botlink AESOP_com_SpiderMan crawler ( ia_archiver

tv<nn> AltaVista-Intranet FAST-WebCrawler
  Wget Acoon Robot antibot Atomz AxmoRobot Buscaplus Robi CanSeek/ ChristCRAWLER Clushbot Crawler ROBOT/
Agent-admin/ DeepIndex DittoSpyder Jack Speedy Spider ArchitextSpider Musical instruments are used
in the name such as
(and the rest of the band)
more recently first names are being
used like
(excite) ArchitectSpider EuripBot Arachnoidea EZResult
Fast PartnerSite Crawler
FAST Data Search Crawler
FAST Data Search Document Retriever KIT-Fireball ???? FyberSearch GalaxyBot geckobot ???
(Genealogical Search Engine)
GenCrawler ???? GeonaBot getRAX Googlebot
c<nn> moget/2.0 Aranha
(inktomi) Slurp/2.0-KiteHourly;
(inktomi) Slurp/2.0-OwlWeekly
(inktomi) Slurp/3.0-AU Hubater
(research centre) IlTrovatore-Setaccio IncyWincy
InfoSeek Sidewinder Mole2/1.0 MP3Bot <..> kuloko-bot/0.2 LNSpiderguy Linknzbot lookbot MantraAgent
(see also
NetResearchServer Lycos_Spider_(T-Rex) bos-spider<n> JoocerBot HenryTheMiragoRobot MojeekBot ??? mozDex/ (within MSNBOT/0.1 Navadoo Crawler ??? Gulliver ObjectsSearch/0.01 OnetSzukaj/ ??? PicoSearch/ PJspider
but it won't let us in :-(
griffon Spider/
??? various (fakes agent on each access)
??? ??? NationalDirectory-SuperSpider dloader(NaverRobot)/
dumrobo(NaverRobot)/ noxtrumbot/
(Chinese language)
Openfind piranha,Shark
??? psbot CrawlerBoy user<n> news<n> QweeryBot AlkalineBOT StackRambler/ SeznamBot Search-10 Fluffy the spider Scrubby/ asterias speedfind ramBot xtreme Kototoi/0.1 SearchByUsa ??? Searchspider/ SightQuestBot/ Spider_Monkey/ Surfnomore Spider v1.1 Robot@SuperSnooper.Com teoma_agent1 Teradex_Mapper ESISmartSpider Spider TraficDublu 81.196.*.*, Tutorial Crawler updated/0.1beta UK Searcher Spider -
(coming soon)
Vivante Link Checker appie uses an address at, a Dutch ISP Nazilla - marvin/infoseek MuscatFerret ferret<nn> WhizBang! Lab ZyBorg
- WIRE WebRefiner: WSCbot ??? Yandex
pet-based search engine
Yellopet-Spider Findexa Crawler ??? YBSbot search engine indexer
<client sites> libwww-perl  


Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.

Browser identifier Information
Voyager browser for the Amiga
(DOS-compatible browser. Linux version under development)
IBrowse (search for IBrowse)
Amiga-based browser
(I think this is a browser. Site is in Japanese)
(Light browser based on the Mozilla code base)
(Linux KDE browser)
(Cross-platform text based browser)
(Cross-platform, small, efficient and standards-led browser)
(Palm handhelds. Written in Python)
Audio Browser
QWeb (Linux browser)
(see also
Text-based browser for text terminals. Runs under Linux
Freeware tabbed browser
Sleipnir (Japanese)
Japanese browser, apparently with an English version available.
(OpenVMS only version of Mosaic, a pre-Netscape browser)
(Macintosh text-only browser)
(text-based browser)

Link Checkers, Link monitors and bookmark managers

Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".

(pause for warm glow :-)

If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.
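Collecting those referrers can be automated. The following sketch assumes each log entry is a dictionary with "agent" and "referrer" keys and that a blank referrer is logged as "" or "-"; the host names and agent strings in the sample data are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def external_referrers(entries, own_host):
    """Group referrer URLs from other hosts by the user agent that sent them."""
    found = defaultdict(set)
    for e in entries:
        ref = e["referrer"]
        if ref in ("", "-"):
            continue  # referrer blanked or not supplied
        if urlsplit(ref).hostname != own_host:
            found[e["agent"]].add(ref)
    return dict(found)

# Invented entries, for illustration only.
entries = [
    {"agent": "ExampleLinkChecker/1.0", "referrer": "http://links.example.net/page.html"},
    {"agent": "ExampleLinkChecker/1.0", "referrer": "-"},
    {"agent": "Mozilla/4.0", "referrer": "http://www.example.org/index.html"},
]

linked_from = external_referrers(entries, own_host="www.example.org")
print(linked_from)
```

Referrers on your own host are skipped, since they only show visitors moving around your site rather than pages elsewhere that link to you.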

If you build up a list of sites that link to you, these are the guys you should tell when you move (moral - never move).

It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software, or signed up to some centralized link checking service. If they blank the referrer URL field but you have the client's IP address, you can always try visiting that address and surfing their site.

Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers.

A page listing various link checkers (and other tools) can be found at

Robot identifier IP address(es) Link Checker home page
ActiveBookmark <client site>
<client site>
Reciprocal Link Checker, Manager and Page Generator.
<client site>
Meta Tag Generator
ASPSearch URL Checker
<client site>
a site search engine/index maintenance tool
BlogBot <client site>
<client site>
(Japanese Bookmark Checker)
Bookmark Buddy <client site>
Check&Get <client site>
CheckWeb <client site>
(only if you have software listed at that site)
CSE HTML Validator
<client site>
HTML page validator that includes a link checker
amongst its functions.
DRKSpider <client site> (An Open Source project)
DISCo Watchman <client site>
Email Extractor
<client site>
<email collector> We don't list links to
email collectors on this site
<client site>
<email collector> We don't list links to
email collectors on this site
EmailWolf <client site>
<client site>,1759,1558477,00.asp
A utility written by PC Magazine to fetch icon files
(favicon.ico) for your IE favorites
Favorites Sweeper
<client site>
Another "favorites" tidy-up utility
FreshLinks.exe <client site>
Funnel Web Profiler
<client site>
Profiles your site, including links to/from it
Html Link Validator <client site>
<client site> an open source
HTML parser, that is probably exercising its
link-checking features.
The Informant
The Intraformant
<client site>
(in Japanese)
InternetPeriscope <client site>
jdwhatsnew.cgi <client site>
JRTS Check Favorites Utility <client site>
Lambda LinkCheck
LinkLint-checkonly --
Linkbot <client site>
Linkman (Mozilla...)
LinkProver <client site>
(Link management cgi script)
LinkScan Server <client site>
LinkSweeper <client site>
Link Valet Online
LinkVerify Spider
Morning Paper <client site>
(notifies webmasters when your pages have moved)
mylinkcheck -- (German)
NetLookout --
NetMind-Minder (retired)

NetMonitor --
Netprospector JavaCrawler <client site>
online link validator
(online link checker - submit your URL)
Rational SiteCheck <client site>
(checks links in the dmoz directory)
<client site>
Java utility that uses the Java HTTPClient class library
SiteBar <client site>
SpurlBot ??? Online bookmark agent
SurfMaster <client site>
SyncIT <client site>
Watchfire WebXM <client site>
WatzNew Agent <client site>
WebSite-Watcher <client site>
WebTrends Link Analyzer <client site>
Weblink Scanner <client site>
Xenu's Link Sleuth <client site>
Z-Add Link Checker <client site?>


Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this. However, if you choose to validate your own site, the validation attempts will appear in your logs. Consequently the "list" below is limited to the on-line validator I use (and recommend) and a URL submission service that I use.

Robot Identifier IP address Validator home page
Tooter This is
used as part of a link submission
agent (

FTP clients and download managers

If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.

If your download files are over 1MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). With clients like Go!Zilla and GetRight, if these add up to the right number of bytes, then chances are the download succeeded.
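That adding-up can be done mechanically. The sketch below assumes each log entry is a dictionary with "ip", "path" and "size" (bytes sent) keys, and that you know the true size of each download file; the file name, sizes and addresses are invented.

```python
from collections import defaultdict

def completed_downloads(entries, file_sizes):
    """Return (ip, path) pairs whose transferred bytes add up to the full file size."""
    totals = defaultdict(int)
    for e in entries:
        if e["path"] in file_sizes:
            totals[(e["ip"], e["path"])] += e["size"]
    return sorted(key for key, total in totals.items()
                  if total == file_sizes[key[1]])

# Invented data, for illustration only.
file_sizes = {"/files/setup.exe": 2000000}
entries = [
    {"ip": "192.0.2.10", "path": "/files/setup.exe", "size": 800000},
    {"ip": "192.0.2.10", "path": "/files/setup.exe", "size": 700000},
    {"ip": "192.0.2.10", "path": "/files/setup.exe", "size": 500000},
    {"ip": "192.0.2.20", "path": "/files/setup.exe", "size": 800000},
]

done = completed_downloads(entries, file_sizes)
print(done)
```

Requiring an exact match is a simplification: a client that re-fetches some bytes when resuming will total more than the file size, so in practice you may prefer to treat "at least the file size" as success.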

Client Identifier FTP Client home page
ChinaClaw (Chinese)
(Chinese download utility)
DLExpert (English and Chinese versions available)
Download Demon
Download Master (Russian)
Download Ninja (Japanese)
Download Wonder
Ez Auto Downloader
Downloads all files of a given type from a site, so it's
more like a site grabber
JetCar (or FlashGet)
Kontiki Client
Mass Downloader
MetaProducts Download Express
NetZip Downloader
Net Vampire
Nitro Downloader
SpeedDownload (for Macintosh)
WebDownloader for X 1.30
(Linux web downloader with X GUI)
WebLeacher (down last time I tried it)
more details at
WebPictures Downloader
Locates and downloads pictures
Can't find the home page, but it's described (in Russian)

Research projects

These agents come from research projects. Of course that's how Google started...

citenikbot/ One-man project due
for release in 2004.
CLIPS-index (French)
French research robot from a linguistics project (?)

Robot from the research centre at Hungarian Academy
of Sciences at Crawls from IP

Spider from which is a project to locate
and index XML content on the web. The company is a spin-off
from a project at INRIA in France, a frequent source of
web robots. The word "xyleme" apparently relates to the
vascular system in plants, but cleverly must be one of
the very few words to contain the letters "X", "M" and "L"
(although not in that order ;-)
"Data to Knowledge" data miner. Crawls from
Experimental spider from Mitsubishi R&D division
Crawls from IP
Digimarc WebReader
Digimarc searches images on the web looking for digital watermarks
More details at
Spiders from, which would seem to be part
of, a French-based search engine.

The site describes an Interactive Natural
Language encyclopedia that will become a search engine
at Good name, but at present it just
maps back onto the ExpressUs site (not such a good name).
Crawls from IP address
Ideare - SignSite Spiders from Ideare are
a research company producing search engine technology, and are
part owned by Tiscali in Italy, who seem to use their various
tools for different search engines (mp3, images etc).
Some sort of spider that usually visits using
an IP address from within or
Gulper Web Bot
(Open research project to produce opinion-based search engine)

And from the people that brought you xyro (see below),
comes another, newer bot. This one seems to crawl from
the IP address Update: more recently
it's also been seen coming from
And then there was "cosmos", crawling from
Seems these people are a webbot factory. Cosmos doesn't
offer an email address.
IRLbot Crawls from
crawls randomly to determine the topology of the web.
KnowItAll a project that
"extracts massive amounts of information from the Web in
an autonomous, scalable manner". Don't they know that
everyone hates a know-it-all? :-)
MJ12bot A distributed search
engine project
Research project to index the last week's news items
NEC Research Agent
Research "Inquirus" (meta?) search engine
Dutch robot for a research project. Crawls from
sherlock_spider A course project from
Crawls from
S.T.A.L.K.E.R. "My first robot" :-)
Crawls from
Japanese research robot.
Unable to find details on this, but I'm guessing it's
a research spider from Crawls using
the IP
USyd-NLP-Spider research into Natural
Language Processing at University of Sydney, Australia
Chinese search project
Seems to be a spider associated with a French
research institute. Usually crawls using the IP
Zao/0.2 Another Japanese research robot
Crawls from
Zao-Crawler Same as above, but crawled from

Software packages

These agents are the default identifiers for various software packages. Software developers use these packages to add Internet functionality to their own applications. As such, it's impossible to say what these agents are being used for without looking at the pattern of access, as the same agent name may be used by different developers to achieve different results.

While many of these packages allow you to change the user agent, some do not, and many developers are too lazy to change the agent string.
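For packages that do allow it, changing the agent usually takes one line. Here is a minimal sketch using Python's standard urllib library; the agent string and URLs are hypothetical, and nothing is actually sent over the network.

```python
import urllib.request

# Hypothetical agent string and contact URL, for illustration only.
AGENT = "ExampleLinkChecker/0.1 (+http://example.com/bot.html)"

request = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": AGENT},
)

# Nothing is sent here; the request object just carries the header
# that would replace the library default when the URL is opened.
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

A developer who skips this step leaves the package's default identifier in place, which is how the generic names listed below end up in your logs.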

Apparently some form of web-accessing Perl module. Possibly
included in the Links SQL product produced by
Default agent name used by the Java HTTPClient class. (See also RPT-HTTPClient below)
Default identifier for a set of light-weight Perl modules
for retrieving web documents. See
Set of TCP/IP components used in cross-platform development
of internet tools

The Perl programming language comes with a number of
routines for constructing web-aware scripts. This and
related strings are the default user agent identifiers,
although it's perfectly easy to change this to be whatever
you want.

The GNOME http library. A Linux software library
that offers connectivity to the web. Found in many
places on the web. There is a description at
Macromedia Flash Player
Flash movies can contain scripts that can fetch content
from the web (such as other Flash movies or images)

Agent name used in the sample code supplied with
Visual C++ for accessing the web. This may therefore
be someone running a program they've written based on
that code.
PEAR HTTP_Request class
PEAR is a framework and distribution system for reusable PHP components
Presumably the default identifier for the urllib module
in the Python programming language
RPT-HTTPClient The Java HTTPClient class library
TeamSoft WinInet Component (menus require Java)
Internet software component suite
Free Unix/Linux package for retrieving web pages
WinScripter iNet Tools
COM/DLL object that supports the SMTP and HTTP protocols
A fast web-spidering robot included with the libwww
package (?). See
W3C-WebCon/ a command-line toolkit that allows you
to perform HTTP operations
wxWidgets cross-platform open source C++ GUI builder
which includes "HTML viewing" and much, much more.
Zeus <nnnn> Webster Pro

Offline browsers and other agents

Agent Identifier Agent home page
(Japanese software from the "Eir Project")
ExtractorPro (Bulk email marketing tool. URL deliberately omitted)
FairAd Client (German)
A German pay-to-surf client
JoBo a site downloader
iSiloWeb (for palm pilot)
Kenjin Spider
(Microsoft IE4.0)
NexTools WebAgent
Offline Explorer
NetAttache Offline browser and search engine agent
Details (in Japanese) at
Searchworks Spider
Teleport Pro
Web site copier. English/German versions available
I think this is an offline browser. Site is in Japanese
(Chinese software. Not 100% sure what it does)
Website eXtractor
Convert websites into help files.
Xaldon WebSpider (German)
Offline browser

Other miscellaneous agents

These agents are ones that we've seen, but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to

User Agent Information
Ad Muncher
Browser plug-in that monitors the pages as you view them,
and removes all adverts, popup windows etc.
distributed search engine project
browses from (which doesn't make
sense for a distributed search engine :-)
Albert Indexer
Multi-lingual search technology
AnswerChase a personal search robot.
ASPSeek An open source search engine project
Looks to be an online translation tool, much like
Babelfish. Possibly related to
Seems to be the AltaVista personal search agent. The
crawling site is sometimes referred to in the agent name
Avant Browser Browser add-on for Internet Explorer
Beamer (French). A browser accelerator
that requires sites to create a "pagebeamer.txt" file that is
fetched by this agent to do predictive downloads.
beholder or
BravoBrian (may require IE). A content filtering
service that offers protection from pornography and
other unwanted content for children. Comes from IP
Software used to build "Vortals" (vertical portals).
Details (requires Flash) can be found at
Seems to come from who offer B2B
Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader
used with MSIE (I have been unable to confirm this)
Convera Internet Spider
A "RetrievalWare" product which claims to be a multimedia
web crawler.
ConveraCrawler Probably related to the above
ccubee Crawler technology from
Tool to map the structure of a web site

UA points to, but there's not
much there. It crawls from which
is (Japanese). Babelfish
suggests this is a Japanese company offering search products
Also calls itself an "Intelligent Deep-Web Robotic Agent"
A search engine indexer that will index dynamic content. Indexes from IP
An Open Source project to display Internet information
in a 3D format.
email program no longer available - that's the only reason I'm
prepared to list it on this page.
Excalibur Internet Spider
Expired Domain Sleuth
Hunts down popular, yet expired domain names with
a view to letting you purchase an already popular
domain name.
Everest-Vulcan Inc./ Next-generation
services technology (under development)
(Trivia note: Giskard is probably named after the Isaac Asimov robot)
Grub is a distributed, open source web crawler. Users
download the client which then indexes the web as part
of a distributed effort
Open-source, extensible web crawler project
search engine software for companies and universities

A browser accelerator. The idea is that you browse "through"
their site, taking advantage of their faster Internet connection,
caching and - most importantly - compression (of the file sent
to your browser) in return for their adverts added to the viewed pages.
Such accesses give the webwarper URL as the User Agent, concealing
the true agent of the original user.
More details at*
This was a child-safe browser, but it seems no associated
page remains
InternetArchive Presumably, but that's in "stealth mode"
Internet Ninja (Japanese Macintosh browser?)
A web monitoring service.
More details at
ipiumBot (French)
A tool that searches for copies of your documents on
the web. Crawls from
InternetAmi IOR robot gathering data for
an English/Swedish translation service.
Searches data situated in open data sources.
Something to do with the European Regional Internet Registry (RIPE)
Browses using IP address
Unable to find
(Too many "Star Wars" references get in the way)
LimeBot Robot searching for information
on cruises. Browses using IP address
Mata Hari
(Internet search agent)
Geographical-based text search tool. Crawls from
Mister Pix II Picture finder
MOSES 2.0 Spider
NOTE Site crashes my version of netscape 4.7
MonkeyCrawl "Futuristic play".
It's not clear to me which of these products this might be,
but I'm assuming it's one of them.
NPBot crawls from (
A trademark protection service
NutchCVS Open source web-search project
Offers "information management" tools
A search application, combining data from multiple sources
Identifier used in a sample perl program in the online
book "Web Client Programming with Perl". The program is
used to check links. Obviously people have tried it, and it works :-)
- PAD File Get.
PAD file poller. PAD files describe software applications to
download sites.
(Data mining bot on IP
A Web search agent with neural net intelligence which organizes
and personalizes Web sites and searches.
Phoaks An index of web resources
listed in UseNet. See also
phpMySearch-Crawler a search engine for individual
A free picture and movie locator
Seems to be a project to create a collage of images gathered
from the Internet.
PicSpider (German). Site offers a "picture crate"
according to babelfish, which seems to be some form of
repository. Not sure why it's spidering, but crawls
from 217-20-118-26 which is part of
PintaSpider Unable to find But the spider came from
Pita (Chub.Stanford.EDU) --
PitSpyder Thread<n>0 Unable to find
A bot indexing pictures. Crawls from
crawls from,,
(child-safe content filtering)
Comes from IP, which a lookup
identifies as "Cross Lingual Info Research" in Japan.
RepoMonkey Bait & Tackle

A bit of detective work here. Recent entries in
the log file link this to the site,
although the robot always appears to come from an IP
address at (a bookmarking service).
Visiting reveals a "coming soon"
site. Looking at the HTML source leads to another page
at (appears
The META tags for this page all appear to be references
to day trading, futures, training and the like, although
we did spot the word "fibonacci" (our favourite :-).
So... possibly a future search engine related to stock
trading? Or maybe the Monkey and Hippo are just feeding
me a red herring?
There's more. The picture on the Kenjin site at is currently the same as
that at HungryHippo. Kenjin is an Autonomy company.

There are several "PingSoft"s around, but I suspect that
this belongs to one of the products listed at (e.g. SmartHunter)
since I was visited from a Chinese IP address.
SilentSurf A surf anonymizer service
SlySearch A site that hunts down infringements
of intellectual property rights.
A web filter that is "ShonenWare", i.e. you should
purchase a Shonen Knife CD if you use it. Shonen Knife
are a great Japanese band, much loved by the late Kurt
Cobain. Sometimes this sets the referrer page to the
band's home page at (or maybe
the users just happen to go there themselves).
CrawlWave (Greek, and requires login)
Crawls from, which is part of the
Athens University of Economics and Business (
(IE add-on that organizes your browsing)
SQ Webscanner
(on holiday last time I looked)
An open-source web proxy cache for Unix systems
An open-source anti-virus program that I saw accessing icons
on my site (!)

Not 100% sure about this one. When it visited me it came
from the WebSense site 63.212.171.* (and a Google search shows
others seem to see the same). At the WebSense site you
can find WebCatcher, a product used to monitor
employees' web-surfing habits (as near as I can tell).
But as I say, I'm not 100% sure...
Steganos Internet Anonym
A surf anonymizer utility
content tracking product
Tool that surveys the links in the Open Directory
at, checking their status etc.
TaWWWantula Unable to find
Tcl http client package

The default identifier for any software built using
the Tcl HTTP package
TeraCrawl Unable to find
Plagiarism prevention system. Crawls from
A browser plug-in (initially IE only) that searches for
related pages and categories. In my experience this
seems to entail accessing a favicon.ico file on a daily
basis (presumably to refresh the "favorites" list)
Search engine technology, as used at sites such as Now called mnoGoSearch.
unchaos_crawler A search engine that offers a "hybrid"
of human and machine intelligence, but no search box
that I could see :-). Crawls from
unlostBot is "under construction". The robot came
from IP address which is in France.
URLBlaze File/web search utility

Coming soon at (requires flash). This
venture-capital funded site is "running in stealth mode"
before launching the "new new thing" (is that a typo?).
One of the Flash pages defines Utopia (geddit?), and some
of the browsing is done by IP addresses at
UtilMind HTTPGet
A component intended for downloading pages from the web using
standard Microsoft Windows Internet library (winInet.dll)
Listed on
UrlScope Unable to find

Appears to be a log analyzer for Russian BBS systems.
(I may have got that wrong). I found reference to
it being copyright John Gladkih 1998, but I've not found
any URL that gives a description (not even a Russian one).
VCI WebViewer
Web browser object, that may be incorporated into software
A commercial spidering product.
A set of Delphi components offered to build Internet
applications from
Collates search engine results
News-gathering agent

Forms a collage from randomly selected web images; a pet project of one of
the authors of Netscape. Seems to come from
differing IP nodes.
WebCompass ??? (quarterdeck search engine software)
WebGenie presumably one of
the CGI-based products available on this site. Possibly
the "Site Sleuth"
Web Hound
Unable to find
Or rather, I found several different "web hounds", so can't tell
which this was.
Web Magnet
this appears to be a tool used by this web consultancy.

A tool to track down and target visitors to your website
Tool to fetch all pictures from a web site
Originates in Korea, and is possibly related to their
National Computerization Agency. Uses IP address
Search engine popularity meter.
(browser filter)
Software that tracks Trademark usage
Last time I saw it, it was creating 404 errors by adding
&dg.. to each URL. Hopefully they'll fix this.
(German). Appears to be an interpreter
designed to help automate regular tasks on a Windows PC.
A toolbar that sets up as the default
search engine. There appears to be a lot of negative
press regarding this toolbar
yacy An open source and distributed
search engine project. The above URL seems to redirect
to an IP-based one

http://www-yottashopping-com/. User agent claims this is a
Shopping Search Engine, but the URL requires a login
so I was unable to verify (so I deliberately made
its URL non-clickable). Crawled from

Sites that regularly visit

Some IP addresses, or sites may regularly visit you, although the user agent may be obscure, blank, or even change.

Here are a few that I've been able to work out

Site address(es) Description

This is a site that offers a speed-up
to your surfing, in return for being able to
monitor people's surfing habits. The speed-ups
are achieved through a variety of techniques,
and the monitoring info is sold on, although your
privacy is protected. Visit
for more details. Not known

This site daily reads any xml files submitted to
a shareware site in PAD format. PAD is a means for
describing shareware devised by the Association of
Shareware Professionals ( This site
is performing daily checks, looking to automatically
update its lists with any changes.
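Regular visitors like these can be picked out of a log by grouping on IP address rather than on user agent, since the agent may be blank or may change between visits. The sketch below assumes each log entry is a dictionary with "ip", "day" and "agent" keys; the threshold of three days, the key names and the sample data are all arbitrary choices made for the example.

```python
from collections import defaultdict

def regular_visitors(entries, min_days=3):
    """Return {ip: set of agents} for IPs seen on at least min_days distinct days."""
    days = defaultdict(set)
    agents = defaultdict(set)
    for e in entries:
        days[e["ip"]].add(e["day"])
        agents[e["ip"]].add(e["agent"])
    return {ip: agents[ip] for ip in days if len(days[ip]) >= min_days}

# Invented entries, for illustration only.
entries = [
    {"ip": "192.0.2.10", "day": "2006-01-28", "agent": ""},
    {"ip": "192.0.2.10", "day": "2006-01-29", "agent": "ExampleFetcher/1.0"},
    {"ip": "192.0.2.10", "day": "2006-01-30", "agent": ""},
    {"ip": "192.0.2.20", "day": "2006-01-30", "agent": "Mozilla/4.0"},
]

regulars = regular_visitors(entries)
print(regulars)
```

An address that turns up day after day with several different agent strings is a good candidate for a closer look at which pages it reads.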

Other useful sites

Here are links to other sites you might find useful when looking into web robots
A Bot monitor site, with regular updates and links to
the bot's home pages. A list of HTML validators

A site that lists IP addresses of search engine
bots and others. More comprehensive (and probably
more up to date) than the IP addresses shown on this
page (which tends to record the first IP address seen)

An online syntax checker for robots.txt files.
Enter the URL of your robots.txt file to get it
checked and to see a summary of what effect it will
Mozilla web browser project. This page describes the
conventions used for formatting the User Agent in the
form "Mozilla..."
A site dedicated to the robots.txt file. This page
gives some background to how robots work, although
their list of robots is quite small.
A page collecting together a number of resources to
do with all aspects of web robots.
A site primarily about "cloaking" sites - the art of
making a site look different to different visitors.
Contains articles on how to detect spiders.
A site listing WAP user agent strings. These will
mostly be mobile phones

This site contains a number of forums for topics of
interest to webmasters everywhere. This particular
forum actively discusses robots and search engines
that visit your site.

...And finally, some fakers

Increasingly security and privacy concerns mean that users and companies are wary about giving away information to sites they visit through the user agent and other fields that appear in server logs.

Some browsers will allow you to select the user agent you present when visiting a site. The Opera browser does this, for example, to allow its users to pretend to be either IE or Netscape when visiting web sites coded in a way that forgets there are other browsers in use.

Also, as firewalls become more common, we will see more and more user agent fields being blocked by the firewall, which will prevent this information being transmitted to the outside world.

Just to prove that you can never rely on the user agent, here is a selection of user agent strings I've seen in my log files that tell us nothing about the software being used (although some of them speak volumes about the person driving the software). I'm omitting any IP addresses I may have to protect the identities of those concerned :-)

"user agent" seen Comments
Bruciebot I'm assured this was created by a regular
in alt.www.webmasters :-)
Blocked by Norton
Geblokkeerd door Norton
Blockeriet von Norton
The agent has been blocked
by Norton Utilities. The referrer
is also withheld. The second version
is Dutch. No doubt other languages occur
Don't Like AOL Oh dear. This could start a trend!
Don't be so nosey ;-) Hey! you came to my site first, remember? :-)
Don't you wish you knew. Obviously.
Go Away A bit rich from someone who came
to my site! :-)
Field blocked by AtGuard Surfer is behind the AtGuard firewall (now
part of Norton Internet Security 2000) which
prevents the true User Agent being transmitted.
Field blocked by Outpost
Again the field is withheld by the software
Isch habe gar kein Browser ;-)
German for "I have no browser" :-)

Or so I thought, until I received the following
from Clemens Marschner

Actually it is German - with Italian accent!
The word refers to an advertisement of the Nescafe
coffee, where a smart Italian convinces a beautiful
lady to stay and drink coffee with her after she knocks
at his door to complain that his car is in the way
of hers. And after she stayed and listened to him
while he prepares the coffee with lots of gestures
and Italian speak, she again asks him to move his car,
and he goes "Isch 'abe gar keine Auto, Signorina" (I
don't even have a car, signorina). Since that
commercial was shown for years, presumably all German
web masters know it...
My Web browser is not of your business True, but no fun.
multiBlocker browser Although this
seems to mainly offer protection against visitors
to your site, they obviously also provide a
user agent blocker for people browsing
Wabbit's don't use browsers Probably the proxy service at
Wot, no browser? (Win67; X; SK) Win67 ?!? Ah... a dream come true!
Who gives a shit? It's as least as good as Lynx Ah yes, but how do we know that?
Who wants to know? I do. :-)

Awards for this page

Spider award for achieving a top 10 position in search engines
I've been told this page is referenced in the book Spidering Hacks by Kevin Hemenway and Tara Calishain.

All awards gratefully received :-)

This page is © 2000-2005 John A Fotheringham. It may not be reproduced without permission,
although you are welcome to save a copy for personal use to your hard disk.

