Search engine robots that visit your web site

Contents of this page

Search engine robots and others
Browsers
Link Checkers, Link monitors and bookmark managers
Validators
FTP clients and download managers
Research projects
Software packages
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Other useful sites
...And finally, some fakers
Awards for this page

Search engines and other sites send robots to read and index your pages. This page reverses that process and indexes the robots. This information has been gleaned by looking at the server logs for www.jafsoft.com. You can read a detailed description of how we hunt spiders.

Whenever a page is read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent. You can see this in our analysis of a server log sample.
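As a rough illustration of pulling those details out of a log, here is a minimal Python sketch. It assumes the server writes the common Apache "combined" log format, which may not match your server's configuration; the sample line, the ExampleBot agent and the parse_log_line helper are all made up for the example.

```python
import re

# Assumed layout: Apache "combined" log format. Adjust the pattern if your
# server is configured differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of the fields in one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Invented sample entry (the IP address is from a documentation-only range).
sample = ('192.0.2.10 - - [30/Jan/2006:10:15:32 +0000] '
          '"GET /robots.txt HTTP/1.0" 200 312 "-" "ExampleBot/1.0 (bot@example.com)"')
entry = parse_log_line(sample)
print(entry["time"], entry["ip"], entry["referrer"], entry["agent"])
```

Once each line is a dictionary, grouping by IP address or user agent is a short step.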

Unlike many pages that list web robots, this page actually tries to visit the robots themselves. Where possible, links are provided to the robots' home pages, and descriptions are given of what they're up to. This page is updated regularly as more information is found (the last update was on 30-Jan-2006).

Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sorts the men from the robots.

Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just access the HEAD of the file, rather than doing a GET on the whole file.
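The two tell-tale signs mentioned above (fetching robots.txt, and using HEAD rather than GET) can be sketched in Python as follows. The robots.txt policy, the ExampleBot name and the looks_like_robot helper are invented for illustration, and the rule of thumb is only a heuristic, not a reliable robot detector.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical site policy, parsed from text here rather than fetched.
policy = RobotFileParser()
policy.parse("""
User-agent: *
Disallow: /private/
""".splitlines())

# What a good robot would conclude from that policy.
print(policy.can_fetch("ExampleBot", "/private/page.html"))
print(policy.can_fetch("ExampleBot", "/index.html"))

def looks_like_robot(requests):
    """requests: list of (method, path) tuples seen from one IP address.

    Heuristic only: a visitor that asks for robots.txt, or that only ever
    sends HEAD requests, is probably not a person with a browser.
    """
    asked_for_policy = any(path == "/robots.txt" for _, path in requests)
    only_heads = bool(requests) and all(method == "HEAD" for method, _ in requests)
    return asked_for_policy or only_heads

print(looks_like_robot([("GET", "/robots.txt"), ("GET", "/index.html")]))
print(looks_like_robot([("GET", "/index.html"), ("GET", "/logo.gif")]))
```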

You can also visit our page describing the engines in some detail.

This page is regularly converted from this text file by the author's own text-to-HTML converter, AscToHTM. The last update was on 30-Jan-2006. This software is available as shareware (cost $30).

Search engine robots and others

The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.

Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites (e.g. www.snap.com and www.hotbot.com).

Wherever <nn> appears, it indicates that a number of different digits may be used.
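If you want to match host names from your own logs against entries written this way, one possible approach is to turn the placeholder into a regular expression. This is only a sketch: placeholder_to_regex is a made-up helper, and it assumes that <n> and <nn> each stand for one or more digits.

```python
import re

def placeholder_to_regex(pattern):
    """Turn a table entry such as 'c<nn>.googlebot.com' into a compiled regex.

    Assumption: each <n>, <nn>, ... placeholder stands for one or more digits.
    """
    parts = re.split(r"(<n+>)", pattern)
    regex = "".join(
        r"\d+" if re.fullmatch(r"<n+>", part) else re.escape(part)
        for part in parts
    )
    # \Z anchors the end so that longer look-alike host names don't match.
    return re.compile(regex + r"\Z")

googlebot = placeholder_to_regex("c<nn>.googlebot.com")
print(bool(googlebot.match("c12.googlebot.com")))
print(bool(googlebot.match("c12.googlebot.com.example.net")))
```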

Home page/search engine	Robot identifier	IP address(es)
www.abacho.com	AbachoBOT	srv-ze-robot1.tricus.com
www.abcdatos.com	abcdatos_botlink http://www.abcdatos.com/botlink/	217.126.39.167
www.aesop.com	AESOP_com_SpiderMan	209.189.115.49
www.ah-ha.com	ah-ha.com crawler (crawler@ah-ha.com)	c7pub-216-250-141-186.center7.com
www.alexa.com	ia_archiver	green.alexa.com sarah.alexa.com
www.altavista.com	Scooter Mercator Scooter2_Mercator_3-1.0 roach.smo.av.com-1.0 Tv<nn>_Merc_resh_26_1_D-1.0	test-scooter.pa.alta-vista.net brillo.pa.alta-vista.net av-dev4.pa.alta-vista.net scooter.aveurope.co.uk bigip1-snat.sv.av.com mercator.pa-x.dec.com scooter.pa.alta-vista.net election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com scooter.sv.av.com avfwclient.sv.av.com tv<nn>.sv.av.com
www.altavista.co.uk	AltaVista-Intranet jan.gelin@av.com	host-119.altavista.se
www.alltheweb.com	FAST-WebCrawler crawler@fast.no	209.67.247.154
	www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html
	Wget	ext-gw.trd.fast.no
www.acoon.de	Acoon Robot	194.231.42.178
www.antisearch.net	antibot	62.210.155.50
www.atomz.com	Atomz	router-sc.atomz.com index.atomz.com
www.axmo.com	AxmoRobot	194.248.208.82
www.buscaplus.com	Buscaplus Robi http://www.buscaplus.com/robi/
www.canseek.ca	CanSeek/ support@canseek.ca	216.168.111.111
www.christcrawler.com/search.cfm	ChristCRAWLER http://www.christcrawler.com/	207.191.111.231
www.clush.com	Clushbot http://www.clush.com/bot.html	209.249.80.242
www.crawler.de	Crawler admin@crawler.de	crawlit.crawler.de
www.daadle.com	DaAdLe.com ROBOT/	216.12.213.32
www.daum.net	RaBot Agent-admin/ phortse@hanmail.net contact/jylee@kies.co.kr	210.183.28.46 211.50.57.6
	RaBot Agent-admin/ webmaster@kisco.go.kr	202.30.94.34
www.en.deepindex.com	DeepIndex	deepindex.net1.nerim.net
www.ditto.com	DittoSpyder	65.169.94.188
domanova.co.uk	Jack
www.earthcom.info	EARTHCOM.info	194.108.39.74
www.entireweb.com	Speedy Spider	62.13.25.209
www.excite.com	ArchitextSpider	Musical instruments are used in the names, such as viola.excite.com cello.excite.com piano.excite.com kazoo.excite.com ride.excite.com sabian.excite.com sax.excite.com bugle.excite.com snare.excite.com ziljian.excite.com bongos.excite.com maturana.excite.com mandolin.excite.com piccolo.excite.com kettle.excite.com ichiban.excite.com (and the rest of the band). More recently first names are being used, like philip.excite.com peter.excite.com perdita.excite.com macduff.excite.com agouti.excite.com
(excite)	ArchitectSpider	crimpshrine.atext.com ichiban.atext.com
www.eurip.com	EuripBot	81.169.172.30
www.euroseek.net	Arachnoidea arachnoidea@euroseek.net	212.209.54.134
www.ezresults.com	EZResult	216.28.23.59
www.fastsearch.net	Fast PartnerSite Crawler FAST Data Search Crawler FAST Data Search Document Retriever	psprdcrw001.sac2.fastsearch.net 65.198.110.185 69.38.159.128
www.fireball.de	KIT-Fireball	????
http://france.misesajour.com/	france.misesajour.com	66.98.210.71
www.fybersearch.com	FyberSearch	69.49.241.9
www.galaxy.com	GalaxyBot http://www.galaxy.com/galaxybot.html	63.121.41.175
www.geckobot.com	geckobot	???.rdc1.az.coxatwork.com
www.gendoor.com (Genealogical Search Engine)	GenCrawler	????
www.geona.com	GeonaBot	69.59.142.17
www.getrax.com	getRAX	81.169.156.246
www.google.com	Googlebot googlebot@googlebot.com http://googlebot.com/	c<nn>.googlebot.com
www.goo.ne.jp	moget/2.0 moget@goo.ne.jp	202.229.31.13
www.girafa.com	Aranha	Aranha.girafa.com
(inktomi)	Slurp.so/1.0 slurp@inktomi.com	q2004.inktomisearch.com j5006.inktomisearch.com
(inktomi)	Slurp/2.0j slurp@inktomi.com www.inktomisearch.com	202.212.5.34 goo313.goo.ne.jp
(inktomi)	Slurp/2.0-KiteHourly slurp@inktomi.com; www.inktomi.com/slurp.html	y400.inktomi.com
(inktomi)	Slurp/2.0-OwlWeekly spider@aeneid.com www.inktomi.com/slurp.html	209.185.143.198
(inktomi)	Slurp/3.0-AU slurp@inktomi.com	j6000.inktomi.com
www.hubat.com	Hubater	209.114.176.250
www.almaden.ibm.com (research centre)	http://www.almaden.ibm.com/cs/crawler	wfp2.almaden.ibm.com
www.iltrovatore.it	IlTrovatore-Setaccio	213.26.21.8
www.incywincy.com	IncyWincy	64.81.243.66
www.infoseek.com	UltraSeek InfoSeek Sidewinder	cde2c923.infoseek.com cde2c91f.infoseek.com cca26215.infoseek.com
www.intags.de	Mole2/1.0 webmaster@intags.de	217.160.75.10
http://mp3bot.de/	MP3Bot	<..>
www.ip3000.com	C-PBWF-ip3000.com-crawler ip3000.com-crawler	www.ip3000.com
www.istarthere.com	http://www.istarthere.com spider@istarthere.com	66.220.24.80
www.knowledge.com	Knowledge.com/	213.170.2.69
www.kuloko.com	kuloko-bot/0.2	66.90.81.41
www.lexis-nexis.com	LNSpiderguy	firewall5.lexis-nexis.com
www.linknz.co.nz	Linknzbot	202.191.32.67
www.look.com	lookbot	magma.com
www.looksmart.com	MantraAgent	fjupiter.looksmart.com
www.loopimprovements.com (see also www.incywincy.com)	NetResearchServer www.loopimprovements.com/robot.html	leg-64-133-109-250-STK.sprinthome.com
www.lycos.com	Lycos_Spider_(T-Rex)	bos-spider<n>.bos.lycos.com 216.35.194.188
www.joocer.com	JoocerBot	80.46.38.169
www.mirago.co.uk	HenryTheMiragoRobot	194.202.39.46
www.mojeek.com	MojeekBot	???
www.mozdex.com	mozDex/	(within comcast.net)
http://search.msn.com/	MSNBOT/0.1 http://search.msn.com/msnbot.htm	131.107.163.47
www.navadoo.com	Navadoo Crawler	???
www.northernlight.com	Gulliver	marvin.northernlight.com taz.northernlight.com
www.objectssearch.com	ObjectsSearch/0.01	68.88.244.177
http://szukaj.onet.pl/	OnetSzukaj/	???
www.picosearch.com	PicoSearch/	pipe.picosearch.com
www.portaljuice.com	PJspider	timber.nextopia.com
www.powerinter.net but it won't let us in :-(	DIIbot	node-d8e93393.powerinter.net
http://navi.ocn.ne.jp/	nttdirectory_robot super-robot@super.navi.ocn.ne.jp griffon griffon@super.navi.ocn.ne.jp	lilis00.navi.ocn.ne.jp lilis04.navi.ocn.ne.jp
www.maxbot.com	Spider/maxbot.com admin@maxbot.com	search.wport.com
???	various (fakes agent on each access)	pool0058.cvx2-bradley.dialup.earthlink.net
???	gazz/1.0 gazz@nttrd.com	deleuze.infobee.ne.jp derrida.infobee.ne.jp
???	???	search-8.xift.com
www.nationaldirectory.com	NationalDirectory-SuperSpider	spider.nationaldirectory.com 209.116.58.143
www.naver.com	dloader(NaverRobot)/ dumrobo(NaverRobot)/	211.218.151.209
www.noxtrum.com	noxtrumbot/	194.224.199.52
www.openfind.com (Chinese language)	Openfind piranha,Shark robot-response@openfind.com.tw Openbot/	??? abovenet4.openfind.com
www.picsearch.org	psbot www.picsearch.org/bot.html	217.75.104.26
www.pinpoint.com	CrawlerBoy Pinpoint.com	nitrogen.pinpoint.com
www.petersnews.com	user<n>.ip3000.com	news<n>.petersnews.com
www.qweery.nl	QweeryBot http://qweerybot.qweery.com	84.82.133.41
www.vestris.com/alkaline	AlkalineBOT	host130.uv-ray.com
www.rambler.ru	StackRambler/	81.222.64.10
www.seznam.cz	SeznamBot	212.80.76.87
www.search-10.com	Search-10	82.41.144.99
www.searchhippo.com	Fluffy the spider info@searchhippo.com	208.148.122.27
www.scrubtheweb.com	Scrubby/	208.145.190.254
www.singingfish.com	asterias	grouper.singingfish.com
www.speedfind.de	speedfind ramBot xtreme	BWEB.highway.telekom.at
www.s.u-tokyo.ac.jp	Kototoi/0.1	crawler-red3.is.s.u-tokyo.ac.jp
www.searchbyusa.com	SearchByUsa	???
www.searchspider.com	Searchspider/	24.90.243.203
www.sightquest.com	SightQuestBot/ http://www.sightquest.com/bot.htm	64.49.245.212
www.spidermonkey.ca	Spider_Monkey/	66.163.18.197
www.surfnomore.com	Surfnomore Spider v1.1	165.90.194.245
www.supersnooper.com	Robot@SuperSnooper.Com	207.8.212.162
www.teoma.com	teoma_agent1 teoma_admin@hawkholdings.com	63.236.92.148
http://mapper.teradex.com	Teradex_Mapper mapper@teradex.com	65.110.6.26
www.travel-finder.com	ESISmartSpider	202.46.33.15
www.traficdublu.ro	Spider TraficDublu	81.196.., 193.16.218.66
www.tutorgig.com	Tutorial Crawler http://www.tutorgig.com/crawler	216.40.225.75
www.updated.com	updated/0.1beta crawler@updated.com	38.119.96.107
www.uksearcher.co.uk	UK Searcher Spider	-
www.vivante.com (coming soon)	Vivante Link Checker	216.93.167.106
www.walhello.com	appie	uses an address at planet.nl, a Dutch ISP
www.websmostlinked.com	Nazilla	-
www.webwombat.com.au	www.WebWombat.com.au	202.139.99.131
www.webseek.de	marvin/infoseek marvin-team@webseek.de	arthur4.sda.t-online.de
www.webtop.com	MuscatFerret	ferret<nn>.webtop.com
www.whizbanglabs.com	WhizBang! Lab	216.250.143.108
www.wisenut.com	ZyBorg (info@WISEnut.com)	-
www.wire.co.uk	WIRE WebRefiner: webrefiner@wire.co.uk	brighton.wire.co.uk
www.worldsearchcenter.com	WSCbot	???
www.yandex.com	Yandex	ya.yandex.ru
www.yellowpet.com pet-based search engine	Yellopet-Spider	212-82-36-23.ip.zeitraum.com
www.yelo.no	Findexa Crawler	???
www.yourbettersearch.com	YBSbot search engine indexer	12.25.90.3
<client sites>	libwww-perl	www.linpro.no/lwp/
http://verno.ueda.info.waseda.ac.jp/	Iron33	207.18.183.251

Browsers

Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.

Browser identifier	Information
AmigaVoyager	http://v3.vapor.com/ Voyager browser for the Amiga
xChaos_Arachne	http://browser.arachne.cz/ (DOS-compatible browser. Linux version under development)
IBrowse	www.hisoft.co.uk (search for IBrowse) Amiga-based browser
ICab	www.icab.de/index.html (Macintosh-only)
JustView	http://www3.justsystem.co.jp/download/justview/3.01win1a.html (I think this is a browser. Site is in Japanese)
KMeleon	http://kmeleon.sourceforge.net/ (Light browser based on the Mozilla code base)
Konqueror	www.konqueror.org/konq-browser.html (Linux KDE browser)
Lynx	http://lynx.browser.org/ (Cross-platform text based browser)
OmniWeb	www.omnigroup.com/products/omniweb/ (Macintosh-only)
Opera	www.opera.com (Cross-platform, small, efficient and standards-led browser)
Plucker	www.plkr.org/index.pl/faq#1.1 (Palm handhelds. Written in Python)
pwWebSpeak	www.prodworks.com/issound/catalog/catalog_pwwebspeak.html Audio Browser
QWeb	http://sunsite.auc.dk/qweb/ (Linux browser) (see also http://browswerwatch.internet.com/news/story/qweb8.html)
retawq	http://retawq.sourceforge.net/ Text-based browser for text terminals. Runs under Linux
SlimBrowser	www.flashpeak.com/sbrowser/sbrowser.htm Freeware tabbed browser
Sleipnir	http://sleipnir.pos.to/software/sleipnir/index.html (Japanese) Japanese browser, apparently with an English version available.
VMS_Mosaic	http://vaxa.wvnet.edu/vmswww/vms_mosaic.html (OpenVMS only version of Mosaic, a pre-Netscape browser)
WannaBe	http://mindstory.com/wb2/ (Macintosh text-only browser)
w3m	http://w3m.sourceforge.net/ (text-based browser)

Link Checkers, Link monitors and bookmark managers

Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".

(pause for warm glow :-)

If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.

If you build up a list of sites that link to you, these are the guys you should tell when you move (moral: never move).

It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software, or signed up to some centralized link checking service. If the referrer URL field is blank but you have the client's IP address, you can always try visiting that address and surfing their site.

The names of some of these tools imply that they extract email addresses (e.g. EmailSiphon). As such they're probably unwelcome visitors, since these addresses are probably being collected for spammers.

A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK

Robot identifier	IP address(es)	Link Checker home page
ActiveBookmark	<client site>	http://libmaster.com/software.php
ALink	<client site>	http://www.info-pack.com/alink/ Reciprocal Link Checker, Manager and Page Generator.
AMeta	<client site>	http://www.info-pack.com/ameta/ Meta Tag Generator
ASPSearch URL Checker	<client site>	http://search.santry.com/downloads/ a site search engine/index maintenance tool
BlogBot	<client site>	http://sourceforge.net/projects/blogbot/
BMChecker	<client site>	www.fureai.or.jp/~yoichi37/soft/bmchecker.html (Japanese Bookmark Checker)
Bookmark Buddy	<client site>	www.bookmarkbuddy.net/about.shtml
Check&Get	<client site>	www.checkget.com
CheckWeb	<client site>	www.checkweb.com
CNET_Snoop		www.download.com (only if you have software listed at that site)
CSE HTML Validator	<client site>	www.htmlvalidator.com HTML page validator that includes a link checker amongst its functions.
DRKSpider	<client site>	www.drk.com.ar/spider/ (An Open Source project)
DISCo Watchman	<client site>	www.t-guild.com/gamesite/Software/Disco_w/Disco_w.htm
DoctorHTML	draco.imagiware.com	http://www2.imagiware.com/RxHTML/
Email Extractor	<client site>	<email collector> We don't list links to email collectors on this site
EmailSiphon	<client site>	<email collector> We don't list links to email collectors on this site
EmailWolf	<client site>	www.pixeltech.com.au/~msw/ewolf/index.html
FavOrg	<client site>	http://www.pcmag.com/article2/0,1759,1558477,00.asp A utility written by PC Magazine to fetch icons files (favicon.ico) for your IE favorites
Favorites Sweeper	<client site>	www.manitoolssoftware.cjb.net Another "favorites" tidy-up utility
FreshLinks.exe	<client site>	www.resqpc.com/features.html
Funnel Web Profiler	<client site>	www.quest.com/funnel_web/profiler/ Profiles your site, including links to/from it
Html Link Validator	<client site>	www.lithopssoft.com/hlv/index.html
HTMLParser	<client site>	http://htmlparser.sourceforge.net/ an open source HTML parser that is probably exercising its link-checking features.
The Informant The Intraformant	cosmo.dartmouth.edu	http://informant.dartmouth.edu/
InternetLinkAgent	<client site>	http://www1.odn.ne.jp/freeware/rank/ineternet/internetlinkagent.html (in Japanese)
InternetPeriscope	<client site>	www.lokboxsoftware.com/internetperiscope.asp
javElink	salix.ingetech.com	www.dailydiffs.com
jdwhatsnew.cgi	<client site>	www.jdrowell.com/projects/jdwhatsnew/view
JRTS Check Favorites Utility	<client site>	www.jrtwine.com/Products/CheckFavs/
Lambda LinkCheck	195.139.70.25	www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html
LinkLint-checkonly	--	www.goldwarp.com/bowlin/linklint/
LinkAlarm	linkalarm.com	www.linkalarm.com
Linkbot	<client site>	www.tetranetsoftware.com/products/linkbot.htm
Linkman (Mozilla...)	66.89.128.242	http://www.outertech.com/product.php?product=5
LinkProver	<client site>	www.tafweb.com/linkprover.html
Links	--	http://gossamer-threads.com/scripts/links/ (Link management cgi script)
LinkScan Server	<client site>	www.elsop.com
LinkSweeper	<client site>	www.lss.com.au/lss/windows/ls/linksweeper.htm
Link Valet Online	195.82.114.5	www.htmlhelp.com/tools/valet/
LinkVerify Spider	frances.yourwebhost.com	www.enduser.co.uk/linkverify/
LinkWalker	lw.seventwentyfour.com 209.167.50.23	www.seventwentyfour.com
Morning Paper	<client site>	www.boutell.com/morning/
MoveAnnouncer	--	www.moveannouncer.com (notifies webmasters when your pages have moved)
mylinkcheck	--	www.mylinkcheck.de (German)
NetLookout	--	www.frugalsoft.com
NetMechanic www.elsop.com	gamma.netmechanic2.com	www.netmechanic.com
NetMind-Minder	marvin.netmind.com (retired) gary.netmind.com meg.netmind.com inyanga.netmind.com leo.netmind.com gemini.netmind.com	www.netmind.com
NetMonitor	--	www.modemwizard.com/netmonitor.html
Netprospector JavaCrawler	<client site>	www.actaddons.com/products/netprospector.asp
online link validator	216.93.171.138	www.dead-links.com (online link checker - submit your URL)
Rational SiteCheck	<client site>	www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl
Robozilla	h-206-<n>-<n>-<n>.netscape.com	http://dmoz.org/ (checks links in the dmoz directory)
RPT-HTTPClient	<client site>	www.purplefrog.com/~thoth/jchecklinks/ Java utility that uses the Java HTTPClient class library
SiteBar	<client site>	www.sitebar.org
SpurlBot	???	www.spurl.net Online bookmark agent
SurfMaster	<client site>	www.maskbit.com/surfmaster.htm
SyncIT	<client site>	www.bookmarksync.com
Watchfire WebXM	<client site>	www.watchfire.com/products/webxm.asp
WatzNew Agent	<client site>	www.watznew.com
WebSite-Watcher	<client site>	www.aignes.com
WebTrends Link Analyzer	<client site>	www.webtrends.com
Weblink Scanner	<client site>	www.iterix.com/products/WeblinkScanner/weblinkScanner.asp
Xenu's Link Sleuth	<client site>	www.snafu.de/~tilman/xenulink.html
Z-Add Link Checker	<client site?>	http://w3.z-add.co.uk/linkcheck/

Validators

Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you won't usually see many of these in your logs.

However, if you choose to validate your own site, the validation attempts will appear in your logs. The list below is therefore limited to the on-line validators I use (and recommend) and a URL submission service that I use.

Robot Identifier	IP address	Validator home page
W3C_Validator	abyss.w3.org	http://validator.w3.org/
WDG_Validator/	64.29.16.182	www.htmlhelp.com/tools/validator/
Tooter	selfpromotion.com	www.selfpromotion.com. This is used as part of a link submission agent (trebor@animeigo.com)

FTP clients and download managers

If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.

If your download files are over 1MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes then chances are the download succeeded.
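That adding-up check is easy to automate. The sketch below is one way to do it in Python, assuming you have already extracted (IP address, path, bytes sent) from each log line; the completed_downloads helper, the sample figures and the file name are invented. Requiring an exact match is a simplification, since a resumed download may re-fetch a few overlapping bytes.

```python
from collections import defaultdict

def completed_downloads(entries, file_sizes):
    """entries: iterable of (ip, path, bytes_sent) taken from the log.
    file_sizes: dict mapping each path to its full size in bytes.

    Returns the set of (ip, path) pairs whose transfers add up to the full size.
    """
    totals = defaultdict(int)
    for ip, path, sent in entries:
        totals[(ip, path)] += sent
    return {key for key, total in totals.items() if total == file_sizes.get(key[1])}

# Invented example: one visitor fetches the file in three pieces, another gives up.
entries = [
    ("192.0.2.10", "/download/setup.exe", 500_000),
    ("192.0.2.10", "/download/setup.exe", 700_000),
    ("192.0.2.10", "/download/setup.exe", 300_000),
    ("192.0.2.99", "/download/setup.exe", 400_000),
]
sizes = {"/download/setup.exe": 1_500_000}
print(completed_downloads(entries, sizes))
```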

Client Identifier	FTP Client home page
Alligator	www.nearsoftware.com/alligator/maininfo/
BatchFTP	www.dynamicnet.net/products/batchftp.htm
ChinaClaw	http://download.pchome.net/internet/download/860.html (Chinese) (Chinese download utility)
DA	www.lidan.com www.downloadaccelerator.com
DLExpert	www.yanew.com (English and Chinese versions available)
Download Demon	www.netzip.com
Download Master	www.one.com.ua/dm/ (Russian)
Download Ninja	www.h-fd.org/~mkro/mt/archives/000585.html (Japanese)
Download Wonder	www.forty.com
Ez Auto Downloader	www.anatari.com/ezad/index.html Downloads all files of a given type from a site, so it's more like a site grabber
FreshDownload	www.freshdevices.com/freshdown.html
Go!Zilla	www.gozilla.com
GetRight MyGetRight	www.getright.com
GetSmart	http://getsmart.hypermart.net/
HiDownload	www.hidownload.com
JetCar (or FlashGet)	www.amazesoft.com
Kapere	www.kapere.com/menu.php?lang=english
Kontiki Client	www.kontiki.com/products/index.html
LeechFTP	http://stud.fh-heilbronn.de/~jdebis/leechftp/
LeechGet	www.leechget.de
LightningDownload	www.lightningdownload.com
Mass Downloader	www.geocities.com/SiliconValley/Vista/2865/md.htm
MetaProducts Download Express	www.metaproducts.com/DE.html
NetZip Downloader SmartDownload	www.netzip.com
NetAnts	www.netants.com
NetButler	www.webcelerator.com/netbutler/
NetPumper	www.netpumper.com
Net Vampire	www.netvampire.com
Nitro Downloader	www.klsofttools.com/nitro.html
Octopus	http://moskalyuk.com/octopus/
PuxaRapido	www.puxarapido.com.br
RealDownload	http://service.real.com/help/faq/rdown4/rdownfaqa01.html
SpeedDownload	www.yazsoft.com (for Macintosh)
WebDownloader for X 1.30	www.krasu.ru/soft/chuchelo/features.php3 (Linux web downloader with X GUI)
WebLeacher	www.webleacher.dk (down last time I tried it) more details at www.davecentral.com/projects/thewebleacher/
WebPictures Downloader	www.fullstrong.com Locates and downloads pictures
X-Uploader	Can't find the home page, but it's described (in Russian) on www.compulenta.ru/2002/1/17/24333/

Research projects

These agents come from research projects. Of course that's how Google started...

citenikbot/	http://www.citenik.co.uk/bot.html. One-man project due for release in 2004.
CLIPS-index	http://clips-index.imag.fr/ (French) French research robot from a linguistics project (?)
Computer_and_Automation_Research_Institute_Crawler	Robot from the research centre at the Hungarian Academy of Sciences at www.sztaki.hu. Crawls from IP 195.111.1.93
cosmos robot@xyleme.com	Spider from www.xyleme.com, which is a project to locate and index XML content on the web. The company is a spin-off from a project at INRIA in France, a frequent source of web robots. The word "xyleme" apparently relates to the vascular system in plants, but cleverly must be one of the very few words to contain the letters "X", "M" and "L" (although not in that order ;-)
D2KWebCrawler	http://archive.ncsa.uiuc.edu/TechFocus/Projects/NCSA/D2K-_Data_To_Knowledge.html "Data to Knowledge" data miner. Crawls from 141.142.15.21
DiaGem/	Experimental spider from Mitsubishi R&D division www.skyrocket.gr.jp/diagem.html Crawls from IP 203.178.88.244
Digimarc WebReader	Digimarc searches images on the web looking for digital watermarks. More details at www.digimarc.com
EchO!/2.0	Spiders from 194.254.160.3, which would seem to be part of www.voila.com, a French-based search engine.
FinaleRobot robot-master@expressus.com	The www.expressus.com site describes an Interactive Natural Language encyclopedia that will become a search engine at www.final-e.com. Good name, but at present it just maps back onto the ExpressUs site (not such a good name). Crawls from IP address 64.114.34.115
Ideare - SignSite	www.ideare.com. Spiders from spider3.tiscalinet.it. Ideare are a research company producing search engine technology, and are part owned by Tiscali in Italy, who seem to use their various tools for different search engines (mp3, images etc).
GentleSpider	Some sort of spider that usually visits using an IP address from within www.research.att.com or crawler.tivra.com
Gulper Web Bot	www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot (Open research project to produce opinion-based search engine)
larbin sebastien.ailleret@inria.fr ghi@lcs.mit.edu cosmos	And from the people that brought you xyro (see below) comes another, newer bot. This one seems to crawl from the IP address cremant.inria.fr. Update: more recently it's also been seen coming from barracutta.lcs.mit.edu. And then there was "cosmos", crawling from pomelos.inria.fr. Seems these people are a webbot factory. Cosmos doesn't offer an email address.
IRLbot	http://irl.cs.tamu.edu/crawler. Crawls from 128.194.135.80. Crawls randomly to determine the topology of the web.
KnowItAll	www.cs.washington.edu/research/knowitall/ a project that "extracts massive amounts of information from the Web in an autonomous, scalable manner". Don't they know that everyone hates a know-it-all? :-)
MJ12bot	www.majestic12.co.uk/projects/dsearch/ A distributed search engine project
MultiText	Research project to index the last weeks' news items http://canola1.uwaterloo.ca/
NEC Research Agent	http://heavenly.nj.nec.com/ Research "Inquirus" (meta?) search engine
OntoSpider	http://ontospider.i-n.info Dutch robot for a research project. Crawls from 195.11.244.52
sherlock_spider	www.sherlock.com.cn. A course project from http://burrowww.cs.indiana.edu:15003/b659/ Crawls from 129.79.245.98
S.T.A.L.K.E.R.	www.seo-tools.net/en/bot.aspx. "My first robot" :-) Crawls from 195.71.117.89
Steeler	www.tkl.iis.u-tokyo.ac.jp/~crawler/crawler.html.en Japanese research robot.
ru-robot 0.1_hseo(at)cs.rutgers.edu	Unable to find details on this, but I'm guessing it's a research spider from www.rutgers.edu. Crawls using the IP teal.rutgers.edu
USyd-NLP-Spider	www.it.usyd.edu.au/~vinci/webcorpus.html research into Natural Language Processing at University of Sydney, Australia
WebGather	http://pccms.pku.edu.cn:8000/ Chinese search project
xyro xcrawler@inria.fr	Seems to be a spider associated with a French research institute. Usually crawls using the IP address vamos.inria.fr
Zao/0.2	www.kototoi.org/zao/ Another Japanese research robot. Crawls from 133.11.36.41.
Zao-Crawler	Same as above, but crawled from 133.11.36.40

Software packages

These agents are the default identifiers for various software packages. Software developers use these packages to add Internet functionality to their own applications. As such, it's impossible to say what these agents are being used for without looking at the pattern of access, as the same agent name may be used by different developers to achieve different results.

While many of these packages allow you to change the user agent, some do not, and many developers are too lazy to change the agent string.

GT::WWW	Apparently some form of web-accessing Perl module. Possibly included in the Links SQL product produced by www.gossamer-threads.com/scripts/index.htm.
HTTPClient	Default agent name used by the Java HTTPClient class. www.innovation.ch/java/HTTPClient/ (See also RPT-HTTPClient below)
HTTP::Lite	Default identifier for a set of light-weight Perl modules for retrieving web documents. See www.toybox.ca/http-lite/
IP*Works!	Set of TCP/IP components used in cross-platform development of internet tools www.nsoftware.com/products/ipworks.aspx
libwww-perl	The PERL programming language comes with a number of routines for constructing web-aware scripts. This and related strings are the default user agent identifiers, although it's perfectly easy to change this to be whatever you want.
libghttp	The GNOME http library. A Linux software library that offers connectivity to the web. Found in many places on the web. There is a description at www.fifi.org/doc/libghttp-dev/html/ghttp.html
Macromedia Flash Player	Flash movies can contain scripts that can fetch content from the web (such as other Flash movies or images)
MFC_Tear_Sample	Agent name used in the sample code supplied with Visual C++ for accessing the web. This may therefore be someone running a program they've written based on that code.
PEAR HTTP_Request class	PEAR is a framework and distribution system for reusable PHP components http://pear.php.net/
Python-urllib	Presumably the default identifier for the urllib module in the Python programming language www.lib.uchicago.edu/keith/courses/python/class/7/
RPT-HTTPClient	The Java HTTPClient class library
TeamSoft WinInet Component	www.winsoft.sk/wininet.htm (menus require Java) Internet software component suite
wget	www.gnu.org/software/wget/wget.html Free Unix/Linux package for retrieving web pages
WinScripter iNet Tools	www.winscripter.com/wsh/tools/wsInetTools.asp COM/DLL object that supports the SMTP and HTTP protocols
W3CRobot/	A fast web-spidering robot included with the libwww package (?). See www.w3.org/Robot/
W3C-WebCon/	www.w3.org/ComLine a command-line toolkit that allows you to perform HTTP operations
wxWidgets	www.wxwidgets.org cross-platform open source C++ GUI builder which includes "HTML viewing" and much, much more.
Zeus <nnnn> Webster Pro	www.homepagesw.com/webster_overview.htm
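
To show how little effort it takes to change a default agent string, here is a small Python sketch using the standard urllib.request module (the modern descendant of the Python-urllib entry above). Nothing is fetched over the network; the ExampleLinkChecker name, contact address and URL are made up for the example.

```python
import urllib.request

# The stock opener announces itself as "Python-urllib/<version>", where the
# version depends on the Python installation running the script.
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]
print(default_agent)

# Overriding the agent for a single request. The request is only constructed
# here, not sent.
request = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "ExampleLinkChecker/0.1 (admin@example.com)"},
)
print(request.get_header("User-agent"))
```

A developer who skips the second step leaves the package's default name in your logs, which is why these identifiers say so little about what the visitor was actually doing.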

Offline browsers and other agents

Agent Identifier	Agent home page
DigOut4U	www.arisem.com/Enu/
DISCoFinder	www.ars.ru/eng/products/discof.asp
eCatch	www.ecatch.com
EirGrabber	http://www2p.biglobe.ne.jp/~eir/index.htm (Japanese software from the "Eir Project")
ExtractorPro	(Bulk email marketing tool. URL deliberately omitted)
FairAd Client	www.hager.co.at/fordelka/fairad.htm (German) A German pay-to-surf client
JoBo	www.matuschek.net/software/jobo/index.html a site downloader
iSiloWeb	www.isilo.com (for palm pilot)
Kenjin Spider	www.autonomy.com
MSIECrawler MSProxy	(Microsoft IE4.0)
NexTools WebAgent	www.vector.co.jp/soft/win95/net/se053030.html
Offline Explorer	www.metaproducts.com/OE.html
NetAttache	Offline browser and search engine agent
PageDown	Details (in Japanese) at http://www01.u-page.so-net.ne.jp/fa2/y_yutaka/share/pagedown.htm
ParaSite	www.ianett.com/parasite/
Searchworks Spider	www.nedesign.com/Phipps/products.html
SiteMapper	www.trellian.com/mapper/index.html
SiteSnagger	http://www.pcmag.com/article2/0,1759,1559896,00.asp
SuperBot	www.sparkleware.com/superbot/index.html
Teleport Pro	www.tenmax.com/teleport/pro/home.htm
URL2File	www.chami.com/free/url2file_wincon.html
Web2Map	www.web2map.com/us/index.htm Web site copier. English/German versions available
WebAuto	www.yanasoft.co.jp/webauto.html I think this is an offline browser. Site is in Japanese
WebCopier	www.maximumsoft.com
Webdup	www.webdup.com (Chinese software. Not 100% sure what it does)
WebFetch	www.webfetch.com
WebReaper	http://www.webreaper.net/
Webrobot	www.multimania.com/dilletb/WebRobot/
Website eXtractor	www.asona.org
WebSnatcher	www.theronwelch.com/websnatcher/
WebStripper	www.solentsoftware.com/webstripper/
WebTwin	www.WebTwin.com Convert websites into help files.
WebVCR	www.netresultscorp.com/fs_webvcr_info.html
WebZIP	www.spidersoft.com
WWWOFFLE	www.gedanken.demon.co.uk/wwwoffle/
Xaldon WebSpider	www.xaldon.de/produkte_webspider.html (German) Offline browser

Other miscellaneous agents

These are agents that we've seen but have been unable to find information on, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to info@jafsoft.com

User Agent	Information
Ad Muncher	www.admuncher.com Browser plug-in that monitors the pages as you view them, and removes all adverts, popup windows etc.
ADSAComponent ADSARobot	http://cnds.ucd.ie/adsa/ distributed search engine project Contact postmaster@cnds.ucd.ie browses from acropolis.ucd.ie (which doesn't make sense for a distributed search engine :-)
Albert Indexer	www.albert.com Multi-lingual search technology
AnswerChase	www.answerchase.com a personal search robot.
ASPSeek	www.aspseek.org/about.html. An open source search engine project
ATA-Translation-Service	Looks to be an online translation tool, much like Babelfish. Possibly related to www.atanet.org/
AVSearch	Seems to be the AltaVista personal search agent. The crawling site is sometimes referred to in the agent name
Avant Browser	www.avantbrowser.com Browser add-on for Internet Explorer
Beamer	www.pagebeamer.org/fr/index.php (French). A browser accelerator that requires sites to create a "pagebeamer.txt" file that is fetched by this agent to do predictive downloads.
beholder or e-sense	www.vigiltech.com/esensedisclaim.html
BravoBrian	http://bstop.bravobrian.it/ (may require IE). A content filtering service that offers protection from pornography and other unwanted content for children. Comes from IP 213.215.133.19
bumblebee@relevare.com	Software used to build "Vortals" (vertical portals). Details (requires Flash) can be found at www.relevare.com/site/
Checkbot	Seems to come from www.oxxfordinfo.com who offer B2B services
contype	Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader used with MSIE (I have been unable to confirm this)
Convera Internet Spider	A "RetrievalWare" product which claims to be a multimedia web crawler. www.convera.com/Products/rw_ancillis.asp
ConveraCrawler	Probably related to the above
ccubee	Crawler technology from http://empyreum.com/technologies/platforms/ccubee/
Custo	Tool to map the structure of a web site www.netwu.com/custo/
CyberNavi_WebGet	UA points to www.cybertech-inc.co.jp, but there's not much there. It crawls from 222.151.213.124 which is http://bsearchtech.com/ (Japanese). Babelfish suggests this is a Japanese company offering search products
DaviesBot	www.wholeweb.net/web/
deepweb	Also calls itself an "Intelligent Deep-Web Robotic Agent". A search engine indexer that will index dynamic content. www.deepweb.com. Indexes from IP 66.96.221.180
EbiNess	http://sourceforge.net/projects/ebiness An Open Source project to display Internet information in a 3D format.
EmailWolf	www.pixeltech.com.au/~msw/ewolf/ email program no longer available - that's the only reason I'm prepared to list it on this page.
Excalibur Internet Spider	www.excalib.com/products/ispi/index.shtml
Expired Domain Sleuth	Hunts down popular, yet expired domain names with a view to letting you purchase an already popular domain name. www.expireddomainsleuth.com
Everest-Vulcan Inc./	http://everest.vulcan.com/crawlerhelp Next-generation services technology (under development)
GigaBaz GigaBazVStheWeb crawler@brainbot.com	http://brainbot.com/
Giskard	www.oralco.com (Trivia note: Giskard is probably named after the Isaac Asimov robot)
grub-client	Grub is a distributed, open source web crawler. Users download the client, which then indexes the web as part of a distributed effort www.grub.org/html/documents.php
heritrix	Open-source, extensible web crawler project http://crawler.archive.org/
htdig	www.htdig.org search engine software for companies and universities
http://webwarper.net	A browser accelerator. The idea is that you browse "through" their site, taking advantage of their faster Internet connection, caching and - most importantly - compression (of the file sent to your browser) in return for their adverts added to the viewed pages. Such accesses give the webwarper URL as the User Agent, concealing the true agent of the original user. More details at http://webwarper.net/ww.pl/0/wwgz/about.htm?*
infoGIST	www.infogist.com
InterGO	www.teachersoft.com http://browserwatch.internet.com/news/story/intergo1.html This was a child-safe browser, but it seems no associated page remains
InternetArchive	Presumably www.internetarchive.com, but that's in "stealth mode"
Internet Ninja	www.ifour.co.jp (Japanese Macintosh browser?)
InternetSeer	A web monitoring service. More details at www.internetseer.com/
ipiumBot	www.laurion.com/ipium-analysis.html (French) A tool that searches for copies of your documents on the web. Crawls from petula.laurion.net
InternetAmi IOR	www.internetami.se/ior.html robot gathering data for an English/Swedish translation service.
InsumaScout/	www.insuma.de/insuma/de/SEscout.html Searches data situated in open data sources.
Katriona	Something to do with the European Regional Internet Registry (RIPE) Browses using IP address 213.219.19.148
larbin	http://pauillac.inria.fr/~ailleret/prog/larbin/index-eng.html
LEIA	Unable to find (Too many "Star Wars" references get in the way)
LexiBot	www.lexibot.com
LimeBot	www.cruiselime.com/LimeBot.php Robot searching for information on cruises. Browses using IP address 24.42.113.89
logikabot	www.logika.net
Mata Hari	www.thewebtools.com (Internet search agent)
metabot	Geographical-based text search tool. Crawls from 66.28.23.147 www.metacarta.com/products.htm
Mister Pix II	Picture finder www.mister-pix.com/en/home.htm
MOSES 2.0 Spider	www.ideas2internet.com/products/moses2/ NOTE Site crashes my version of netscape 4.7
MonkeyCrawl	www.monkeymethods.org. "Futuristic play".
NetCruiser	www.netcruiser-software.com/products.html It's not clear to me which of these products this might be, but I'm assuming it's one of them.
NPBot	www.nameprotect.com crawls from 12.148.209.196 (crawler1.crawler918.com) A trademark protection service
NetZippy	www.innerprise.net/usp-spider.asp
NutchCVS	http://lucene.apache.org/nutch/bot.html. Open source web-search project
NZBot	www.navigationzone.com Offers "information management" tools
Opencola	www.opencola.com A search application, combining data from multiple sources
ORA_checksite	www.oreilly.com/openbook/webclient/ch06.html Identifier used in a sample perl program in the online book "Web Client Programming with Perl". The program is used to check links. Obviously people have tried it, and it works :-)
Onekit.com - PAD File Get.	PAD file poller. PAD files describe software applications to download sites.
Oxxbot1	www.oxxfordinfo.com (Data mining bot on IP 216.0.86.75)
Pansophica	http://homepage.mac.com/zigkit/Pansophica/index.html A Web search agent with neural net intelligence which organizes and personalizes Web sites and searches.
Phoaks	www.phoaks.com/index.html. An index of web resources listed in UseNet. See also www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm
phpMySearch-Crawler	http://phpMySearch.web4.hm a search engine for individual sites.
PICgrabber	A free picture and movie locator www.movies-free.net
PictureOfInternet erik@malfunction.org	www.malfunction.org/poi Seems to be a project to create a collage of images gathered from the Internet.
PicSpider	www.bildkiste.de.vu (German). Site offers a "picture crate" according to babelfish, which seems to be some form of repository. Not sure why it's spidering, but crawls from 217-20-118-26 which is part of internetserviceteam.com
PintaSpider	Unable to find But the spider came from www.cnet.fr
Pita (Chub.Stanford.EDU)	--
PitSpyder Thread<n>0	Unable to find
psbot	www.picsearch.org/bot.html A bot indexing pictures. Crawls from ps.direct2internet.com
PolyBot	http://cis.poly.edu/polybot/ crawls from weasel.poly.edu, grampus.poly.edu, bumblebee.poly.edu
PureSight	www.puresight.com/Products/PureSightHomeDescription.htm (child-safe content filtering)
Rumours-Agent	Comes from IP 202.214.69.131, which a lookup identifies as "Cross Lingual Info Research" in Japan.
RepoMonkey Bait & Tackle	A bit of detective work here. Recent entries in the log file link this to the site www.hungryhippo.com, although the robot always appears to come from an IP address at backflip.com (a bookmarking service). Visiting www.hungryhippo.com reveals a "coming soon" site. Looking at the HTML source leads to another page at www.mezzaluna.net/hungryhippo.com/ (appears identical). The META tags for this page all appear to be references to day trading, futures, training and the like, although we did spot the word "fibonacci" (our favourite :-). So... possibly a future search engine related to stock trading? Or maybe the Monkey and Hippo are just feeding me a red herring? There's more. The picture on the Kenjin site at www.kenjin.com/kenjin/info.html is currently the same as that at HungryHippo. Kenjin is an Autonomy company.
Robot2.0(PingSoft)	There are several "PingSoft"s around, but I suspect that this belongs to one of the products listed at http://www.pingsoft.net/ (e.g. SmartHunter) since I was visited from a Chinese IP address.
SilentSurf	www.silentsurf.com. A surf anonymizer service
SlySearch slysearch@slysearch.com	www.slysearch.com. A site that hunts down infringements of intellectual property rights.
SpaceBison	http://www.proxomitron.org/ A web filter that is "ShonenWare", i.e. you should purchase a Shonen Knife CD if you use it. Shonen Knife are a great Japanese band, much loved by the late Kurt Cobain. Sometimes this sets the referrer page to the band's home page at www.mmjp.or.jp/knife/ (or maybe the users just happen to go there themselves).
CrawlWave	www.spiderwave.aueb.gr (Greek, and requires login) Crawls from 195.251.252.44, which is part of the Athens University of Economics and Business (www.aueb.gr)
SpotOn	www.spoton.com (IE add-on that organizes your browsing)
SQ Webscanner	http://macinsearch.com/users/webscanner/ (on holiday last time I looked)
Squid	www.squid-cache.org An open-source web proxy cache for Unix systems
SquidClamAV_Redirector	http://freshmeat.net/projects/scavr/?branch_id=54042&release_id=188491 An open-source anti-virus program that I saw accessing icons on my site (!)
Sqworm	Not 100% sure about this one. When it visited me it came from the WebSense site 63.212.171.* (and a Google search shows others seem to see the same). At the WebSense site you can find WebCatcher, a product used to monitor employees' web-surfing habits (as near as I can tell). But as I say, I'm not 100% sure... www.websense.com/products/about/webcatcher/index.cfm
Steganos Internet Anonym	www.steganos.com/?layout=default&content=products_siapro&language=en A surf anonymizer utility
SurfControl	www.surfcontrol.com/products/web/default.aspx content tracking product
Tagword	Tool that surveys the links in the Open Directory at http://dmoz.org, checking their status etc. See http://tagword.com/dmoz_survey.php
TaWWWantula	Unable to find
Tcl http client package	The default identifier for any software built using the Tcl HTTP package http://tcl.activestate.com/software/tcltk/ http://tcl.activestate.com/man/tcl8.0/TclCmd/http.htm
TeraCrawl	Unable to find
TurnitinBot	www.turnitin.com Plagiarism prevention system. Crawls from 64.140.48.25
UCmore	www.ucmore.com A browser plug-in (initially IE only) that searches for related pages and categories. In my experience this seems to entail accessing a favicon.ico file on a daily basis (presumably to refresh the "favorites" list)
UdmSearch	http://search.mnogo.ru/ Search engine technology, as used at sites such as www.maplesearch.com. Now called mnoGoSearch.
unchaos_crawler	www.unchaos.com. A search engine that offers a "hybrid" of human and machine intelligence, but no search box that I could see :-). Crawls from 192.115.134.201
unlostBot unlostBot@unlost.com	www.unlost.com is "under construction". The robot came from IP address 212.37.219.147 which is in France.
URLBlaze	File/web search utility www.urlblaze.net
utopy crawler@utopy.com	Coming soon at www.utopy.com (requires flash). This venture-capital funded site is "running in stealth mode" before launching the "new new thing" (is that a typo?). One of the Flash pages defines Utopia (geddit?), and some of the browsing is done by IP addresses at ...myutopy.com.
UtilMind HTTPGet	A component intended for downloading pages from the web using standard Microsoft Windows Internet library (winInet.dll) Listed on www.utilmind.com/delphi2.html
UrlScope	Unable to find
Vagabondo	Appears to be a log analyzer for Russian BBS systems. (I may have got that wrong). I found reference to it being copyright John Gladkih 1998, but I've not found any URL that gives a description (not even a Russian one).
VCI WebViewer	Web browser object, that may be incorporated into software www.homepagesw.com/webster_dl.htm
vspider	www.verity.com/products/intspider/ A commercial spidering product.
WAVETools	A set of Delphi components offered to build Internet applications from www.transerve.com
Webbandit	http://softwaresolutions.net/webbandit/index.htm Collates search engine results
Webclipping.com	www.Webclipping.com News-gathering agent
webcollage	Forms a collage from randomly selected web images www.jwz.org/webcollage/ Pet project of one of the authors of Netscape. Seems to come from differing IP nodes.
WebCompass	??? (quarterdeck search engine software)
WebGenie	www.webgenie.com/products.html. Presumably one of the CGI-based products available on this site. Possibly the "Site Sleuth"
Web Hound	Unable to find. Or rather, I found several different "web hounds", so can't tell which this was.
Web Magnet	www.webmagnet.com this appears to be a tool used by this web consultancy.
WebMiner	Either http://www.tribolic.com/webminer/ or http://www.webminer.com/webminer/index.cfm?section=overview A tool to track down and target visitors to your website
WebPix	Tool to fetch all pictures from a web site www.netwu.com/webpix/
Webpush	www.webhauler.com/webpush.htm
WebSymmetrix	Originates in Korea, and is possibly related to their National Computerization Agency. Uses IP address 210.183.28.39
webrank	www.webrank.com/features.asp Search engine popularity meter.
webwasher	www.webwasher.com/en/products/wwash/functions.htm (browser filter)
WhosTalking	http://softwaresolutions.net/whostalking/ Software that tracks Trademark usage. Last time I saw it, it was creating 404 errors by adding &dg.. to each URL. Hopefully they'll fix this
www.MacroX.de	www.macrox.de (German). Appears to be an interpreter designed to help automate regular tasks on a Windows PC.
XupiterToolbar	A toolbar that sets up www.xupiter.com as the default search engine. There appears to be a lot of negative press regarding this toolbar
yacy	http://yacy.net/home.html. An open source and distributed search engine project. The above URL seems to redirect to an IP-based one
YottaShopping_Bot	http://www-yottashopping-com/. User agent claims this is a Shopping Search Engine, but the URL requires a login so I was unable to verify (so I deliberately made its URL non-clickable). Crawled from 64.62.175.133

Sites that regularly visit

Some IP addresses or sites may regularly visit you, although the user agent may be obscure, blank, or even change between visits.

Here are a few that I've been able to work out

Site address(es)	Description
proxy.netsetter.org	This is a site that offers a speed-up to your surfing, in return for being able to monitor people's surfing habits. The speed-ups are achieved through a variety of techniques, and the monitoring info is sold on, although your privacy is protected. Visit www.netsetter.org for more details.
pwoshoes.transport.com	Not known
...lightrealm.com	This site daily reads any xml files submitted to a shareware site in PAD format. PAD is a means for describing shareware devised by the Association of Shareware Professionals (www.asp-shareware.org). This site is performing daily checks, looking to automatically update its lists with any changes.
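
Spotting regular visitors like these means grouping log entries by IP address rather than by user agent. The following is a minimal sketch of that idea, not the method used to build this page. It assumes the server writes the Apache/NCSA "combined" log format; the function names and the sample lines (which use reserved documentation IP addresses) are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed log layout: Apache/NCSA "combined" format. Adjust the pattern
# if your server writes its logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def regular_visitors(lines, min_hits=2):
    """Return {ip: (hit_count, sorted_user_agents)} for IPs seen at least min_hits times."""
    hits = defaultdict(int)
    agents = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip = match.group("ip")
        hits[ip] += 1
        agents[ip].add(match.group("agent"))
    return {ip: (n, sorted(agents[ip])) for ip, n in hits.items() if n >= min_hits}

# Hypothetical sample entries, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [30/Jan/2006:10:00:00 +0000] "GET / HTTP/1.0" 200 1234 "-" "-"',
    '192.0.2.10 - - [31/Jan/2006:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 56 "-" "ExampleBot/1.0"',
    '192.0.2.77 - - [31/Jan/2006:11:00:00 +0000] "HEAD /index.html HTTP/1.0" 200 0 "-" "Mozilla/4.0"',
]

for ip, (count, seen_agents) in regular_visitors(SAMPLE).items():
    print(ip, count, seen_agents)
```

An address that keeps returning with a blank or changing user agent, as the first sample address does here, is a candidate for a closer look.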

Other useful sites

Here are links to other sites you might find useful when looking into web robots

www.botspot.com	A Bot monitor site, with regular updates and links to the bot's home pages.
www.htmlhelp.com/links/validators.htm	A list of HTML validators
www.iplists.com	A site that lists IP addresses of search engine bots and others. More comprehensive (and probably more up to date) than the IP addresses shown on this page (which tends to record the first IP address seen)
http://tool.motoricerca.info/robots-checker.phtml	An online syntax checker for robots.txt files. Enter the URL of your robots.txt file to get it checked and to see a summary of what effect it will have.
www.mozilla.org/build/revised-user-agent-strings.html	Mozilla web browser project. This page describes the conventions used for formatting the User Agent in the form "Mozilla..."
www.robotstxt.org/wc/robots.html	A site dedicated to the robots.txt file. This page gives some background to how robots work, although their list of robots is quite small.
www.searchtools.com/robots/	A page collecting together a number of resources to do with all aspects of web robots.
www.spiderhunter.com	A site primarily about "cloaking" sites - the art of making a site look different to different visitors. Contains articles on how to detect spiders.
www.webcab.de/wapua.htm	A site listing WAP user agent strings. These will mostly be mobile phones
www.webmasterworld.com/forum11/index.htm	This site contains a number of forums for topics of interest to webmasters everywhere. This particular forum actively discusses robots and search engines that visit your site.
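
As a local complement to the online robots.txt checker listed above, the effect of a robots.txt file can also be tested with Python's standard urllib.robotparser module. This is only a sketch: the robots.txt content, the agent names and the URLs are hypothetical, and the module's matching rules may differ in detail from those of any particular robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, agent, url):
    """Report whether the given agent may fetch url under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

for agent, url in [
    ("BadBot/1.0", "http://www.example.com/index.html"),
    ("GoodBot", "http://www.example.com/index.html"),
    ("GoodBot", "http://www.example.com/private/notes.html"),
]:
    print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

Remember that this only tells you what a well behaved robot should do; nothing stops a badly behaved one from ignoring the file.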

...And finally, some fakers

Increasingly, security and privacy concerns mean that users and companies are wary about giving away information to sites they visit through the user agent and other fields that appear in server logs.

Some browsers will allow you to select the user agent you present when visiting a site. The Opera browser does this, for example, to allow its users to pretend to be either IE or Netscape when visiting web sites coded in a way that forgets there are other browsers in use.
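
The user agent is just a request header chosen by the client, so any program can claim to be any browser. The sketch below uses Python's standard urllib.request module to build (but not send) a request with an arbitrary User-Agent; the URL and the agent string are hypothetical examples, and the helper name is made up for illustration.

```python
import urllib.request

def build_request(url, user_agent):
    """Build (but do not send) an HTTP request carrying a chosen User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# Hypothetical values, for illustration only.
req = build_request(
    "http://www.example.com/",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
)

# urllib stores header names with only the first letter capitalised,
# so the header is looked up as "User-agent".
print(req.get_header("User-agent"))
```

Nothing on the server side can distinguish this claim from a real browser's, which is why the pattern of requests and the IP address are often better evidence than the user agent itself.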

Also, as firewalls become more common, we will see more and more user agent fields being blocked by the firewall, which prevents this information being transmitted to the outside world.

Just to prove that you can never rely on the user agent, here is a selection of user agent strings I've seen in my log files that tell us nothing about the software being used (although some of them speak volumes about the person driving the software). I'm omitting any IP addresses I may have to protect the identities of those concerned :-)

"user agent" seen	Comments
Bruciebot	I'm assured this was created by a regular in alt.www.webmasters :-)
Blocked by Norton Geblokkeerd door Norton Blockeriet von Norton	The agent has been blocked by Norton Utilities. The referrer is also withheld. The second version is Dutch. No doubt other languages occur
Don't Like AOL	Oh dear. This could start a trend!
Don't be so nosey ;-)	Hey! you came to my site first, remember? :-)
Don't you wish you knew.	Obviously.
Go Away	A bit rich from someone who came to my site! :-)
Field blocked by AtGuard	Surfer is behind the AtGuard firewall (now part of Norton Internet Security 2000) which prevents the true User Agent being transmitted. http://home.pages.at/atguard/
Field blocked by Outpost	www.agnitum.com Again the field is withheld by the software
Isch habe gar kein Browser ;-)	German for "I have no browser" :-) Or so I thought, until I received the following from Clemens Marschner: Actually it is German - with Italian accent! The word refers to an advertisement for Nescafe coffee, where a smart Italian convinces a beautiful lady to stay and drink coffee with him after she knocks at his door to complain that his car is in the way of hers. And after she stayed and listened to him while he prepares the coffee with lots of gestures and Italian speak, she again asks him to move his car, and he goes "Isch 'abe gar keine Auto, Signorina" (I don't even have a car, signorina). Since that commercial was shown for years, presumably all German web masters know it...
My Web browser is not of your business	True, but no fun.
multiBlocker browser	www.multiblocker.com/home.html Although this seems to mainly offer protection against visitors to your site, they obviously also provide a user agent blocker for people browsing
Wabbit's don't use browsers	Probably the proxy service at http://rabbit-proxy.sourceforge.net/
Wot, no browser? (Win67; X; SK)	Win67 ?!? Ah... a dream come true!
Who gives a shit? It's as least as good as Lynx	Ah yes, but how do we know that?
Who wants to know?	I do. :-)
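
When analysing logs it can help to separate user agents that are missing or blocked by a firewall from those that merely make a claim you cannot verify. The following is a rough sketch under that assumption; the pattern list is taken from a few of the strings in the table above and is far from complete, and the category names are invented for illustration.

```python
import re

# Illustrative patterns drawn from strings in the table above; the list
# is an assumption and far from complete.
BLOCKED_PATTERNS = [
    re.compile(r"^Field blocked by ", re.IGNORECASE),
    re.compile(r"^Blocked by Norton", re.IGNORECASE),
    re.compile(r"^Geblokkeerd door Norton", re.IGNORECASE),
]

def classify_agent(agent):
    """Return 'missing', 'blocked' or 'claimed' for a logged user agent."""
    text = (agent or "").strip()
    if text in ("", "-"):
        return "missing"
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return "blocked"
    # Anything else is only what the client claims to be.
    return "claimed"

for sample in ["Field blocked by Outpost", "-", "Who wants to know?"]:
    print(repr(sample), classify_agent(sample))
```

Note that "claimed" covers genuine browsers and jokers alike; the user agent alone cannot tell them apart.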

Awards for this page

Spider award for achieving a top 10 position in search engines

Spidering Hacks by Kevin Hemenway and Tara Calishain

I've been told this page is referenced in the book Spidering Hacks

All awards gratefully received :-)

This page is © 2000-2005 John A Fotheringham. It may not be reproduced without permission,
although you are welcome to save a copy for personal use to your hard disk.

For more information contact info@jafsoft.com.