$_$_BEGIN_HTML
$_$_END_HTML
$_$_TITLE Search engine robots
$_$_DESCRIPTION This page lists the search engine robots known to JafSoft Limited
$_$_KEYWORDS scooter, gulliver, slurp, googlebot, netmind, alexa, ia_archiver, architectspider,
$_$_KEYWORDS ultraseek, lycos_spider, diibot, nttdirectory_robot, Linkwalker,
$_$_KEYWORDS linkalarm, linklint, linkscan, linkchecker, linkverify, linkbot,
$_$_KEYWORDS xenu's link sleuth, go!zilla, getright, getsmart, download wonder,
$_$_KEYWORDS netzip download, ecatch, MSIECrawler, MSProxy, CNET_Snoop, search engine robots
$_$_TABLE_HEADER_ROWS 1
$_$_TABLE_MIN_COLUMN_SEPARATION 2
$_$_CHANGE_POLICY column merging factor : 0
$_$_CHANGE_POLICY default table width : 75%
$_$_RESET_HTML_FRAGMENT HTML_HEADER
$_$_BEGIN_HTML
Search engine robots that visit your web site
$_$_END_HTML
*Contents of this page*
$_$_CONTENTS_LIST
Search engines and other sites send robots to read and index your pages. This page
reverses that process and indexes the robots. This information has been
gleaned by looking at the server logs for www.jafsoft.com. Whenever a page is
read from a web site, the log file records a number of details including the
time, the IP address and usually the referrer page and the user agent. You
can see this in our [[HYPERLINK URL,log_sample.html,"analysis of a server log sample"]].
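To make those details concrete, here is a minimal Python sketch that pulls the time, IP address, referrer and user agent out of one log line. It assumes the server writes the widely used "combined" log format; the function name and the sample entry are invented for illustration and are not taken from the jafsoft.com logs.

```python
import re

# Combined Log Format, as written by many web servers:
#   host ident user [time] "request" status bytes "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The server writes "-" when no bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Invented sample entry (not taken from a real log).
SAMPLE = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" '
    '200 2326 "http://www.example.com/links.html" "ExampleBot/1.0"'
)

entry = parse_log_line(SAMPLE)
print(entry["time"], entry["host"], entry["referrer"], entry["agent"])
```

If your server uses a different log layout, the pattern will need adjusting; the referrer and user agent are optional here because some servers omit them.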
Unlike many pages that list web robots, this page actually tries to
visit the robots themselves. Where possible, links are provided to the
robots' home pages, with descriptions of what they're up to. This page is
updated regularly as more information is found (the last update was on *[[TIMESTAMP]]*).
Well-behaved robots will identify themselves, often supplying web or email
addresses you can contact. In any case, the pattern of pages being read and the
IP addresses being used soon sort the men from the robots.
Good robots will read robots.txt to see what your site policy is, but there are
other ways of spotting robots. In addition to the search engine robots, other
"user agents" will visit your site, e.g. to validate links to your site from
other people's pages. Often these will just access the HEAD of the file, rather
than doing a GET on the whole file.
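The two hints above (fetching robots.txt, and sending only HEAD requests) can be checked mechanically. The following is a rough Python sketch, assuming you have already extracted (host, method, path) tuples from your log; the function name and sample hosts are hypothetical, and neither hint is proof on its own.

```python
from collections import defaultdict


def classify_hosts(requests):
    """Return (hosts that read robots.txt, hosts that only sent HEAD requests).

    `requests` is an iterable of (host, method, path) tuples, already
    extracted from the server log by whatever parser you use.
    """
    methods = defaultdict(list)
    read_robots_txt = set()
    for host, method, path in requests:
        methods[host].append(method.upper())
        if path.lower() == "/robots.txt":
            read_robots_txt.add(host)

    # Hosts that never did a GET look more like link validators than readers.
    head_only = {
        host for host, seen in methods.items()
        if all(m == "HEAD" for m in seen)
    }
    return read_robots_txt, head_only


# Invented sample data, for illustration only.
SAMPLE = [
    ("crawler.example.net", "GET", "/robots.txt"),
    ("crawler.example.net", "GET", "/index.html"),
    ("checker.example.org", "HEAD", "/index.html"),
    ("checker.example.org", "HEAD", "/download.html"),
    ("dialup.example.com", "GET", "/index.html"),
]

read_robots, head_only = classify_hosts(SAMPLE)
print("Read robots.txt:", sorted(read_robots))
print("HEAD only:", sorted(head_only))
```

A robot that ignores robots.txt and fakes its agent will slip past both checks, so this is a first filter rather than a verdict.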
You can also visit our page [search_engines].
_*This page is regularly converted from this [[SOURCE_FILE "text file"]] by the author's
own text to HTML converter [AscToHTM_abs]. The last update was on [[TIMESTAMP]]. This
software is available as shareware (cost $40)*_
Search engine robots and others
===============================
The following table lists the search engines that spider the web, the IP
addresses that they use, and the robot names they send out to visit your site.
Version numbers are usually included in the robot names, but are omitted here
except where they imply a visit from a different IP address or (as with Inktomi)
a different search engine.
Often multiple IP addresses are used, in which case we just give a flavour of the
names or numbers. Inktomi is a company that offers search engine technology and
is used by a number of sites (e.g. www.snap.com and www.hotbot.com).
Wherever appears this indicates a number of different digits may be used.
$_$_BEGIN_TABLE
$_$_TABLE_MAY_BE_SPARSE
$_$_TABLE_ALIGN CENTER
Home page/search engine | Robot identifier | IP address(es)
=======================================================================================
www.aesop.com | AESOP_com_SpiderMan | 209.189.115.49
| |
www.alexa.com | ia_archiver | green.alexa.com
| | sarah.alexa.com
| |
www.altavista.com | Scooter | test-scooter.pa.alta-vista.net
| | brillo.pa.alta-vista.net
| | av-dev4.pa.alta-vista.net
| | scooter.aveurope.co.uk
| | bigip1-snat.sv.av.com
| Mercator | mercator.pa-x.dec.com
| | scooter.pa.alta-vista.net
| | election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com
| Scooter2_Mercator_3-1.0 | scooter.sv.av.com
| roach.smo.av.com-1.0 | avfwclient.sv.av.com
| Tv_Merc_resh_26_1_D-1.0 | tv.sv.av.com
| |
www.altavista.co.uk | AltaVista-Intranet | host-119.altavista.se
| jan.gelin@av.com |
| |
www.alltheweb.com | FAST-WebCrawler | 209.67.247.154
| crawler@fast.no |
| www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html
| |
| Wget | ext-gw.trd.fast.no
| |
www.acoon.de | Acoon Robot | 194.231.42.178
| |
www.atomz.com | Atomz | router-sc.atomz.com
| |
www.crawler.de | Crawler | crawlit.crawler.de
| admin@crawler.de |
| |
www.daum.net | RaBot | 210.183.28.46
| Agent-admin/ phortse@hanmail.net |
| contact/jylee@kies.co.kr | 211.50.57.6
| |
| RaBot | 202.30.94.34
| Agent-admin/ webmaster@kisco.go.kr |
| |
www.excite.com | ArchitextSpider | Musical instruments are used
| | in the name such as viola.excite.com
| | cello.excite.com
| | piano.excite.com
| | kazoo.excite.com
| | ride.excite.com
| | sabian.excite.com
| | sax.excite.com
| | bugle.excite.com
| | snare.excite.com
| | ziljian.excite.com
| | bongos.excite.com
| | maturana.excite.com
| | mandolin.excite.com
| | piccolo.excite.com
| | kettle.excite.com
| | ichiban.excite.com
| | (and the rest of the band)
| | more recently first names are being
| | used like philip.excite.com
| | peter.excite.com
| | perdita.excite.com
| | macduff.excite.com
| | agouti.excite.com
| |
| |
| |
(excite) | ArchitectSpider | crimpshrine.atext.com
| | ichiban.atext.com
| |
www.euroseek.net | Arachnoidea | 212.209.54.134
| arachnoidea@euroseek.net |
| |
www.ezresults.com | EZResult | 216.28.23.59
| |
www.findsame.com | DIIbot | 207.230.106.188
(see also www.powerinter.net | robot@digital-integrity.com |
below) | |
| |
www.fireball.de | KIT-Fireball | ????
| |
www.geckobot.com | geckobot | ???.rdc1.az.coxatwork.com
| |
www.gendoor.com | GenCrawler | ????
(Genealogical Search Engine) | |
| |
www.google.com | Googlebot | c.googlebot.com
| googlebot@googlebot.com |
| http://googlebot.com/ |
| |
www.goo.ne.jp | moget/2.0 | 202.229.31.13
| moget@goo.ne.jp |
| |
(inktomi) | Slurp.so/1.0 | q2004.inktomisearch.com
| slurp@inktomi.com | j5006.inktomisearch.com
| |
(inktomi) | Slurp/2.0j | 202.212.5.34
| slurp@inktomi.com | goo313.goo.ne.jp
| www.inktomisearch.com |
| |
(inktomi) | Slurp/2.0-KiteHourly | y400.inktomi.com
| slurp@inktomi.com; |
| www.inktomi.com/slurp.html |
| |
(inktomi) | Slurp/2.0-OwlWeekly | 209.185.143.198
| spider@aeneid.com |
| www.inktomi.com/slurp.html |
| |
(inktomi) | Slurp/3.0-AU | j6000.inktomi.com
| slurp@inktomi.com |
| www.inktomisearch.com |
| |
www.hubat.com | Hubater | 209.114.176.250
| |
www.infoseek.com | UltraSeek | cde2c923.infoseek.com
| | cde2c91f.infoseek.com
| InfoSeek Sidewinder | cca26215.infoseek.com
| |
www.informatch.com/mediabot/ | MP3Bot | 212.204.169.52
| |
www.ip3000.com | C-PBWF-ip3000.com-crawler | www.ip3000.com
| ip3000.com-crawler |
| |
www.lexis-nexis.com | LNSpiderguy | firewall5.lexis-nexis.com
| |
www.looksmart.com | MantraAgent | fjupiter.looksmart.com
| |
www.lycos.com | Lycos_Spider_(T-Rex) | bos-spider.bos.lycos.com
| | 216.35.194.188
| |
www.mirago.co.uk | HenryTheMiragoRobot | 194.202.39.46
| |
www.northernlight.com | Gulliver | marvin.northernlight.com
| | taz.northernlight.com
| |
www.portaljuice.com | PJspider | timber.nextopia.com
| |
www.powerinter.net | DIIbot | node-d8e93393.powerinter.net
but it won't let us in :-( | |
| |
http://navi.ocn.ne.jp/ | nttdirectory_robot | lilis00.navi.ocn.ne.jp
| super-robot@super.navi.ocn.ne.jp |
| griffon | lilis04.navi.ocn.ne.jp
| griffon@super.navi.ocn.ne.jp |
| |
www.maxbot.com | Spider/maxbot.com | search.wport.com
| admin@maxbot.com |
| |
??? | various (fakes agent on each access) | pool0058.cvx2-bradley.dialup.earthlink.net
| |
??? | gazz/1.0 | deleuze.infobee.ne.jp
| gazz@nttrd.com | derrida.infobee.ne.jp
| |
??? | ??? | search-8.xift.com
| |
www.nationaldirectory.com | NationalDirectory-SuperSpider | spider.nationaldirectory.com
| | 209.116.58.143
| |
www.pinpoint.com | CrawlerBoy Pinpoint.com | nitrogen.pinpoint.com
| |
www.petersnews.com | user.ip3000.com | news.petersnews.com
| |
http://www.vestris.com/alkaline | AlkalineBOT | host130.uv-ray.com
| |
www.singingfish.com | asterias | grouper.singingfish.com
| |
www.speedfind.de | speedfind ramBot xtreme | BWEB.highway.telekom.at
| |
www.surfnomore.com | Surfnomore Spider v1.1 | 165.90.194.245
| |
www.supersnooper.com | Robot@SuperSnooper.Com | 207.8.212.162
| |
www.travel-finder.com | ESISmartSpider | 202.46.33.15
| |
www.uksearcher.co.uk | UK Searcher Spider | -
| |
www.walhello.com | appie | ...speed.planet.nl
| |
www.websmostlinked.com | Nazilla | -
| |
www.webwombat.com.au | www.WebWombat.com.au | 202.139.99.131
| |
www.webtop.com | MuscatFerret | ferret.webtop.com
| |
www.whizbanglabs.com | WhizBang! Lab | 216.250.143.108
| |
| |
www.wisenut.com | ZyBorg | -
(in beta) | (info@WISEnut.com) |
| |
www.wire.co.uk | WIRE WebRefiner: | brighton.wire.co.uk
| webrefiner@wire.co.uk |
| |
www.worldsearchcenter.com | WSCbot | ???
| |
| libwww-perl | www.linpro.no/lwp/
| |
http://verno.ueda.info.waseda.ac.jp/ |
| Iron33 | 207.18.183.251
$_$_END_TABLE
Link Checkers, Link monitors and bookmark managers
==================================================
Link checkers and bookmark managers are run by people wanting to keep their
pages and bookmarks up to date. Being visited by a link checker is good news
as it means that someone has linked to you, and cares that you're still alive.
Link monitors regularly check your pages for changes, usually because someone
has selected your page as "one to watch".
(pause for warm glow :-)
If you have access to the server log, check the referrer page to try and get
the URL from which you are linked. Sometimes these URLs are inside password
protected parts of sites, so you won't be able to view the page.
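As a rough illustration of that check, the Python sketch below counts the referrer URLs that point at other people's sites. It assumes you have already pulled the referrer field out of each log entry; the function name and the sample URLs are made up, and only www.jafsoft.com comes from this page.

```python
from collections import Counter
from urllib.parse import urlsplit


def external_referrers(referrers, own_host):
    """Count referrer URLs whose host is not `own_host`.

    `referrers` is an iterable of referrer strings from the log; blank
    values and "-" (no referrer supplied) are skipped.
    """
    counts = Counter()
    for referrer in referrers:
        if not referrer or referrer == "-":
            continue
        host = urlsplit(referrer).hostname
        if host and host != own_host.lower():
            counts[referrer] += 1
    return counts


# Invented sample referrers, for illustration only.
SAMPLE = [
    "http://www.example.org/links.html",
    "http://www.jafsoft.com/index.html",
    "-",
    "http://www.example.org/links.html",
]

for url, hits in external_referrers(SAMPLE, "www.jafsoft.com").most_common():
    print(hits, url)
```

The resulting list is a reasonable starting point for the "sites that link to you" list, though password-protected pages will still be out of reach.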
If you build up a list of sites that link to you, these are the guys you should
tell when you move (moral: never move).
It's also quite common for the link checker to give no indication of which URL
it's coming from. Some link checkers always come from the same IP address;
more usually they come from the client's site. It depends on whether the site
owner has purchased a copy of the link checking software, or signed up to some
centralized link checking service. If the referrer URL field is blank but you
have the client's IP address, you can always try visiting that address and
surfing their site.
Some of these tools appear to imply they're extracting email addresses
(e.g. emailSiphon). As such they're probably unwelcome visitors
since these addresses are probably being collected for spammers. You can read
more about this at www.csc.ncsu.edu/~brabec/antispam.html
A page listing various link checkers (and other tools) can be found at
www.softwareqatest.com/qatweb1.html#LINK
$_$_BEGIN_TABLE
$_$_TABLE_ALIGN CENTER
Robot identifier IP address(es) Link Checker home page
=======================================================================================
LinkWalker lw.seventwentyfour.com www.seventwentyfour.com
209.167.50.23
LinkAlarm linkalarm.com www.linkalarm.com
NetMind-Minder marvin.netmind.com (retired) www.netmind.com
gary.netmind.com
meg.netmind.com
inyanga.netmind.com
leo.netmind.com
gemini.netmind.com
Check&Get http://checkget.udm.net/ (also shown as referrer page)
CheckWeb www.asi.fr/~duby/chkweb.htm
CNET_Snoop www.download.com
(only if you have software listed at that site)
EmailSiphon We don't list information
like this on this site.
EmailWolf www.pixeltech.com.au/~msw/ewolf/index.html
The Informant cosmo.dartmouth.edu http://informant.dartmouth.edu/
The Intraformant
jdwhatsnew.cgi www.jdrowell.com/Linux/Projects/jdwhatsnew
LinkLint-checkonly -- www.goldwarp.com/bowlin/linklint/
javElink salix.ingetech.com www.dailydiffs.com
Lambda LinkCheck 195.139.70.25 www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html
LinkScan Server www.elsop.com
LinkSweeper www.lss.com.au/lss/windows/ls/linksweeper.htm
LinkVerify Spider frances.yourwebhost.com www.enduser.co.uk/linkverify/
Linkbot www.tetranetsoftware.com/products/linkbot.htm
Morning Paper www.boutell.com/morning/
NetLookout -- www.frugalsoft.com/lookout/
NetMechanic gamma.netmechanic2.com www.netmechanic.com
www.elsop.com
Rational SiteCheck www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl
Robozilla h-206---.netscape.com http://directory.mozilla.org/
(checks links in the dmoz directory)
SyncIT www.bookmarksync.com
WatzNew Agent www.watznew.com
WebTrends Link Analyzer www.webtrends.com
Xenu's Link Sleuth www.snafu.de/~tilman/xenulink.html
$_$_END_TABLE
Validators
==========
Validators check your web pages for HTML correctness and standards compliance.
Since other people are unlikely to send a validator to *your* site, you don't
usually see much of this. However, if you choose to validate your own site, then
the validation attempts will appear in your logs. Consequently the "list" below
is limited to the on-line validator I use (and recommend) and a URL submission
service that I use.
$_$_BEGIN_TABLE
Robot Identifier | IP address | Validator home page
====================================================================
W3C_Validator | abyss.w3.org | http://validator.w3.org/
Tooter | selfpromotion.com | www.selfpromotion.com. This is
| | used as part of a link submission
| | agent (trebor@animeigo.com)
$_$_END_TABLE
FTP clients and download managers
=================================
If you offer files for download, then you'll start to be visited by various FTP
clients. Clients like Go!Zilla and GetRight are smart in that they can resume
downloads that have been interrupted. This relies on your web server supporting
the necessary protocol, but that's fairly standard these days.
If your download files are over 1MB in size (or if your server is slow), you'll
often see the same IP address make multiple partial downloads of your file (look
at the file size). In the case of clients like Go!Zilla and GetRight, if these
add up to the right number of bytes, then chances are the download succeeded.
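The adding-up can be done by a short script. This Python sketch assumes you have extracted (host, path, bytes sent) from the log and that you know the true size of each download file; the function name, paths and sizes are invented for illustration.

```python
from collections import defaultdict


def completed_downloads(transfers, file_sizes):
    """Return the (host, path) pairs whose transfers add up to the file size.

    `transfers` is an iterable of (host, path, bytes_sent) tuples from the
    log; `file_sizes` maps each download path to its size in bytes.
    """
    totals = defaultdict(int)
    for host, path, bytes_sent in transfers:
        if path in file_sizes:
            totals[(host, path)] += bytes_sent
    return {
        (host, path)
        for (host, path), total in totals.items()
        if total == file_sizes[path]
    }


# Invented sample data, for illustration only.
FILE_SIZES = {"/files/setup.exe": 1500000}
TRANSFERS = [
    ("192.0.2.20", "/files/setup.exe", 600000),
    ("192.0.2.20", "/files/setup.exe", 900000),
    ("192.0.2.30", "/files/setup.exe", 400000),
]

for host, path in sorted(completed_downloads(TRANSFERS, FILE_SIZES)):
    print(host, "probably completed", path)
```

An exact match is a deliberately strict test: a visitor who downloads the file twice, or a client that re-fetches overlapping ranges, will exceed the file size and be missed, so treat the result as a lower bound.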
$_$_BEGIN_TABLE
$_$_TABLE_LAYOUT 2,"31","255"
$_$_TABLE_ALIGN CENTER
Client Identifier FTP Client home page
================================================
BatchFTP www.dynamicnet.net/products/batchftp.htm
ChinaClaw http://go2.163.com/~22787/chinaclaw.htm (Chinese)
(Chinese download utility)
DA www.lidan.com
www.downloadaccelerator.com
Download Demon www.netzip.com
Download Wonder www.forty.com
Go!Zilla www.gozilla.com
GetRight www.getright.com
MyGetRight
GetSmart http://members.xoom.com/m507/
JetCar (or FlashGet) www.amazesoft.com
LeechFTP http://stud.fh-heilbronn.de/~jdebis/leechftp/
Mass Downloader www.geocities.com/SiliconValley/Vista/2865/md.htm
NetZip Downloader www.netzip.com
SmartDownload
NetAnts www.netants.com
Net Vampire www.netvampire.com
Octopus http://moskalyuk.com/octopus/
RealDownload http://service.real.com/help/faq/rdown4/rdownfaqa01.html
$_$_END_TABLE
Browsers
========
Most browsers identify themselves with a string that begins "Mozilla...".
I've chosen not to document those (as yet). Here are a few of the rarer
browser identifiers that I've seen.
$_$_BEGIN_TABLE
$_$_TABLE_ALIGN CENTER
Browser identifier Information
-------------------------------------------
xChaos_Arachne http://browser.arachne.cz/
(DOS-compatible browser. Linux version under development)
IBrowse http://www.hisoft.co.uk/ (search for IBrowse)
Amiga-based browser
ICab http://www.icab.de/index.html
(Macintosh-only)
Konqueror http://www.konqueror.org/konq-browser.html
(Linux KDE browser)
Lynx http://lynx.browser.org/
(Cross-platform text based browser)
OmniWeb http://www.omnigroup.com/products/omniweb/
(Macintosh-only)
Opera http://www.opera.com/
(Cross-platform, small, efficient and standards-led browser)
pwWebSpeak http://www.prodworks.com/issound/catalog/catalog_pwwebspeak.html
Audio Browser
QWeb http://sunsite.auc.dk/qweb/ (Linux browser)
(see also http://browserwatch.internet.com/news/story/qweb8.html)
VMS_Mosaic http://vaxa.wvnet.edu/vmswww/vms_mosaic.html
(OpenVMS only version of Mosaic, a pre-Netscape browser)
WannaBe http://mindstory.com/wb2/
(Macintosh text-only browser)
$_$_END_TABLE
Offline browsers and other agents
=================================
$_$_BEGIN_TABLE
$_$_TABLE_ALIGN CENTER
Agent Identifier Agent home page
=================================================
AnswerChase www.answerchase.com/advan.html a personal
search robot.
beholder or www.vigiltech.com/esensedisclaim.html
e-sense www.vigiltech.com/esensedisclaim.html
contype Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader
used with MSIE (I have been unable to confirm this)
DaviesBot www.wholeweb.net/web/
DigOut4U www.arisem.com/Enu/
DISCoFinder www.ars.ru/eng/products/discof.asp
eCatch www.ecatch.com
EirGrabber http://www2p.biglobe.ne.jp/~eir/index.htm
(Japanese software from the "Eir Project")
Excalibur Internet Spider www.excalib.com/products/ispi/index.shtml
ExtractorPro --
FairAd Client www.hager.co.at/fordelka/fairad.htm (German)
A German pay-to-surf client
FavOrg http://www.zdnet.com/pcmag/stories/solutions/0,8224,2649295,00.html
A utility written by PC Magazine to fetch icon files
(favicon.ico) for your IE favorites
Favorites Sweeper www.manitoolssoftware.cjb.net.
Another "favorites" tidy-up utility
GigaBaz http://brainbot.com/web/en/
GigaBazVStheWeb
crawler@brainbot.com
Giskard http://212.145.12.170/ (Spanish)
www.oralco.com
(Trivia note: Giskard is probably named after the Isaac Asimov robot)
infoGIST www.infogist.com
iSiloWeb www.isilo.com/screensh.htm (for palm pilot)
larbin http://pauillac.inria.fr/~ailleret/prog/larbin/index-eng.html
LexiBot www.lexibot.com
Links http://gossamer-threads.com/scripts/links/
(Link management cgi script)
logikabot www.logika.net
Kenjin Spider www.kenjin.com/kenjin/info.html
Mata Hari www.thewebtools.com
(Internet search agent)
MoveAnnouncer www.moveannouncer.com
(notifies webmasters when your pages have moved)
MSIECrawler (Microsoft IE4.0)
MSProxy
NEC Research Agent http://heavenly.nj.nec.com/
Research "Inquirus" (meta?) search engine
NexTools WebAgent www.igsnet.com/igs/wagent.html
Offline Explorer www.metaproducts.com/OE.html
Oxxbot1 www.oxxfordinfo.com
(Data mining bot on IP 216.0.86.75)
NetAttache Offline browser www.tympani.com/store/NAProDownload.html
ParaSite www.ianett.com/parasite/
Phoaks www.phoaks.com/index.html. An index of web resources
listed in UseNet. See also
www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm
Pita (Chub.Stanford.EDU) --
PolyBot http://cis.poly.edu/polybot/
crawls from weasel.poly.edu and grampus.poly.edu
PureSight www.puresight.com/Products/PureSightHomeDescription.htm
Searchworks Spider www.nedesign.com/Phipps/products.html
SilentSurf http://www4.silentsurf.com/
SiteMapper www.trellian.com/mapper/index.html
SiteSnagger www.zdnet.com/pcmag/pctech/content/17/04/ut1704.001.html
SpaceBison http://members.tripod.com/Proxomitron/features.html
A web filter that is "ShonenWare", i.e. you should
purchase a Shonen Knife CD if you use it. Shonen Knife
are a great Japanese band, much loved by the late Kurt
Cobain. Sometimes this sets the referrer page to the
band's home page at http://www.mmjp.or.jp/knife/ (or maybe
the users just happen to go there themselves).
SpotOn www.spoton.com
(IE add-on that organizes your browsing)
SQ Webscanner http://macinsearch.com/users/webscanner/
(on holiday last time I looked)
SuperBot www.sparkleware.com/superbot/index.html
Teleport Pro www.tenmax.com/teleport/pro/home.htm
teoma_agent1 www.teoma.com
teoma_admin@hawkholdings.com Another "coming soon" search tool. Crawls from IP address
63.236.92.148. Hawk Holdings is the holding company. The
venture is between qwest.net and Baxter Investments.
UCmore www.ucmore.com
A browser plug-in (initially IE only) that searches for
related pages and categories. In my experience this
seems to entail accessing a favicon.ico file on a daily
basis (presumably to refresh the "favorites" list)
UdmSearch http://search.mnogo.ru/
Search engine technology, as used at sites such as
www.maplesearch.com. Now called mnoGoSearch.
vspider www.verity.com/products/intspider/
A commercial spidering product.
Webbandit http://softwaresolutions.net/webbandit/index.htm
Webclipping.com www.Webclipping.com
webcollage Forms a collage from randomly selected web images
www.jwz.org/webcollage/ pet project of one of
the authors of Netscape. Seems to come from
differing IP nodes.
WebCompass ??? (quarterdeck search engine software)
WebCopier www.maximumsoft.com
WebFetch www.webfetch.com
WebGather http://pccms.pku.edu.cn:8000/
Chinese search project
Webpush www.webhauler.com/webpush.htm
WebReaper www.otway.com/webreaper/
Webrobot www.multimania.com/dilletb/WebRobot/
WebVCR www.netresultscorp.com/fs_webvcr_info.html
WebStripper www.solentsoftware.com/webstripper/
WebTwin www.WebTwin.com
Convert websites into help files.
webwasher www.webwasher.com/en/products/wwash/functions.htm
(browser filter)
WebZIP www.spidersoft.com
Zeus 1500 Webster Pro www.homepagesw.com/webster_overview.htm
Zeus 2500 Webster Pro
Zeus 4300 Webster Pro
$_$_END_TABLE
Other miscellaneous agents
==========================
These agents are ones that we've seen but been unable to find information
on, or which are slightly unusual in origin. If you have any additional
information on any of these, feel free to send it to search@jafsoft.com
[[IGNORE_THIS table is broken. highlighting * is lost]]
$_$_BEGIN_TABLE
$_$_TABLE_ALIGN CENTER
User Agent Information
-------------------------------------------------
Albert Indexer www.albert.com/papers.htm
Multi-lingual search technology
Aranha Seems to be from a yet-to-be launched site
www.girafa.com. Spiders using IP 212.150.51.90
which also seems to be Aranha.girafa.com
AVSearch Seems to be the AltaVista personal search agent. The
crawling site is sometimes referred to in the agent name
Checkbot Seems to come from www.oxxfordinfo.com who offer B2B
services
Digimarc WebReader Digimarc searches images on the web looking for digital watermarks
More details at www.digimarc.com/about/index.shtml
EchO!/2.0 Spiders from 194.254.160.3, which would seem to be part
of www.voila.com, a French-based search engine.
FinaleRobot The www.expressus.com site describes an Interactive Natural
robot-master@expressus.com Language encyclopedia that will become a search engine
at www.final-e.com. Good name, but at present it just
maps back onto the ExpressUs site (not such a good name).
Crawls from IP address 64.114.34.115
GentleSpider Some sort of spider that usually visits using
an IP address from within www.research.att.com or
crawler.tivra.com
Gulper Web Bot www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot
(Open research project to produce opinion-based search engine)
InterGO www.teachersoft.com
http://browserwatch.internet.com/news/story/intergo1.html
This was a child-safe browser, but it seems no associated
page remains
InternetArchive Presumably www.internetarchive.com, but that's in "stealth mode"
Internet Ninja www.ifour.co.jp (Japanese Macintosh browser?)
InternetSeer A web monitoring service.
More details at www.internetseer.com/support/faq.jsp
Katriona Something to do with the European Regional Internet Registry (RIPE)
Browses using IP address 213.219.19.148
larbin And from the people that brought you xyro (see below),
sebastien.ailleret@inria.fr comes another, newer bot. This one seems to crawl from
ghi@lcs.mit.edu the IP address cremant.inria.fr. *Update* more recently
it's also been seen coming from barracutta.lcs.mit.edu
cosmos And then there was "cosmos", crawling from pomelos.inria.fr
Seems these people are a webbot factory. Cosmos doesn't
offer an email address.
LEIA *Unable to find*
(Too many "Star Wars" references get in the way)
libwww-perl The PERL programming language comes with a number of
routines for constructing web-aware scripts. This and
related strings are the default user agent identifiers,
although it's perfectly easy to change this to be whatever
you want.
MultiText Research project to index the last week's news items
http://multitext.uwaterloo.ca/NetSearch.html
NetCruiser www.netcruiser-software.com/products.html
It's not clear to me *which* of these products this might be,
but I'm assuming it's one of them.
ORA_checksite http://www.oreilly.com/openbook/webclient/ch06.html
Identifier used in a sample perl program in the online
book "Web Client Programming with Perl". The program is
used to check links. Obviously people have tried it, and it works :-)
PintaSpider *Unable to find* But the spider came from www.cnet.fr
PitSpyder Thread0 *Unable to find*
psbot www.picsearch.org/bot.html
A bot indexing pictures. Crawls from ps.direct2internet.com
RepoMonkey Bait & Tackle A bit of detective work here. Recent entries in the
log file link this to the site www.hungryhippo.com,
although the robot always appears to come from an IP
address at backflip.com (a bookmarking service).
Visiting www.hungryhippo.com reveals a "coming soon"
site. Looking at the HTML source leads to another page
at http://www.mezzaluna.net/hungryhippo.com/ (appears
identical).
The META tags for this page all appear to be references
to day trading, futures, training and the like, although
we did spot the word "fibonacci" (our favourite :-).
So... possibly a future search engine related to stock
trading?, or maybe the Monkey and Hippo are just feeding
me a red herring?
There's more. The picture on the Kenjin site at
www.kenjin.com/kenjin/info.html is currently the same as
that at HungryHippo. Kenjin is an Autonomy company.
Robot2.0(PingSoft) There are several "PingSoft"s around, but I suspect that
this belongs to one of the products listed at
http://pingsoft.com.cn/english/e_index.html (e.g. SmartHunter)
since I was visited from a Chinese IP address.
ru-robot Unable to find details on this, but I'm guessing it's
0.1_hseo(at)cs.rutgers.edu a research spider from www.rutgers.edu. Crawls using
the IP teal.rutgers.edu
TaWWWantula *Unable to find*
TeraCrawl *Unable to find*
unlostBot www.unlost.com is "under construction". The robot came
unlostBot@unlost.com from IP address 212.37.219.147 which is in France.
utopy Coming soon at www.utopy.com (requires flash). This
crawler@utopy.com venture-capital funded site is "running in stealth mode"
before launching the "new new thing" (is that a typo?).
One of the Flash pages defines Utopia (geddit?), and some
of the browsing is done by IP addresses at ...myutopy.com.
UtilMind HTTPGet Probably the perl-based (uses the httpget library) web page
grabber "Web Thief", described at www.utilmind.com/scripts/webthief.html
UrlScope *Unable to find*
VCI WebViewer Web browser object, that may be incorporated into software
www.homepagesw.com/webster_dl.htm
WAVETools A set of Delphi components offered to build Internet
applications from www.transerve.com
Web Hound *Unable to find*
Or rather, I found several different "web hounds", so can't tell
which this was.
Web Magnet www.webmagnet.com
This appears to be a tool used by this web consultancy.
WebSymmetrix Originates in Korea, and is possibly related to their
National Computerization Agency. Uses IP address
210.183.28.39
WhosTalking http://softwaresolutions.net/whostalking/
Software that tracks Trademark usage
xyro Seems to be a spider associated with a French
xcrawler@inria.fr research institute. Usually crawls using the IP
address vamos.inria.fr
$_$_END_TABLE
Sites that regularly visit
==========================
Some IP addresses or sites may visit you regularly, although the user agent
may be obscure, or even change.
Here are a few that I've been able to work out.
$_$_BEGIN_TABLE
Site address(es) | Description
--------------------------------------------------------
proxy.netsetter.org | This is a site that offers a speed-up
| to your surfing, in return for being able to
| monitor people's surfing habits. The speed-ups
| are achieved through a variety of techniques,
| and the monitoring info is sold on, although your
| privacy is protected. Visit www.netsetter.org
| for more details.
pwoshoes.transport.com | *Not known*
...lightrealm.com | Each day this site reads any XML files submitted to
| a shareware site in PAD format. PAD is a means for
| describing shareware devised by the Association of
| Shareware Professionals (www.asp-shareware.org). This site
| is performing daily checks, looking to automatically
| update its lists with any changes.
$_$_END_TABLE
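One way to find such regular visitors is to look for addresses that come back repeatedly under more than one user agent. This is a minimal Python sketch, assuming (host, user agent) pairs already extracted from the log; the function name, the `min_visits` threshold and the sample data are all hypothetical.

```python
from collections import defaultdict


def hosts_with_changing_agents(entries, min_visits=3):
    """Map each frequent host to its user agents, if it used more than one.

    `entries` is an iterable of (host, user_agent) pairs from the log.
    Hosts with fewer than `min_visits` entries are ignored.
    """
    visits = defaultdict(int)
    agents = defaultdict(set)
    for host, agent in entries:
        visits[host] += 1
        agents[host].add(agent)
    return {
        host: sorted(agents[host])
        for host in visits
        if visits[host] >= min_visits and len(agents[host]) > 1
    }


# Invented sample data, for illustration only.
SAMPLE = [
    ("192.0.2.40", "AgentA"),
    ("192.0.2.40", "AgentB"),
    ("192.0.2.40", "AgentC"),
    ("192.0.2.50", "Mozilla/4.0"),
    ("192.0.2.50", "Mozilla/4.0"),
    ("192.0.2.50", "Mozilla/4.0"),
    ("192.0.2.60", "AgentA"),
    ("192.0.2.60", "AgentB"),
]

for host, seen in hosts_with_changing_agents(SAMPLE).items():
    print(host, "used", len(seen), "different agents:", ", ".join(seen))
```

Bear in mind that a proxy or dial-up pool shared by many people will also show several agents from one address, so a hit here is a prompt to look closer, not a conclusion.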
Awards for this page
====================
$_$_BEGIN_HTML
$_$_END_HTML
All awards gratefully received :-)