Search Engine Spiders
In-depth information on search engine spiders and how they index your pages, with a focus on increasing your site's rankings on the major search engines.
Let me clear up the confusion a lot of people have between a search engine and a directory.
Search Engines: A search engine is a database system designed to index Internet addresses (URLs, Usenet, FTP, image locations, etc.). The typical search engine includes a special program called a spider (sometimes also called a "bot" or "crawler"). The spider accepts a URL, goes to that web site, and retrieves a copy of the file found there. Sometime later, the search engine processes that copy, distilling it down to the bare essential data it needs for its database. While most search engines request both a URL and an e-mail address, the search engine itself determines what data ends up in the database. In short, given a URL, an automated process occurs that results in your site being included in the index.
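The fetch-then-distill process described above can be sketched roughly as follows. This is only an illustration, not how any particular engine works: the `PageDistiller` class, the `distill` function, and the choice of stored fields (title, META keywords, body words) are assumptions made for the example, and the page is passed in as a string rather than fetched over the network.

```python
from html.parser import HTMLParser


class PageDistiller(HTMLParser):
    """Hypothetical helper: collects the title, META keywords, and body words of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.words.extend(data.lower().split())


def distill(url, html):
    """Reduce a retrieved copy of a page to the fields kept in the (assumed) index."""
    parser = PageDistiller()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "keywords": parser.keywords,
        "words": parser.words,
    }


# Example page; the URL and content are made up for illustration.
page = (
    "<html><head><title>Blue Widgets</title>"
    "<meta name='keywords' content='widgets, blue'></head>"
    "<body><p>We sell blue widgets.</p></body></html>"
)
record = distill("http://www.example.com/", page)
```

Here `record` holds only the distilled data, which is the point of the step: the full copy of the file is no longer needed once the index entry exists.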
Directories: A directory is basically a manual-entry database system. You, as the end user submitting your URL, supply the directory with all of the needed information during the submission process. At a minimum, this information includes the URL, a title, and a short summary of your web site. Rarely will a directory have any program capable of visiting your web site, although a few directories do have a simple spider that can verify that the URL you provided is valid.
There are two classes of spider: deep and shallow. A deep spider takes a URL and spiders all of the pages within the site, no matter how many levels of directories it needs to traverse. A shallow spider does one of two things: it either spiders the URL it was given and stops, or it spiders only those URLs it finds within a single level of directories.
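The difference between the two classes can be sketched as a crawl with and without a depth limit. This is a simplified illustration under stated assumptions: the `SITE` link map and the `spider` function are hypothetical, the crawl runs over an in-memory dictionary instead of real HTTP requests, and depth is counted in links followed rather than in directory levels.

```python
from collections import deque

# Hypothetical site: each URL maps to the URLs it links to.
SITE = {
    "/": ["/products/", "/about.html"],
    "/products/": ["/products/widgets/"],
    "/products/widgets/": ["/products/widgets/blue.html"],
    "/about.html": [],
    "/products/widgets/blue.html": [],
}


def spider(start, links, max_depth=None):
    """Breadth-first crawl from start; max_depth=None means no limit (a deep spider)."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # a shallow spider stops following links at this point
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append((nxt, depth + 1))
    return seen


deep = spider("/", SITE)                    # every reachable page
single = spider("/", SITE, max_depth=0)     # only the submitted URL
one_level = spider("/", SITE, max_depth=1)  # submitted URL plus one level of links
```

With this link map, `deep` reaches all five pages, `single` contains only `/`, and `one_level` stops after `/products/` and `/about.html`.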
Different search engines use different spiders, each with its own way of crawling and indexing:
Infoseek (Sidewinder) - The Infoseek search engine indexes pages with its spider, Sidewinder. It only indexes the pages that you have submitted (it won't traverse or crawl through your site). Infoseek gives the highest priority to keywords in the page title.
Inktomi (Slurp) - Inktomi (pronounced INK-TUH-ME; the name comes from a Lakota legend about a crafty trickster spider) is the work of a couple of super geeks at the University of California at Berkeley. The Inktomi spider, Slurp, gives the highest priority to keywords in the page title. It also gives a higher priority to keywords in META tags and to relevance in the page text.
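As a rough illustration of what "priority" for title, META, and page-text keywords could mean, here is a toy scoring function. The weights and the `keyword_score` function are assumptions invented for this sketch; the only thing taken from the description above is the ordering (title first, then META tags, then page text).

```python
# Assumed weights: only their ordering reflects the description above.
WEIGHTS = {"title": 3.0, "meta": 2.0, "text": 1.0}


def keyword_score(keyword, title, meta_keywords, text):
    """Toy relevance score for one keyword on one page."""
    keyword = keyword.lower()
    score = 0.0
    if keyword in title.lower().split():
        score += WEIGHTS["title"]
    if keyword in [k.lower() for k in meta_keywords]:
        score += WEIGHTS["meta"]
    words = text.lower().split()
    if words:
        # Share of body words that match the keyword.
        score += WEIGHTS["text"] * words.count(keyword) / len(words)
    return score


body = "we sell widgets and more widgets"
optimized = keyword_score("widgets", "Blue Widgets", ["widgets"], body)
unoptimized = keyword_score("widgets", "Welcome", [], body)
```

Two pages with identical body text end up with very different scores here, which is why the page title and META tags deserve attention.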
Hotbot/AOL (Inktomi's Slurp) - Even though Hotbot uses the Inktomi search service, it ranks and prioritizes results in its own way, using both Direct Hit-influenced data and its own internal data. Hotbot was owned by Wired Digital but was recently purchased by Lycos.
Altavista (Scooter) - The Altavista search engine starts by spidering your entire site with its spider, Scooter. Lately, though, Scooter hasn't been scooting too well: it may take up to three months to spider and index your entire site (if it is going to crawl your site at all). It normally spiders about 2-10 pages per site in any given week. Sometimes Scooter needs a good swift kick to get it to index certain pages.
Lycos (T-Rex) - Lycos has now fully integrated the Open Directory Project (ODP) into its mainstream results pages. Certain categories of ODP sites pull extremely well from the Lycos version of the ODP, so it is very wise to get your site properly listed in the ODP.
Excite (Architext) - The Excite indexer uses an in-depth indexing algorithm to determine keyword relevance. Its spider, Architext, traverses the site and indexes the data found on the pages. The indexer attempts to summarize the site by selecting the most relevant sentence for the summary.
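Selecting a "most relevant sentence" can be illustrated with a very simple heuristic. The sketch below is an assumption for illustration only, not Excite's actual algorithm: the hypothetical `pick_summary` function just returns the sentence that contains the most occurrences of a given set of keywords.

```python
import re


def pick_summary(text, keywords):
    """Return the sentence with the most keyword occurrences (first one wins ties)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    wanted = {k.lower() for k in keywords}

    def relevance(sentence):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        return sum(1 for w in words if w in wanted)

    return max(sentences, key=relevance) if sentences else ""


# Made-up page text for illustration.
page_text = "Welcome to our site. We sell blue widgets and red widgets. Contact us today."
summary = pick_summary(page_text, ["widgets", "blue"])
```

Under this heuristic, a page whose key sentence clearly states what the site offers gets a more useful summary than one that opens with a generic greeting.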
WebCrawler (Excite's Architext) - WebCrawler's indexing methods are sporadic and inconsistent. It would seem that WebCrawler (WC) crawls pages when it can. In the last year, I've had pages show up in one day, and at other times it took two months. This makes researching WC site-ranking methods on an ongoing basis not impossible, but very difficult.
Google (Backrub or GoogleBot) - Google uses its PageRank system as the central basis of its indexer. It gives a higher priority to site linkage, that is, to how sites link to one another. There are also other complex factors that go into the ranking system.
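The link-based idea behind PageRank can be sketched with the commonly described simplified formulation, in which each page passes a share of its rank to the pages it links to. This is not Google's production ranking, which, as noted above, involves other factors. The `pagerank` function, the damping value of 0.85, the iteration count, and the three-page `WEB` map are all assumptions made for the example, and every link target is assumed to appear as a key in the map.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: both a.html and b.html link to c.html.
WEB = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}
ranks = pagerank(WEB)
```

In this toy web, `c.html` ends up with the highest rank because two pages link to it, while `b.html`, which nothing links to, ends up with the lowest.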
So now you know all about search engines and how they index web pages. To earn a higher rank, it is very important that your page is optimized, for example by using proper keywords and content to promote your web site. There is only one product on the market that you can rely on, and that is Submission 2000. It can optimize your web page for you and also check your ranking in the major search engines, so you can tell how well the site is working and whether it needs extra work.
Imagine free web site promotion. Once you obtain top positions in the search engines, they will bring in new customers for free, and not just ordinary traffic: these are people who are specifically looking for the products and/or services that you provide.
Check the following links for more information on web site promotion:
Why search engines are so important in e-commerce
Apex Pacific, web site promotion specialist and maker of the Dynamic Software series
Tips on making your page search engine friendly
Great info on how search engines work