Understanding Search Engines
How Search Engines Work
Search engines use software robots to survey the Web and build
their databases. Web documents are retrieved and indexed. When you enter a
query at a search engine website, your input is checked against the search
engine's keyword indices. The best matches are then returned to you as hits.
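
The retrieve, index, and match cycle described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular search engine is built: the function names (tokenize, build_index, search), the sample pages, and the ranking rule (count how many query words a page contains) are all made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return document ids ordered by how many distinct query words they match."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Best matches first; ties are broken alphabetically by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical pages standing in for documents a robot has retrieved.
documents = {
    "page1": "Search engines use software robots to survey the Web",
    "page2": "Robots retrieve Web documents and index them",
    "page3": "Cooking with garlic and olive oil",
}

index = build_index(documents)
hits = search(index, "web robots survey")
print(hits)
```

Here the index is built once, ahead of time, and each query is answered by looking words up in it rather than rereading the pages.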
There are two primary methods of text searching--keyword and
concept.
Keyword Searching
This is the most common form of text search on the Web. Most
search engines do their text query and retrieval using keywords.
Unless the author of a Web document specifies its keywords
(which is possible with HTML meta tags), it's up to the search engine to
determine them. Essentially, this means that search engines pull out and index
words that are believed to be significant. Words that are mentioned towards the
top of a document and words that are repeated several times throughout the
document are more likely to be deemed important.
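
One way to picture this kind of weighting is the sketch below. It is a hypothetical heuristic, not the formula of any real search engine: the function name score_keywords, the "first quarter of the document counts as the top" rule, and the bonus value are assumptions chosen for illustration. Stop words are not filtered here, to keep the example short.

```python
import re
from collections import Counter


def score_keywords(text, top_n=3, early_fraction=0.25, early_bonus=2):
    """Rank words by frequency, giving extra weight to words near the top.

    early_fraction and early_bonus are illustrative values, not known settings.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return []
    # Words before this position are treated as "towards the top".
    cutoff = max(1, int(len(words) * early_fraction))
    scores = Counter()
    for position, word in enumerate(words):
        scores[word] += 1  # every occurrence counts once
        if position < cutoff:
            scores[word] += early_bonus  # early occurrences count extra
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


sample = "tides tides moon ocean ocean shore waves sand"
print(score_keywords(sample, top_n=2))
```

In the sample, "tides" wins because it is both repeated and appears at the top, while "ocean" ranks second on repetition alone.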
Some sites index every word on every page. Others index only
part of the document. For example, Lycos indexes the title, headings,
subheadings and the hyperlinks to other sites, along with the first 20 lines of
text.
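
A partial-indexing policy like the one described for Lycos might look roughly like the following. This is a sketch under stated assumptions: the page is represented as a plain dictionary with hypothetical keys (title, headings, links, body), and the function name partial_index_text is invented for this example. Only the 20-line limit comes from the description above.

```python
def partial_index_text(page, max_body_lines=20):
    """Collect only the parts of a page that a partial indexer would keep.

    `page` is a hypothetical dict with optional keys:
    "title" (str), "headings" (list of str), "links" (list of str), "body" (str).
    """
    parts = [page.get("title", "")]
    parts.extend(page.get("headings", []))
    parts.extend(page.get("links", []))
    # Keep only the first lines of the body text.
    parts.extend(page.get("body", "").splitlines()[:max_body_lines])
    # Drop empty entries so the indexer only sees real text.
    return [part for part in parts if part.strip()]


# A made-up page with 25 lines of body text.
page = {
    "title": "Tide Tables",
    "headings": ["Spring Tides"],
    "links": ["http://example.com/moon"],
    "body": "\n".join(f"line {n}" for n in range(25)),
}

kept = partial_index_text(page)
print(len(kept))
```

Anything past the twentieth body line never reaches the index, so a query for words that appear only further down the page would not find it.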
Full-text indexing systems generally pick up every word in the
text except commonly occurring stop words such as "a," "an," "the," "is," "and,"
"or," and "www." AltaVista claims to index all words, even the articles, "a,"
"an," and "the." Some of the search engines discriminate upper case from lower
case; others store all words without reference to capitalization.
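
The two choices described above, whether to drop stop words and whether to preserve capitalization, can be shown as options on a small term extractor. This is a minimal sketch: the function name index_terms and its parameters are assumptions for illustration, and the stop-word list is just the handful of words quoted above, not a complete list used by any engine.

```python
import re

# The stop words quoted in the text; real lists are longer.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def index_terms(text, drop_stop_words=True, case_sensitive=False):
    """Return the terms an indexer would store for `text`."""
    terms = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if drop_stop_words and word.lower() in STOP_WORDS:
            continue  # skip commonly occurring words
        terms.append(word if case_sensitive else word.lower())
    return terms


sentence = "The Web is a big place"

# Typical full-text indexer: stop words dropped, case ignored.
print(index_terms(sentence))

# An indexer that keeps every word and preserves capitalization.
print(index_terms(sentence, drop_stop_words=False, case_sensitive=True))
```

With case ignored, a query for "web" and a query for "Web" match the same stored term; with case preserved, they are treated as different terms.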