How Search Engines Work
Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
- They search the Internet — or select pieces of the Internet — based on important words.
- They keep an index of the words they find, and where they find them.
- They allow users to look for words or combinations of words found in that index.
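To make the second and third tasks concrete, here is a minimal sketch of an index and a lookup against it. It is an illustration under stated assumptions, not how any particular search engine is built: the page contents, the example.com URLs, and the function names (tokenize, build_index, search) are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for documents a spider has already fetched.
PAGES = {
    "http://example.com/a": "Spiders crawl the Web",
    "http://example.com/b": "An index maps words to pages",
    "http://example.com/c": "Spiders build the index",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Keep an index of the words found and where they were found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "spiders index")))
```

A query for a combination of words is answered by intersecting the sets of pages recorded for each word, so the engine never has to reread the pages themselves at search time.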
A top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day. In this article, we’ll tell you how these major tasks are performed, and how Internet search engines put the pieces together in order to let you find the information you need on the Web.
When most people talk about Internet search engines, they really mean World Wide Web search engines. Before the Web became the most visible part of the Internet, there were already search engines in place to help people find information on the Net.
Before a search engine can tell you where a file or document is, that file or document must be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web; a large set of arachnid-centric names for tools is one of them.) To build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
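The crawling process can be sketched in a few lines. This is a simplified illustration, not a real spider: FAKE_WEB is a hypothetical in-memory stand-in for the Web (a real spider would fetch pages over HTTP), and the names LinkParser and crawl are invented for the example. The breadth-first visiting order is one reasonable choice, not something the article specifies.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "Web": URL -> HTML. No network access is involved.
FAKE_WEB = {
    "http://example.com/": '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    "http://example.com/a": '<a href="http://example.com/b">B</a> spiders crawl',
    "http://example.com/b": '<a href="http://example.com/">home</a> words everywhere',
    "http://example.com/orphan": "never linked, so never found",
}

class LinkParser(HTMLParser):
    """Collect link targets from anchor tags and words from the page text."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl(start_url, web, max_pages=100):
    """Follow links breadth-first; return {url: list of words} per page visited."""
    seen = {start_url}
    queue = deque([start_url])
    word_lists = {}
    while queue and len(word_lists) < max_pages:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # link points at a page we cannot fetch
            continue
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        word_lists[url] = parser.words
        for link in parser.links:
            if link not in seen:  # avoid revisiting pages
                seen.add(link)
                queue.append(link)
    return word_lists

word_lists = crawl("http://example.com/", FAKE_WEB)
print(sorted(word_lists))
```

Note that the page at /orphan never appears in the result: a spider that only follows links can only find pages that some other page links to.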
When the Google spider looked at an HTML page, it took note of two things:
- The words within the page
- Where the words were found
Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles “a,” “an” and “the.” Other spiders take different approaches.
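As a rough sketch of the "two things" described above, the following parser records each word together with where it was found, and skips the three articles. It is not Google's actual code. The choice of tags treated as important (title, h1 to h3, and keyword or description meta tags) and the names WordLocationParser and ARTICLES are assumptions made for illustration.

```python
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}  # the words the article says were left out

class WordLocationParser(HTMLParser):
    """Record (word, location) pairs for one HTML page."""
    # Assumed "positions of relative importance"; a real spider may use others.
    NOTED_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.stack = []    # noted tags currently open
        self.entries = []  # (word, location) pairs in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.NOTED_TAGS:
            self.stack.append(tag)
        elif tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name") in ("keywords", "description"):
                self._add(attrs.get("content") or "", "meta")

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        location = self.stack[-1] if self.stack else "body"
        self._add(data, location)

    def _add(self, text, location):
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")
            if word and word not in ARTICLES:
                self.entries.append((word, location))

HTML = """<html><head><title>The Spider Guide</title>
<meta name="keywords" content="spiders, crawling">
</head><body><h1>A Crawl</h1><p>The spider reads an index.</p></body></html>"""

parser = WordLocationParser()
parser.feed(HTML)
parser.close()
entries = parser.entries
for word, location in entries:
    print(word, location)
```

Keeping the location alongside each word is what lets a later search give extra consideration to a match in the title over a match buried in the body text.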
Franklin, Curt. “How Internet Search Engines Work.” 27 September 2000. HowStuffWorks.com. 10 April 2012.
Spider by Jiri Hodan