Google and Other Search Engines

I'm going to start by talking about search engines, particularly Google. One of the reasons I'm starting here is that, although there are lots of friendly, helpful people on the Internet, they get tired of answering the same old questions every day. It's not very interesting to them. Some IRC channels even have a rule that is often called the "One Minute Google Rule". The rule states that they won't answer your question if you could have found the answer in under a minute using Google.

A proper treatment of the art of using search engines goes well beyond the scope of this article, but there are some simple guidelines that can make the experience a lot easier and more rewarding than you might be used to.

First, it's good to know a little bit about how search engines like Google work. Google has a very large collection of computers that run programs called spiders, which scour the web in an attempt to index as much of its content as possible. These spiders store a copy of each page they find and follow every link they can. The spiders index the pages according to the text in the pages themselves and according to keywords that the author of the page may have placed in the page header. When you enter a word or phrase to search on, Google compares it against the index it has constructed and then returns the pages that match, best matches first.
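To make the idea of an index a little more concrete, here is a toy sketch in Python. It is nothing like what Google actually runs; the page names and text are made up, and the ranking (count how many of your search words a page contains) is just an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical pages standing in for the copies a spider might have stored.
PAGES = {
    "acme.example/install": "acme printer install guide",
    "acme.example/errors": "printer error paper jam fix",
    "blog.example/cooking": "how to cook rice",
}

def build_index(pages):
    """Map each word to the set of page names whose text contains it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return matching pages, best matches first.

    A page's score is the number of distinct query words it contains.
    """
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for name in index.get(word, ()):
            scores[name] += 1
    # Highest score first; ties are broken by name so the order is stable.
    return sorted(scores, key=lambda name: (-scores[name], name))

INDEX = build_index(PAGES)
print(search(INDEX, "printer install"))
```

The page that contains both words comes back ahead of the page that contains only one, which is the "best matches first" behavior described above in miniature.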

Usually when you are trying to solve a problem, you have some keywords in mind. If you are having trouble installing a printer, you might try a search like: "printer install acme printer company goes boom". What I finally realized, after having it suggested to me enough times that it sank in, is that the best set of words to search on is the error message itself! It may take some time to sort through the results, but you'll often find what you are looking for.
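If you want to script this, a minimal sketch might look like the following. The error message is a made-up example, the function name is my own, and wrapping the message in double quotes to ask for the exact phrase is a suggestion of mine rather than something you have to do.

```python
from urllib.parse import urlencode

def error_search_url(error_message, exact=True):
    """Build a Google search URL for an error message.

    With exact=True the message is wrapped in double quotes, which asks
    the search engine to match the words as a phrase.
    """
    # Collapse newlines and repeated spaces that often come along
    # when an error is copied out of a terminal.
    query = " ".join(error_message.split())
    if exact:
        query = '"' + query + '"'
    return "https://www.google.com/search?" + urlencode({"q": query})

# Hypothetical error message, for illustration only.
print(error_search_url("lpd: connection refused"))
```

If the exact phrase turns up nothing, try again with exact=False, or trim out the parts of the message that are specific to your machine, such as file names and host names.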

Sometimes, when you search using a search engine like Google, the page you get back will appear to be totally unrelated to the keywords or error message you entered. That's when it's good to know that Google keeps a copy of the page it used to build its index in its cache. Most search results will give you a series of matches, and at the end of most of the matches there will be a link that says "Cached". Clicking on that link usually gives you the original page from which Google constructed its index.

Mismatches in Google

The reason you find such mismatches is that many of the sites on the web, including my own, are organized with web logging software. That software allows the editors of the sites to continually add new stories, so what is on the front page one day is frequently quite different from what you will find the next day. The information you were looking for is often still on the site; it might just be back in the old stories. It's worth a look.

Another good link to know about at the end of Google matches is the "Similar pages" link, which brings up a series of pages that closely match the page you've hit. This can be useful if you find a page that talks about the problem you are trying to solve but doesn't actually provide a solution. Sometimes one of the similar pages will. Below is an example of a Google search showing matches that have the "Similar pages" link.

Figure 1. Google Search with Similar Pages Link

Other Search Engines

Google isn't the only search engine you can use to find information, though it is the most popular. Others include Yahoo!, Alta Vista, Alltheweb, Teoma, WiseNut, and Lycos, among many more. One of the things that distinguishes search engines is the number of documents indexed. Currently Google is the leader with some 4.2 billion pages indexed. You can learn a lot more about search engines at SearchEngineWatch than I can cover here. The site has a good article on the size of the various search engines, as well as an article on major search engines and directories that goes into far more detail about the available search engines than this document has room for.

With so many search engines available, it's no surprise that there are now a number of meta search engines that take your search, run it through a number of different search engines, and blend the results together. Since the different search engines may cover different parts of the web, this is a way of ensuring that you do the most thorough search. Dogpile is one of the more popular meta search engines available.
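I don't know how Dogpile or any other meta search engine actually blends its results, but one simple way to picture it is to take the top match from each engine in turn and skip anything you have already seen. Here is a rough sketch of that idea in Python; the result lists are invented, and a real meta search engine would have to fetch them from each search engine first.

```python
from itertools import zip_longest

def blend(*result_lists):
    """Interleave ranked result lists round-robin, dropping duplicates."""
    seen = set()
    blended = []
    # zip_longest groups the first result from every engine, then the
    # second, and so on, padding shorter lists with None.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                blended.append(url)
    return blended

# Hypothetical results from two search engines, best match first.
engine_a = ["a.example/1", "shared.example/x", "a.example/2"]
engine_b = ["shared.example/x", "b.example/1"]
print(blend(engine_a, engine_b))
```

A page that both engines found appears only once in the blended list, and pages that only one engine knows about still make it in, which is the whole point of searching more than one index.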

dmoz.org provides a directory of search engines. Another large directory of search engines can be found at SearchEngineGuide. SearchEngineGuide also has a page on how to search.