Untangling the Web
URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.


Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web

Ann Eagan
Science-Engineering/Undergraduate Services Librarian
Laura Bender
Science-Engineering/Undergraduate Services Librarian
University of Arizona, Tucson


Copyright 1996, Ann Eagan and Laura Bender. Used with permission.

Abstract

Searching on the World Wide Web can be confusing. Myriad search engines exist, often with little or no documentation, and many of them work differently from the standard commercial search engines we are used to.

The workshop will begin with a guided search exercise. At the completion of the exercise, participants will be given a detailed packet covering all the material presented during the session. We will then describe and demonstrate the use of several representative web search engines, explain some of the differences between them, provide guided exercises for hands-on participation, and answer questions from the audience.

This workshop is aimed at librarians who want to know how, when, and why to search the Internet.


Searching on the World Wide Web can be confusing. Myriad search engines exist, often with little or no documentation, and many of them work differently from the standard commercial search engines we normally use. There are also many directories that attempt to organize the Internet by subject, and, today, there are many search engines that combine directory and keyword search capability. This paper defines search engines, directories, spiders, and robots; covers some basics of searching; provides criteria for choosing a search engine; and compares some of the search engines available.

Some caveats before we begin. There are dozens of search engines and several search engines for search engines, making it impossible to cover all of them. Also, much of what is written in this paper today is likely to be superseded by new information by the time you read it.

What are Search Engines and Directories?

Search engines on the Internet rely on automated programs, called robots, to search the web. These programs are also known as spiders, crawlers, wanderers, and worms. The robots crawl about the web indexing web sites. Some of them index web sites by title, some by uniform resource locator (URL), some by the words in each document in a web site, and some by combinations of these. Because the Internet is always growing, and because these search engines search in different ways and cover different parts of the Internet, running the same search on different search engines will often give you wildly differing results.
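
As a rough illustration of the paragraph above, the following Python sketch crawls a tiny in-memory "web" and builds three separate indexes: one by title, one by URL, and one by the words in each document. The pages, the PAGES table, and the function names are all invented for illustration; real robots fetch pages over the network and are far more elaborate than this.

```python
from collections import deque
import re

# A hypothetical three-page "web": URL -> (title, body text, outgoing links).
PAGES = {
    "http://a.example/": ("Squash Recipes", "Baked squash with butter.", ["http://b.example/"]),
    "http://b.example/": ("Navajo Jewelry", "The squash blossom necklace design.", ["http://c.example/squash"]),
    "http://c.example/squash": ("Court Sports", "Rules of the racquet game.", []),
}

def words(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def crawl(start):
    """Visit pages breadth-first from `start`, following links, and
    return three indexes mapping each word to the set of URLs found."""
    by_title, by_url, by_text = {}, {}, {}
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        title, body, links = PAGES[url]
        # Index the same page three different ways.
        for index, source in ((by_title, title), (by_url, url), (by_text, body)):
            for word in words(source):
                index.setdefault(word, set()).add(url)
        # Queue any linked pages that have not been visited yet.
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return by_title, by_url, by_text

by_title, by_url, by_text = crawl("http://a.example/")
for name, index in (("title", by_title), ("URL", by_url), ("text", by_text)):
    print(name, sorted(index.get("squash", set())))
```

In this toy data, the single term "squash" matches a different set of pages in each index, which is one reason the same query can return such different results from one search engine to the next.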

Many directories on the Internet were created by humans tired of stumbling about the Internet looking for topics of interest. These personal lists grew in size and complexity, and eventually the humans started to use the available search engines to assist them in their quest to bring order to the mess. Yahoo is perhaps the best known of the directories. It was started by a couple of students at Stanford and now employs a variety of people, including librarians, who review and categorize web sites. Yahoo also now employs a search engine, as do most of the other directories. In addition, many of the search engines offer directories of topics for those who prefer to browse.

How to Search

Browsing a directory is a simple matter of following the links for the topic of interest. Searching, whether in a directory or in the portion of the web that a search engine covers, works very much the same way in almost all search engines. The basic format is a dialog box, pane, or line where search terms can be entered, followed by options to submit or clear the search.

Once the search request is received, the search engine searches its own indexed database first and then, depending on its design, sends out spiders or other robots to add to the database. Results are sent back to the searcher, some extensively annotated, with links to the sources retrieved.

Full-featured search engines also have options to expand or limit searches in a variety of ways. For example, in Lycos, the basic search assumes a Boolean "or", which means that a search on two or more terms will return results if any of the terms occur in documents indexed by Lycos. To obtain only documents containing all the terms in a search, the Enhance Your Search option must be chosen and the default options adjusted.
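
The difference between a Boolean "or" and a Boolean "and" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Lycos is implemented: the documents, the DOCS table, and the function names are invented.

```python
import re

# Hypothetical documents standing in for pages an engine has indexed.
DOCS = {
    1: "squash blossom necklace from the Navajo tradition",
    2: "growing summer squash in the desert",
    3: "cherry blossom festival schedule",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(index, query, mode="or"):
    """Return ids matching any term (mode="or") or every term (mode="and")."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    combine = set.union if mode == "or" else set.intersection
    return combine(*postings)

index = build_index(DOCS)
print(sorted(search(index, "squash blossom")))         # any term: [1, 2, 3]
print(sorted(search(index, "squash blossom", "and")))  # all terms: [1]
```

With the default "or", the two-word query matches every document that mentions either word; switching to "and" narrows the results to the one document that mentions both.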

Choosing a Search Engine

Choosing a search engine depends on the results you're looking for, though there are some criteria that may be useful. These criteria include:

For example, searching for some information on the Native American squash blossom design using WebCrawler will bring relevant results, but either OpenText or InfoSeek would be better first choices because they both give more information to help you determine relevancy.

Comparing Search Engines

In the following chart, we have divided search engines into four categories: Classics, Leaders, Newer Kids on the Block, and Search Engines for Search Engines. By Classics we mean search engines that have been around for a while and that are well-known and well-used. Leaders are search engines that may or may not have been around for a while, but are well-known, have high use, and return relevant results. Newer Kids on the Block is our designation for more recent arrivals on the search engine scene. And Search Engines for Search Engines covers two meta-search tools that give you a single interface for searching multiple search engines at the same time. We will not be covering the collections of search engines, such as Search.com, All-in-One, and CUSI (Configurable Unified Search Engine), that allow you to search different search engines in sequence.
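
The idea behind a meta-search tool, one query sent to several engines at the same time with the results merged into a single list, can be sketched as follows. The two "engines" here are invented stand-in functions that return fixed lists of URLs; an actual meta-search tool would send requests over the network and would likely rank and merge results in a more sophisticated way.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real engines: each takes a query and returns ranked URLs.
# The engines and their results are invented for illustration.
def engine_a(query):
    return ["http://one.example/", "http://two.example/"]

def engine_b(query):
    return ["http://two.example/", "http://three.example/"]

def meta_search(query, engines):
    """Send `query` to every engine concurrently, then merge the result
    lists in engine order, dropping URLs that were already seen."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # pool.map returns results in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("squash blossom", [engine_a, engine_b]))
```

The URL returned by both stand-in engines appears only once in the merged list. Searching "in sequence", as the collections mentioned above do, would amount to calling each engine separately and looking at each result list on its own.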

The information given for each search engine is the name, the URL, how big the database is (if available), what it searches, general information on how to search, and why you might want to use it. Also included are characteristics specific to a given search engine. For example, MetaCrawler will check the links in the documents retrieved to ensure that they are valid, and OpenText allows you to see the keywords from your search in the context of the document.
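
Showing a keyword in the context of the document, as mentioned above for OpenText, can be approximated with a short Python function. This is a minimal sketch of the general idea only; it makes no claim about how OpenText produces its display, and the function name, the `width` parameter, and the sample text are assumptions made for illustration.

```python
import re

def keyword_in_context(text, keyword, width=3):
    """Return one snippet per occurrence of `keyword`, showing up to
    `width` words on each side of it."""
    tokens = text.split()
    snippets = []
    for i, token in enumerate(tokens):
        # Strip punctuation so that "Squash," still matches "squash".
        if re.sub(r"\W", "", token).lower() == keyword.lower():
            start = max(0, i - width)
            snippets.append(" ".join(tokens[start:i + width + 1]))
    return snippets

sample = "The squash blossom necklace is a classic design. Squash, the vegetable, is unrelated."
for snippet in keyword_in_context(sample, "squash", width=2):
    print(snippet)
```

Seeing the words around each match lets a searcher judge quickly whether a document is about the jewelry design or the vegetable without opening it.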

Gathering the information for the comparison chart was an archaeological expedition -- a lot of digging in obscure places -- most of it on the help screens of the search engines themselves. OpenText is a good example of digging in obscure places: the help screen only shows up after you have done a search. The rest of the information comes from company information and the articles listed in the bibliography.

Comparison Chart--Search Engines

Classics

Leaders

Newer Kids on the Block

Search Engines for Search Engines a.k.a. Meta Search Engines

Conclusion

This is only a small portion of the ever-growing number of available search engines. There are many similarities and many differences in the way the search engines work. Think about what you want to get out of your search, try out a number of the search engines, and understand that the Internet and the search engines are changing daily. Yesterday's favorite search engine may be completely different today, and, most certainly, yesterday's search will provide completely different results today. The concept of an expert as someone who knows almost everything about a subject is no longer valid. A better definition may be that an expert is someone who adapts to new information, digests it more quickly, and soon is hungry for more.


A Definitely Not Comprehensive Bibliography of Articles on World Wide Web Search Engines

Compiled by Ann Eagan and Laura Bender
"Comparing Search Engines." [{http://www.hamline.edu/library/links/comparisons.html}].
A compilation of various articles on comparing search engines.

Descy, Don E. 1995. "All Aboard the Internet: Searching the World-Wide Web." Techtrends 40(4):7-8.
A good, overall explanation of searching and search engines on the Web.

Gralla, Preston. 1995. "Underground Internet." PC Computing 8(11):195-200+.
A compilation of information on browsers, multi-media viewers and players, and search engines.

Moeller, Michael. 1995. "Open Text, Yahoo Meld Search Engines." PC Week 12(38):135.
A brief article announcing the joining of OpenText and Yahoo.

Notess, Greg R. 1995. "Searching the World-Wide Web: Lycos, WebCrawler and More." Online 19:48-50+.
A good, general explanation of several popular search engines. The information is a little dated on Lycos. Also, the author favors traditional search techniques.

Scales, B. Jane and Elizabeth Caufield Felt. 1995. "Diversity of the World Wide Web: Using Robots to Search the Web." Library Software Review 14(3):132-136.
Another good overview of web search engines with information on how to choose a search engine. Some of the search engines covered are less well known than those covered in other articles. Also, while the article date is Fall 1995, much of the specific search engine information is out of date.

Udell, Jon. 1995. "Web Search." Byte 20(9):223-224+.
A techie article mostly on tools you can use to make your own information searchable. Contains a good quote: "My advice to major Web contributors (and to creators of Web authoring tools) is to hire a library scientist."

Venditto, Gus. 1996. "Search Engine Showdown." Internet World 7(5):79-86.
A recent article comparing seven Internet search engines in lay terms. The results of Internet World's tests show InfoSeek Guide providing the most relevant results and Alta Vista the most comprehensive results.
