
TEACHERS.NET GAZETTE

Volume 3 Number 7




Uncovering the Hidden Web, Part I: Finding What the Search Engines Don't

ERIC Identifier: ED456863
Publication Date: 2001-10-00
Author: Marcia Mardis
Source: ERIC Clearinghouse on Educational Management, Eugene, OR.

THIS DIGEST WAS CREATED BY ERIC, THE EDUCATIONAL RESOURCES INFORMATION CENTER. FOR MORE INFORMATION ABOUT ERIC, CONTACT ACCESS ERIC 1-800-LET-ERIC

Currently, the World Wide Web contains an estimated 7.4 million sites (OCLC, 2001). Yet even the most experienced searcher, using the most robust search engines, can reach only about 16% of the pages on these sites (Dahn, 2000). The other 84% of the publicly available information on the Web is referred to as the "hidden," "invisible," or "deep" Web.

Despite the explosion in Web content, common search methods have not changed significantly since the Web's inception: most information is still found as it was ten years ago, through directories and search engines. But the ever-quickening growth of the World Wide Web demands an expanded set of search tools and skills. This article offers tips on augmenting traditional search techniques with knowledge of the hidden Web, helping readers reach some of the Web's most valuable content.

THE WRATH OF THE MATH

Recent studies estimate that the hidden Web is about 500 times larger than the "surface" Web indexed by search engines. Billions of documents are obscured in databases, written in non-HTML formats, or served through protocols other than HTTP. According to Bergman (2000), the hidden Web contains:

Nearly 550 billion individual documents

The fastest-growing category of new information on the Internet

Content that is highly relevant to every information need, market and domain

More focused content than surface Web sites

Total quality content that is up to 2,000 times greater than that of the surface Web

Content that is 95% publicly accessible, not subject to fees or subscriptions

What do these characterizations mean in terms of content? The hidden Web contains current news articles, image collections from museums, specialized databases full of discipline-specific research and reports (ERIC documents being only one example among thousands), U.S. Census information, and much more. Tools for reaching more of the Web are still nascent, but they are growing.

THE WAY WE ARE NOW

Directories like Yahoo (http://www.yahoo.com) and About.com (http://www.about.com) are human-mediated collections of reviewed and categorized links. Users browse through categories by clicking ever-narrower subject lists. Since a directory's staff can only review and classify a finite number of sites, directories simply cannot keep pace with the explosion of Web content.
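
A directory's browse-by-category model can be pictured as a tree that the user walks one click at a time, with every leaf link placed there by a human editor. The minimal Python sketch below assumes a tiny hand-built hierarchy; the category names, the `browse` and `count_links` helpers, and the example.org URLs are all hypothetical, and real directories such as Yahoo and About.com are vastly larger.

```python
from typing import Dict, List, Union

# Hypothetical directory: each category maps either to narrower
# subcategories (a dict) or to a list of links a staff editor reviewed.
Node = Union[Dict[str, "Node"], List[str]]

DIRECTORY: Dict[str, Node] = {
    "Education": {
        "K-12": {
            "Media Centers": ["http://example.org/school-library-guide"],
        },
        "Higher Education": ["http://example.org/university-list"],
    },
    "Reference": {
        "Databases": ["http://example.org/education-database"],
    },
}


def browse(directory: Node, path: List[str]) -> Node:
    """Follow a click path through ever-narrower subject categories."""
    node = directory
    for category in path:
        if not isinstance(node, dict):
            raise KeyError(f"{category!r}: this category has no subcategories")
        node = node[category]  # KeyError if the editors never created it
    return node


def count_links(node: Node) -> int:
    """Total links under a category; each one had to be reviewed by a person."""
    if isinstance(node, list):
        return len(node)
    return sum(count_links(child) for child in node.values())


print(browse(DIRECTORY, ["Education", "K-12", "Media Centers"]))
print(count_links(DIRECTORY))  # 3
```

Because every link passes through an editor, the total returned by `count_links` can only grow as fast as the staff can review sites.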

AllTheWeb (http://www.alltheweb.com) and Google (http://www.google.com) are examples of traditional search engines that use spidering programs. When the spider program executes, it starts at a specified Web page, indexes that page's content, and follows any hyperlinks on that page. The process is repeated at the destination of each of the hyperlinks. In this way, the program crawls and indexes a web of hyperlinked pages.
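
The crawl-and-index loop described above can be sketched as a breadth-first traversal. This minimal Python version assumes a tiny in-memory Web (the `WEB` dictionary and its example URLs are hypothetical) in place of real HTTP fetches, and it builds a simple inverted index that maps each word to the pages containing it. Production spiders add HTML parsing, politeness rules, and much more.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# A toy, in-memory "Web": URL -> (page text, outgoing hyperlinks).
# These URLs are hypothetical; a real spider would fetch pages over HTTP.
WEB: Dict[str, Tuple[str, List[str]]] = {
    "http://a.example/": ("teacher resources", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("lesson plans", ["http://a.example/"]),
    "http://c.example/": ("search tips", ["http://d.example/"]),
    "http://d.example/": ("hidden web guide", []),
    "http://e.example/": ("orphan page", []),  # no page links here
}


def crawl(start: str) -> Dict[str, Set[str]]:
    """Breadth-first spider that builds an inverted index: term -> URLs."""
    index: Dict[str, Set[str]] = {}
    seen: Set[str] = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        for term in text.split():  # index this page's content
            index.setdefault(term, set()).add(url)
        for link in links:  # then follow its hyperlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://a.example/")
print(sorted(index["hidden"]))  # ['http://d.example/']
print("orphan" in index)  # False: no page links to e.example
```

The last line shows the limitation in miniature: a page that nothing links to is never visited, so its words never enter the index.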

When a user enters terms into the engine's search box, those terms are matched against the engine's index, not against the "live" Web. Because every indexed word is searchable, search engines let users go beyond the classification preferences of directory editors and control their results term by term. Metasearch tools like Ixquick (http://www.ixquick.com) and MetaCrawler (http://www.metacrawler.com) extend the search engine principle by letting users run a query in multiple search engines simultaneously.
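
The two ideas in this paragraph, matching query terms against a prebuilt index and fanning one query out to several engines at once, can be sketched together. In this hedged Python example the three "engines" are hypothetical in-memory indexes. Real metasearch tools such as Ixquick and MetaCrawler call live services and apply their own ranking rules; ranking merged results by how many engines returned them is simply one assumed merging strategy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

# Hypothetical engines, each reduced to its own inverted index (term -> URLs).
ENGINES: Dict[str, Dict[str, Set[str]]] = {
    "engine_a": {"census": {"http://data.example/census"}, "museum": {"http://art.example/"}},
    "engine_b": {"census": {"http://data.example/census", "http://stats.example/"}},
    "engine_c": {"museum": {"http://art.example/", "http://history.example/"}},
}


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Match every query term against the index (not the live Web); AND semantics."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def metasearch(query: str) -> List[str]:
    """Send one query to all engines concurrently, then merge by agreement."""
    with ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda idx: search(idx, query), ENGINES.values()))
    votes: Dict[str, int] = {}
    for urls in result_sets:
        for url in urls:
            votes[url] = votes.get(url, 0) + 1
    # URLs found by more engines come first; ties are broken alphabetically.
    return sorted(votes, key=lambda url: (-votes[url], url))


print(metasearch("census"))  # ['http://data.example/census', 'http://stats.example/']
```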

While Web directories are constrained by human limits, search engines fall short because they primarily index documents written in HTML. Spiders cannot index dynamically generated pages, such as those in Microsoft's Searchable Knowledge Base, or documents created with technologies like Adobe Acrobat, Active Server Pages, or ColdFusion. Likewise, database contents are excluded from the indexing process: spiders cannot type search terms into a database's query form or complete a login process. And in many instances, content served through protocols other than HTTP (e.g., FTP, Gopher) is excluded.
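
Why a link-following spider stops at a database's front door can be shown with Python's standard `html.parser` module. The sketch assumes a hypothetical database page (`DATABASE_PAGE`) whose records are reachable only through a search form; a simple spider that collects `<a href>` links finds the "About" page but has no way to reach anything behind the form.

```python
from html.parser import HTMLParser
from typing import List

# A hypothetical database front page: the records sit behind a search form,
# so the only way in is to type a query, which a spider cannot do.
DATABASE_PAGE = """
<html><body>
  <a href="/about.html">About this collection</a>
  <form action="/cgi-bin/search" method="get">
    <input type="text" name="q">
    <input type="submit" value="Search">
  </form>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, the way a simple spider finds pages."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        # <form> and <input> tags are ignored: the spider has no query to submit.


parser = LinkExtractor()
parser.feed(DATABASE_PAGE)
parser.close()
print(parser.links)  # ['/about.html']
```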

FINDING THE HIDDEN WEB

The first step to accessing the hidden Web is much like that of other search processes: use familiar and reliable resources. Although directories have limitations as primary search tools, their categories often include hidden Web databases. Professional journals and magazines also provide a wealth of current knowledge; look for reviews of new reference tools and subject directories. Beyond these basic steps, Web-based and desktop solutions are available for accessing the hidden Web.

With over 7,000 topic-specific databases, no one can reach every hidden Web resource. However, Web-based gateways, collections, and desktop tools point to specialized databases. These tools are most effective when a few of them are used regularly and integrated into an overall search strategy.

A SMATTERING OF SOLUTIONS

Around the Web in 80 Sites: The Best of the Invisible Web (http://websearch.about.com/library/blow2000.htm)
The search gurus at About.com created this list of hidden Web resources, which is strong in categorization and expert selection.
Provides access to the search interfaces of resources that are not easily located with major search engines. This resource is considered by many librarians to be the key hidden Web resource.
Infomine (http://infomine.ucr.edu)
A virtual library and reference tool containing highly useful Internet resources, including databases, electronic journals, electronic books, and many other types of information in a broad range of subject areas and reading levels.
LexiBot (http://www.lexibot.com)
Desktop software that can make dozens of queries simultaneously. Surface and hidden Web results are tested for dead links and presented in a format that allows previewing or viewing in a Web browser. Made for PCs only, this tool is free to try.
Searchability: Guides to Specialized Search Engines (http://www.searchability.com)
A gateway site with an annotated list of thousands of search engines covering hundreds of subjects. Descriptions include size, specificity, and some aspects of collection quality.
SearchEngineGuide.Com: The Guide to Search Engines, Portals, and Directories (http://searchengineguide.com)
Currently indexes almost four thousand search engines. Browse for search engines by category or use the keyword search feature. Each entry provides a brief summary.
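
LexiBot's dead-link testing can be approximated in a few lines. This is a hedged sketch, not LexiBot's actual method: it assumes a link counts as "dead" when an HTTP HEAD request fails or returns an error status, and it checks several links concurrently. Some servers reject HEAD requests, so a more careful checker might fall back to GET. The `check` parameter is an assumption added so that an offline test function can stand in for real network requests.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def is_live(url: str, timeout: float = 5.0) -> bool:
    """True if a HEAD request for the URL succeeds with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, unreachable hosts, timeouts, and malformed URLs all count as dead.
        return False


def drop_dead_links(urls: List[str], check: Callable[[str], bool] = is_live) -> List[str]:
    """Check many result URLs at once and keep only the live ones, in order."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first occurrence
    with ThreadPoolExecutor(max_workers=8) as pool:
        alive = list(pool.map(check, unique))
    return [url for url, ok in zip(unique, alive) if ok]


# Offline example with a stand-in checker (no network needed):
results = ["http://a.example/", "http://b.example/", "http://a.example/"]
print(drop_dead_links(results, check=lambda url: "b." not in url))  # ['http://a.example/']
```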

QUALITY SHOULD BE JOB ONE

Compelling arguments support the claim that the hidden Web surpasses the surface Web in quality. First, the hidden Web is composed primarily of databases. A site that employs a searchable database is probably current, since Web-accessible databases are a fairly new phenomenon. Also, a site that puts effort into collecting and publishing information in a database is usually invested in the topic area.

For example, the Researching Librarian (http://www2.msstate.edu/~kerjsmit/trl) lists many sites that contain information of interest to information scientists; the most valuable and current information can be found in the sites listed in the database section.

The second argument for the hidden Web's superior quality is that traditional search engines overwhelmingly favor sites in the burgeoning commercial domain (O'Leary, 2000). Search engines can only find sites that other pages link to, and Web authors tend to link to popular, well-known commercial sites, so those are the sites spiders find most readily. Also, sites produced by nonprofit and educational entities do not receive the advertising and brand-name recognition that commercial sites enjoy.
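
The discovery bias described here follows from counting inbound links. In this hedged Python sketch the link graph and page names are hypothetical; the `inbound_counts` helper tallies how many pages point to each page. A widely linked commercial site accumulates inbound links, while a research database that nobody links to scores zero, leaving a link-following spider with no path to it.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical link graph: page -> pages it links to.
LINKS: Dict[str, List[str]] = {
    "teacher-blog": ["big-store", "news-portal"],
    "class-page": ["big-store", "news-portal"],
    "news-portal": ["big-store"],
    "big-store": [],
    "research-db": [],  # a nonprofit database that no page links to
}


def inbound_counts(links: Dict[str, List[str]]) -> Counter:
    """Count how many pages link to each page (pages with no inbound links get 0)."""
    counts: Counter = Counter({page: 0 for page in links})
    for targets in links.values():
        counts.update(targets)
    return counts


counts = inbound_counts(LINKS)
print(counts.most_common(1))  # [('big-store', 3)]
print(counts["research-db"])  # 0
```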

This is not to say that commercial sites are necessarily unreliable. However, the sites of educational and nonprofit entities that conduct research in certain disciplines are often missed by traditional searches. The best source of information is an expert, and hidden Web databases point to specialized and authoritative resources.

WANTED: MAGIC BULLET

Although the Web is often the first place to look, it is not necessarily the best place. The hidden Web and other Web-based information resources should be seen as part of an information retrieval process that also includes books, serials, and subscription databases. Resources on the Web's frontier vary widely in quality, currency, authority, and stability, so quality issues should be a priority in discussions of information retrieval and in search instruction.

After using hidden Web sites, many searchers are disappointed to find that each database must be searched individually. But search tools have not yet evolved to the point where the power of a search engine can be seamlessly combined with the quality and depth of the hidden Web. There is no magic bullet; research is a process of carefully uncovering obscured information, not exposing the obvious.

REFERENCES AND FURTHER READING

Bergman, M. (2000). "The deep Web: Surfacing hidden value." BrightPlanet.com LLC. Retrieved August 15, 2001, from the World Wide Web: http://www.completeplanet.com/Tutorials/DeepWeb/contents04.asp

Dahn, M. (2000, January/February). Counting angels on a pinhead: Critically interpreting Web size estimates. "Online," 35-40.

Diaz, K. (2000). The invisible Web: Navigating the Web outside traditional search engines. "Reference & User Services Quarterly," 40 (2), 131-134.

Ensor, P. (2001, June 14). "Toolkit for the expert web searcher." Library Information Technology Association. Retrieved August 15, 2001, from the World Wide Web: http://www.lita.org/committe/toptech/toolkit.htm

OCLC (Online Computer Library Center). (2001, July 13). "Statistics." Online Computer Library Center, Inc. Retrieved August 15, 2001, from the World Wide Web: http://wcp.oclc.org/stats.html

O'Leary, M. (2000, January). Invisible Web uncovers hidden treasures. "Information Today," 16-18.

Price, G., & Sherman, C. (2001, July/August). Exploring the invisible Web. "Online," 32-34.

Price, G., & Sherman, C. (2001). "The invisible Web: Uncovering information sources search engines can't see." CyberAge Books.

Sherman, C. (2000). "Worth a look: Searching the invisible Web." About.com. Retrieved August 15, 2001, from the World Wide Web: http://websearch.about.com/library/searchwiz/bl_invisibleweb_apra.htm

Sherman, C., & Price, G. (2001). The invisible Web. "Searcher," 9 (6), 62-74.

Snow, B. (2000, May). The Internet's hidden content and how to find it. "Online," 24. (EJ 613 396).

ABOUT THE AUTHOR

Marcia Mardis, MILS, a former K-12 media specialist, is Program Coordinator and Internet Media Specialist at the Center to Support Technology in Education at Merit Network, Inc. She presents on Web searching issues at conferences around the country and writes frequently on K-12 use of the Internet.

A Product of the ERIC Clearinghouse on Educational Management, College of Education, University of Oregon, Eugene, Oregon, 97403-5207.

This publication was prepared with funding from the Office of Educational Research and Improvement, U.S. Department of Education, under contract No. ED-99-CO-0011. The ideas and opinions expressed in this Digest do not necessarily reflect the positions or policies of OERI, ED, or the Clearinghouse. This Digest is in the public domain and may be freely reproduced. The text of this Digest may be viewed electronically at http://eric.uoregon.edu.


ERIC Clearinghouse on Information & Technology, Syracuse University, 621 Skytop Road, Suite 160, Syracuse, NY 13244-5290; 800-464-9107; 315-443-3640 ; Fax: 315-443-5448; e-mail: eric@ericit.org; URL: http://www.ericit.org.

This publication is funded in part with Federal funds from the U.S. Department of Education under contract number ED-99-CO-0005. The content of this publication does not necessarily reflect the views or policies of the U.S. Department of Education nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. government. Visit the Department of Education's Web site at http://www.ed.gov

