Monday, 29 October 2007

The Invisible Web

As it's a slow news day, here's the latest instalment in this blog's occasional series of search tips and tricks. Today's post concerns the Invisible Web (sometimes called the Deep Web), what it is, and why it matters.

How big is the World Wide Web? No one really knows, but what is known is that a huge number of web pages are not indexed by search engines like Google. In other words, no matter how cleverly you put together a search, you will be unable to uncover these pages. There are several reasons why a web page may not be retrievable by a search engine; click here to read a useful article which explains it all in relatively plain terms and provides links to related items.

You may be wondering why this matters, when Google and similar search engines can still return millions of results. The problem is that the very type of academic information that is most useful to students is often held in the sorts of websites which search engines cannot penetrate. For instance, I regularly use the ERIC service to help me find useful materials on education-related topics, but if you're using a search engine you'll need to construct a very clever search to reach most of the references contained within the site; it's much easier to simply go direct to the ERIC homepage. Similarly, many of the materials held within Athens will not appear in the results of a normal Google search, and those that do will require a password and possibly payment if you want to access the really good stuff (i.e. the full text of journal articles). It makes more sense to go directly to the source, and reserve the search engines for when you really don't have a clue where to start, or have drawn a blank elsewhere.

Hopefully that all makes sense! If you're really keen, the library at Summer Row has an excellent book entitled The Invisible Web which contains much more on this topic. Also, the Complete Planet website attempts to bring together many of the resources which cannot be accurately picked up by normal search engines. The fella below seems to have got the idea...
