
Information Retrieval on the Web

Lectures on Information Retrieval (ESSIR 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1980)

Abstract

Information Retrieval (IR) on the Web can be considered from many different perspectives, but one objective and relevant fact is that in mid-1999 the number of pages published and available for indexing on the Web was estimated at 800 million, amounting to 6 terabytes of textual data. Those Web pages were estimated to be distributed over 3 million Web servers. This means that no one can afford to explore all the information distributed over those pages; end users necessarily need tools that help them choose the Web pages most relevant to a specific information request. The Web started operating only 10 years ago, and just a few years later the first information retrieval tools were made available to help Web users find pages with relevant information. To deal with the complexity and heterogeneity of the Web, we need search tools implementing indexing and retrieval algorithms that are more advanced than those currently employed in IR. These advanced algorithms need to exploit the structure of, and the inter-relationships among, Web pages.

From a research point of view, we also need to rethink evaluation because of the different characteristics of Web IR, which can be expressed in terms of data, functionalities, architecture, and tools. These characteristics affect both ‘how’ to carry out evaluation and ‘what’ to evaluate.

This chapter addresses the different aspects of IR on the Web that can be considered and analysed: the history of IR on the Web; the different types of tools for performing IR on the Web, which have been designed and developed to answer different user requirements; the architecture and components of those Web IR tools; the indexing and retrieval algorithms that can be employed to make Web IR effective; and methods for evaluating Web IR.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Agosti, M., Melucci, M. (2000). Information Retrieval on the Web. In: Agosti, M., Crestani, F., Pasi, G. (eds) Lectures on Information Retrieval. ESSIR 2000. Lecture Notes in Computer Science, vol 1980. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45368-7_11

  • DOI: https://doi.org/10.1007/3-540-45368-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41933-4

  • Online ISBN: 978-3-540-45368-0

  • eBook Packages: Springer Book Archive
