Abstract
Information Retrieval (IR) on the Web can be considered from many different perspectives, but one objective and relevant fact is that in mid-1999 the number of pages published and available for indexing on the Web was estimated at 800 million, amounting to 6 terabytes of textual data. Those Web pages were estimated to be distributed over 3 million Web servers. This means that no one can afford to explore all the information distributed over those pages; end users necessarily need tools that help them choose the Web pages most relevant to a specific information request. The Web started operating only 10 years ago, and within just a few years the first information retrieval tools were made available to help Web users find pages with relevant information. To deal with the complexity and heterogeneity of the Web, we need search tools implementing indexing and retrieval algorithms that are more advanced than those currently employed in IR. These advanced algorithms need to exploit the structure of, and the inter-relationships among, Web pages.
From a research point of view, we also need to rethink evaluation because of the different characteristics of Web IR, which can be expressed in terms of data, functionalities, architecture, and tools. These characteristics affect both ‘how’ to carry out evaluation and ‘what’ to evaluate.
This chapter addresses the different aspects of IR on the Web that can be considered and analysed, namely: the history of IR on the Web; the different types of tools for performing IR on the Web, which have been designed and developed to answer different user requirements; the architecture and components of those Web IR tools; the indexing and retrieval algorithms that can be employed to make Web IR effective; and methods for the evaluation of Web IR.
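As a rough illustration of what exploiting the inter-relationships among Web pages can mean, the following minimal sketch scores pages by repeatedly passing score along hyperlinks, in the random-surfer style of link analysis. It is not necessarily the algorithm presented in the chapter: the function name `link_scores`, the damping value 0.85, the iteration count, and the four-page toy graph are all assumptions made for this example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by iterating over the hyperlink graph.

    `links` maps each page to the list of pages it links to.
    All names and default values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution of score over all pages.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Score held by pages without out-links is spread evenly over all pages.
        dangling = sum(scores[page] for page in pages if not links.get(page))
        new_scores = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its damped score equally among its out-links.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * scores[source] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical toy Web of four pages; "d.html" has no out-links.
toy_web = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": [],
}
ranking = sorted(link_scores(toy_web).items(), key=lambda item: item[1], reverse=True)
```

In this toy graph the page with the most in-links, `c.html`, ends up ranked first, although no textual content was examined at all. A real Web search tool would combine such a structural score with content-based indexing and retrieval.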
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Agosti, M., Melucci, M. (2000). Information Retrieval on the Web. In: Agosti, M., Crestani, F., Pasi, G. (eds) Lectures on Information Retrieval. ESSIR 2000. Lecture Notes in Computer Science, vol 1980. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45368-7_11
DOI: https://doi.org/10.1007/3-540-45368-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41933-4
Online ISBN: 978-3-540-45368-0
eBook Packages: Springer Book Archive