Abstract
The development of the World Wide Web (WWW) has made a huge amount of information available on-line, and the amount of information continues to increase. As of March 2001, the Google search engine searches 1,346,966,000 Web pages. Many search systems have been developed to manage this massive collection of information. Investigation shows that the primary method used by these systems is classification. Unfortunately, classification has an intrinsic restriction. Consider this example. Recently, we sent a query consisting of the word “computer” to Google, and Google found 33,220,000 relevant Web pages. This number far exceeds what anyone could possibly begin to read. The problem is intrinsic to classification, which means it cannot be avoided. It is explained by the Pigeonhole Principle (also known as Dirichlet’s Box Principle) [10]. Suppose we can classify Web pages using all the English words in a dictionary. Given a particular keyword, let us calculate how many Web pages will, on average, be classified as relevant. Let totalKeywords be the number of keywords in a vocabulary list. Let averageKeywords be the average number of keywords that a Web document may have. Let the number of all Web pages be n. Let the number of relevant Web pages be numberRelevant. Then we have:
$$\mathit{numberRelevant} = \frac{n \times \mathit{averageKeywords}}{\mathit{totalKeywords}}$$
If n = 1,346,966,000, averageKeywords = 100, and totalKeywords = 10,000, then numberRelevant is 13,469,660.
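The arithmetic in this example can be checked with a minimal sketch. It assumes the estimate takes the form n × averageKeywords / totalKeywords, which is consistent with the figures quoted in the abstract; the function name `expected_relevant` is hypothetical and chosen only for illustration.

```python
def expected_relevant(n: int, average_keywords: int, total_keywords: int) -> float:
    """Average number of pages classified as relevant to a single keyword.

    Assumed form (inferred from the abstract's numbers, not quoted from the paper):
    the n * average_keywords keyword assignments are spread over total_keywords
    keywords, so each keyword receives n * average_keywords / total_keywords pages
    on average.
    """
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return n * average_keywords / total_keywords


# Figures taken from the abstract.
estimate = expected_relevant(n=1_346_966_000, average_keywords=100, total_keywords=10_000)
print(f"{estimate:,.0f}")  # prints 13,469,660
```

Under this assumption the average is already in the tens of millions, the same order of magnitude as the 33,220,000 pages reported for the query “computer”.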
References
G. Arfken. Curvilinear coordinates. In Mathematical Methods for Physicists, 3rd edition, §2.1, pages 86–90. Academic Press, Orlando, FL, 1985.
P. Bollmann and S.K.M. Wong. Adaptive linear information retrieval models. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 157–163, 1987.
Robert T. Craig. Modern Principles of Mathematics. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1969.
A. Gray. Modern Differential Geometry of Curves and Surfaces with Mathematica, chapter Metrics on Surfaces. CRC Press, Boca Raton, FL, 2nd edition, 1997.
M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. Journal of the Association for Computing Machinery, 7:216–244, 1960.
M. J. McGill, M. Koll, and T. Noreault. An evaluation of factors affecting document ranking by information retrieval systems. School of Information Studies, Syracuse University, Syracuse, New York 13210, 1979.
P. M. Morse and H. Feshbach. Methods of Theoretical Physics, Part I, chapter Curvilinear Coordinates, pages 21–31. McGraw-Hill, New York, 1953.
G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
H. J. Schneider, P. Bollmann, F. Jochum, E. Konrad, U. Reiner, and V. Weissmann. Leistungsbewertung von information retrieval verfahren (live). Projektbericht, Technische Universität, Berlin, 1986.
D. Shanks. Solved and Unsolved Problems in Number Theory, page 161. Chelsea, New York, 4th edition, 1993.
H. F. Stiles. The association factor in information retrieval. Journal of the ACM, 8:271–279, 1961.
Z. W. Wang. An analysis on vector space model based on computational geometry. Master’s thesis, Department of Computer Science, University of Regina, 1993.
Z. W. Wang. Riemann space model and similarity-based web retrieval. Ph.D. thesis, Department of Computer Science, University of Regina, 2001.
Z. W. Wang, R.B. Maguire, and Y. Y. Yao. A non-Euclidean model for web retrieval. In The First International Conference on Web-Age Information Management (WAIM’2000), Shanghai, 2000. Accepted.
S. K. M. Wong, W. Ziarko, Raghavan, and P. C. N. Wong. On modeling of information retrieval concepts in vector spaces. ACM Transactions on Database Systems, 12(2):229–321, 1987.
Y. Y. Yao. Measuring retrieval performance based on user preference of documents. Journal of the American Society for Information Science, 46(2):133–145, 1995.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Z.W., Maguire, R.B. (2001). A Theory and Approach to Improving Relevance Ranking in Web Retrieval. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_37
DOI: https://doi.org/10.1007/3-540-45490-X_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42730-8
Online ISBN: 978-3-540-45490-8
eBook Packages: Springer Book Archive