A Theory and Approach to Improving Relevance Ranking in Web Retrieval

Wang, Z. W.; Maguire, R. B.

doi:10.1007/3-540-45490-X_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2198)

Included in the following conference series:

Asia-Pacific Conference on Web Intelligence

Abstract

The development of the World Wide Web (WWW) has made a huge amount of information available online, and that amount continues to grow. As of March 2001, the Google search engine searches 1,346,966,000 Web pages. Many search systems have been developed to manage this massive collection of information. Investigation shows that the primary method used by these systems is classification. Unfortunately, classification has an intrinsic restriction. Consider this example. Recently, we sent a query consisting of the word “computer” to Google, and Google found 33,220,000 relevant Web pages. This number far exceeds what anyone could possibly read. The problem is intrinsic to classification, which means it cannot be avoided. It is explained by the Pigeonhole Principle (i.e., Dirichlet’s Box Principle) [10].

Suppose we can classify Web pages using all the English words in a dictionary. Given a particular keyword, let us calculate how many Web pages will be classified as relevant on average. Let totalKeywords be the number of keywords in a vocabulary list, averageKeywords the average number of keywords that a Web document may have, n the number of all Web pages, and numberRelevant the number of relevant Web pages. Then we have:

$$ \mathit{numberRelevant} \approx \frac{n \times \mathit{averageKeywords}}{\mathit{totalKeywords}}. $$

If n = 1346966000, averageKeywords = 100, and totalKeywords = 10000, then numberRelevant is 13469660.
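
The arithmetic above can be checked with a minimal Python sketch. The function name `estimate_relevant` and its parameter names are illustrative choices, not taken from the paper; the inputs are the values quoted in the abstract.

```python
def estimate_relevant(n: int, average_keywords: int, total_keywords: int) -> float:
    """Estimate how many pages match one keyword on average.

    Implements numberRelevant ≈ n * averageKeywords / totalKeywords,
    using the symbols defined in the abstract.
    """
    if total_keywords <= 0:
        raise ValueError("total_keywords must be positive")
    return n * average_keywords / total_keywords


# Values quoted in the abstract (Google's index size as of March 2001).
estimate = estimate_relevant(
    n=1_346_966_000, average_keywords=100, total_keywords=10_000
)
print(f"{estimate:,.0f}")  # 13,469,660
```

Under these assumed values, a single keyword still selects more than thirteen million pages, which is the scale problem the abstract attributes to classification.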

References

[1] G. Arfken. Curvilinear coordinates. In Mathematical Methods for Physicists, 3rd edition, §2.1, pages 86–90. Academic Press, Orlando, FL, 1985.
[2] P. Bollmann and S. K. M. Wong. Adaptive linear information retrieval models. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 157–163, 1987.
[3] R. T. Craig. Modern Principles of Mathematics. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1969.
[4] A. Gray. Modern Differential Geometry of Curves and Surfaces with Mathematica, chapter Metrics on Surfaces. CRC Press, Boca Raton, FL, 2nd edition, 1997.
[5] M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and information retrieval. Journal of the Association for Computing Machinery, 7:216–244, 1960.
[6] M. J. McGill, M. Koll, and T. Noreault. An evaluation of factors affecting document ranking by information retrieval systems. School of Information Studies, Syracuse University, Syracuse, New York 13210, 1979.
[7] P. M. Morse and H. Feshbach. Methods of Theoretical Physics, Part I, chapter Curvilinear Coordinates, pages 21–31. McGraw-Hill, New York, 1953.
[8] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
[9] H. J. Schneider, P. Bollmann, F. Jochum, E. Konrad, U. Reiner, and V. Weissmann. Leistungsbewertung von Information Retrieval Verfahren (LIVE). Projektbericht, Technische Universität Berlin, 1986.
[10] D. Shanks. Solved and Unsolved Problems in Number Theory, page 161. Chelsea, New York, 4th edition, 1993.
[11] H. F. Stiles. The association factor in information retrieval. Journal of the ACM, 8:271–279, 1961.
[12] Z. W. Wang. An analysis on vector space model based on computational geometry. Master’s thesis, Department of Computer Science, University of Regina, 1993.
[13] Z. W. Wang. Riemann space model and similarity-based web retrieval. Ph.D. thesis, Department of Computer Science, University of Regina, 2001.
[14] Z. W. Wang, R. B. Maguire, and Y. Y. Yao. A non-Euclidean model for web retrieval. In The First International Conference on Web-Age Information Management (WAIM’2000), Shanghai, 2000. Accepted.
[15] S. K. M. Wong, W. Ziarko, V. V. Raghavan, and P. C. N. Wong. On modeling of information retrieval concepts in vector spaces. ACM Transactions on Database Systems, 12(2):229–321, 1987.
[16] Y. Y. Yao. Measuring retrieval performance based on user preference of documents. Journal of the American Society for Information Science, 46(2):133–145, 1995.

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Z. W. Wang & R. B. Maguire

Editor information

Editors and Affiliations

Department of Systems and Information Engineering, Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City, 371-0816, Japan
Ning Zhong
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Yiyu Yao
Department of Computer Science, Hong Kong Baptist University, 224 Waterloo Road, Kowloon, Hong Kong, China
Jiming Liu
Department of Information and Computer Science, Waseda University, 3-4-1 Okubo Shinjuku-Ku, Tokyo, 169, Japan
Setsuo Ohsuga

About this paper

Cite this paper

Wang, Z.W., Maguire, R.B. (2001). A Theory and Approach to Improving Relevance Ranking in Web Retrieval. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_37

DOI: https://doi.org/10.1007/3-540-45490-X_37
Published: 19 October 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42730-8
Online ISBN: 978-3-540-45490-8
eBook Packages: Springer Book Archive