Abstract
The performance of web search engines often deteriorates because of the diversity and noise in web pages. Several methods have been proposed that use clickthrough data to obtain more accurate information about web pages and thereby improve search performance. However, the sparseness of clickthrough data is a major challenge in exploiting it. In this paper, we propose a novel algorithm for exploiting user clickthrough data. It first explores the relationship between queries and web pages to mine co-visiting as an associative relationship among web pages, and then uses a Spreading Activation mechanism to re-rank the web search results. Our approach can alleviate this sparseness, and experimental results on a large set of MSN clickthrough log data show a significant improvement in search performance over both the DirectHit algorithm and the baseline search engine.
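The two stages named in the abstract (mining co-visiting from clickthrough data, then re-ranking by spreading activation) can be illustrated with a minimal sketch. The abstract does not give the paper's actual weighting or propagation formulas, so everything below is an assumption made for illustration: two pages count as co-visited when they were clicked for the same query, activation is split among neighbours in proportion to co-visit counts, and the names covisit_weights, spread_activation, rerank and the decay parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def covisit_weights(click_log):
    """Count, for each pair of pages, how many distinct queries led to clicks on both.

    click_log is an iterable of (query, url) pairs. This co-visit definition is
    an illustrative assumption, not the paper's formula.
    """
    pages_by_query = defaultdict(set)
    for query, url in click_log:
        pages_by_query[query].add(url)

    weights = defaultdict(dict)
    for pages in pages_by_query.values():
        for a, b in combinations(sorted(pages), 2):
            weights[a][b] = weights[a].get(b, 0) + 1
            weights[b][a] = weights[b].get(a, 0) + 1
    return weights


def spread_activation(initial_scores, weights, decay=0.5, steps=1):
    """Propagate scores along co-visit edges for a fixed number of steps.

    In each step, every active page passes a decayed share of its activation
    to its neighbours, split in proportion to the co-visit counts.
    """
    scores = dict(initial_scores)
    activation = dict(initial_scores)
    for _ in range(steps):
        incoming = defaultdict(float)
        for page, act in activation.items():
            neighbours = weights.get(page, {})
            total = sum(neighbours.values())
            if total == 0:
                continue
            for other, w in neighbours.items():
                incoming[other] += decay * act * w / total
        for page, extra in incoming.items():
            scores[page] = scores.get(page, 0.0) + extra
        # Only the newly received activation spreads further in the next step.
        activation = incoming
    return scores


def rerank(initial_scores, weights, decay=0.5, steps=1):
    """Return pages ordered by score after spreading (ties broken by URL)."""
    scores = spread_activation(initial_scores, weights, decay, steps)
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    # Toy log: hypothetical queries and URLs, not data from the paper.
    log = [
        ("jaguar car", "a.com"),
        ("jaguar car", "b.com"),
        ("jaguar price", "b.com"),
        ("jaguar price", "c.com"),
    ]
    baseline = {"a.com": 1.0, "b.com": 0.2}  # assumed baseline engine scores
    print(rerank(baseline, covisit_weights(log)))
```

In this toy log, c.com receives no baseline score but still enters the ranking because it was co-visited with b.com. That is the sense in which propagating over associative links can compensate for sparse click data.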
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, XM., Song, WG., Zeng, HJ. (2005). Applying Associative Relationship on the Clickthrough Data to Improve Web Search. In: Losada, D.E., Fernández-Luna, J.M. (eds) Advances in Information Retrieval. ECIR 2005. Lecture Notes in Computer Science, vol 3408. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31865-1_34
DOI: https://doi.org/10.1007/978-3-540-31865-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25295-5
Online ISBN: 978-3-540-31865-1
eBook Packages: Computer Science, Computer Science (R0)