DOI: 10.1145/1458502.1458523
Research article

Boosting the ranking function learning process using clustering

Published: 30 October 2008

Abstract

As the Web continues to grow, search engines return more results than a user can review. Recently, the problem of personalizing the ranked result list based on user feedback has attracted considerable attention. Such approaches usually require a large amount of user feedback on the results, which serves as training data. In this work, we present a method that overcomes this requirement by exploiting all search results, both rated and unrated, to train a ranking function. Given a small initial set of user feedback on some search results, we first cluster all results returned by the search. Based on the resulting clusters, we extend the initial set of rated results with new, previously unrated results. We then train a ranking function on the expanded set using a popular training method, Ranking SVM. The experiments show that our method closely approximates the results of an "ideal" system in which all results of each query are rated and used as training data, something that is not feasible in a real-world scenario.
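
The pipeline described in the abstract (cluster all results, extend the rated set through the clusters, train a pairwise ranker) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering algorithm or expansion rule is used, so the k-means routine, the "mean rating of the cluster" rule, and all function names here are hypothetical. The trainer is a plain gradient-descent hinge loss over preference pairs, a stand-in in the spirit of Ranking SVM rather than an actual Ranking SVM solver.

```python
def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization (assumed choice)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(list(max(points, key=lambda p: min(dist(p, c) for c in centers))))
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist(p, centers[c]))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def expand_labels(assign, rated):
    """Hypothetical rule: an unrated result inherits the mean rating of its cluster."""
    by_cluster = {}
    for i, rating in rated.items():
        by_cluster.setdefault(assign[i], []).append(rating)
    expanded = dict(rated)
    for i, c in enumerate(assign):
        if i not in expanded and c in by_cluster:
            expanded[i] = sum(by_cluster[c]) / len(by_cluster[c])
    return expanded

def train_pairwise(points, labels, epochs=50, lr=0.1, reg=0.01):
    """Linear ranker fitted with a regularized hinge loss on preference pairs."""
    dim = len(points[0])
    w = [0.0] * dim
    # Each pair (i, j) states that result i should rank above result j.
    pairs = [(i, j) for i in labels for j in labels if labels[i] > labels[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(points[i], points[j])]
            margin = sum(wk * d for wk, d in zip(w, diff))
            for d in range(dim):
                grad = reg * w[d] - (diff[d] if margin < 1 else 0.0)
                w[d] -= lr * grad
    return w

# Toy feature vectors for six search results; only results 0 and 3 are rated.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
rated = {0: 0.0, 3: 1.0}

assign = kmeans(points, k=2)
expanded = expand_labels(assign, rated)   # now covers all six results
w = train_pairwise(points, expanded)
score = lambda p: sum(wk * x for wk, x in zip(w, p))
```

In this toy run, two ratings are enough to produce preference pairs over all six results, which is the effect the abstract attributes to the expansion step. How well propagated ratings match real user judgments depends on the clustering quality, which this sketch does not evaluate.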

Published In

WIDM '08: Proceedings of the 10th ACM workshop on Web information and data management
October 2008
164 pages
ISBN:9781605582603
DOI:10.1145/1458502
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clickthrough data
  2. clustering
  3. ranking
  4. relevance judgement
  5. search engine
  6. training

Qualifiers

  • Research-article

Conference

CIKM08
CIKM08: Conference on Information and Knowledge Management
October 30, 2008
Napa Valley, California, USA
