ABSTRACT
Over the past 10--15 years, offline learning to rank has had a tremendous influence on information retrieval, both scientifically and in practice. Recently, as the limitations of offline learning to rank have become apparent, the community has paid increasing attention to online learning to rank methods for information retrieval. Such methods learn from user interactions rather than from a set of labeled data that is fully available for training up front.
Below we describe why we believe the time is right for an intermediate-level tutorial on online learning to rank, the objectives of the proposed tutorial, and its relevance, as well as more practical details such as format, schedule, and support materials.
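To make the contrast with offline learning to rank concrete, the following is a minimal sketch of Dueling Bandit Gradient Descent (Yue and Joachims, 2009), a representative online learning to rank method. The hidden preference vector `w_star` and the `candidate_wins` oracle are simulation stand-ins introduced here for illustration: in a real system, the comparison outcome would come from interleaving the two rankers' result lists and observing user clicks, not from any ground truth.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two weight vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def run_dbgd(n_steps=500, dim=5, delta=1.0, step=0.1, seed=0):
    """Dueling Bandit Gradient Descent with a simulated comparison oracle."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=dim)   # hidden "true" user preference (unknown to the learner)
    w = rng.normal(size=dim)        # current linear ranker weights

    def candidate_wins(w_cur, w_cand):
        # Simulated interleaved comparison: the candidate wins when its
        # scoring direction is closer to the hidden preference. In practice
        # this signal is derived from clicks on an interleaved result list.
        return cosine(w_cand, w_star) > cosine(w_cur, w_star)

    for _ in range(n_steps):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)                # uniform random unit direction
        if candidate_wins(w, w + delta * u):  # explore: try a perturbed ranker
            w = w + step * u                  # exploit: small move toward the winner
    return w, w_star
```

Starting from random weights, the ranker's scoring direction drifts toward the hidden preference using only relative, pairwise comparison outcomes. This is exactly the kind of feedback that online methods extract from user interactions, and it requires no up-front labeled training set.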
Index Terms
- Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial