Abstract
How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: studies on unbiased learning algorithms with logged data, namely offline unbiased learning, and studies on unbiased parameter estimation with real-time user interactions, namely online learning to rank. While their definitions of unbiasedness differ, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank have been carried out in parallel, without detailed comparisons of their background theories and empirical performance. In this article, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin. We evaluate eight state-of-the-art ULTR algorithms and find that many of them can be used in both offline settings and online environments with little or no modification. Further, we analyze how different offline and online learning paradigms affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings provide important insights and guidelines for choosing and deploying ULTR algorithms in practice.
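A core idea behind many offline ULTR methods (e.g., the counterfactual framework of Joachims et al. 2017) is inverse propensity scoring: clicks are reweighted by the probability that the user examined each result position, so that position bias cancels out in expectation. The sketch below is an illustration of this general idea, not code from the article; the relevance values, the `1/k` examination model, and the variable names are all hypothetical assumptions for the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: five documents shown at fixed positions 1..5.
# True relevance (click probability given examination) per document,
# and an assumed position-bias model: P(examined at rank k) = 1/k.
relevance = np.array([0.9, 0.2, 0.7, 0.4, 0.1])
propensity = 1.0 / np.arange(1, 6)

# Simulate 100,000 user sessions: a click requires examination AND relevance.
n = 100_000
examined = rng.random((n, 5)) < propensity
clicked = examined & (rng.random((n, 5)) < relevance)

# Naive estimate: raw click-through rate, confounds position with relevance.
naive = clicked.mean(axis=0)

# IPS estimate: each click is weighted by 1 / P(examined), which is an
# unbiased estimator of relevance under this examination model.
ips = (clicked / propensity).mean(axis=0)
```

Here `naive` recovers roughly `relevance * propensity` (lower-ranked documents look less relevant than they are), while `ips` recovers the true relevance up to sampling noise. The trade-off, as the article's comparison of algorithms suggests, is that reweighting by small propensities increases variance.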
Index Terms
- Unbiased Learning to Rank: Online or Offline?