research-article

Unbiased Learning to Rank: Online or Offline?

Published: 17 February 2021

Abstract

How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: studies on unbiased learning algorithms with logged data, namely offline unbiased learning, and studies on unbiased parameter estimation with real-time user interactions, namely online learning to rank. While their definitions of unbiasedness differ, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank are carried out in parallel, without detailed comparisons of their background theories and empirical performance. In this article, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin. We evaluate eight state-of-the-art ULTR algorithms and find that many of them can be used in both offline settings and online environments with or without minor modifications. Further, we analyze how different offline and online learning paradigms affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings provide important insights and guidelines for choosing and deploying ULTR algorithms in practice.

    • Published in

      ACM Transactions on Information Systems, Volume 39, Issue 2
      April 2021
      391 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/3444752

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2021
      • Accepted: 1 November 2020
      • Revised: 1 October 2020
      • Received: 1 July 2020

      Qualifiers

      • research-article
      • Research
      • Refereed
