Online Learning to Rank: Absolute vs. Relative

ABSTRACT
Online learning to rank holds great promise for learning personalized search result rankings. The first algorithms have been proposed: absolute feedback approaches, based on contextual bandit learning, and relative feedback approaches, based on gradient methods and inferred preferences between complete result rankings. Both types of approach have shown promise, but they have not previously been compared to each other, so it is unclear which type is most suitable for which online learning to rank problems. In this work we present the first empirical comparison of absolute and relative online learning to rank approaches.
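To make the relative-feedback family concrete, the following is a minimal sketch of a dueling-bandit-style gradient descent loop. It is not the paper's experimental setup: the hidden target ranker `w_true`, the cosine `utility` function standing in for interleaved-click outcomes, and all hyperparameter values are assumptions made for illustration. Real systems decide the winner of each duel from user clicks on an interleaved result list, not from a known utility.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth ranker that the simulated user prefers.
# In practice this is unknown; the duel outcome comes from click data.
w_true = rng.normal(size=5)

def utility(w):
    # Cosine similarity to the hidden target acts as a stand-in for
    # ranking quality as judged by interleaved user clicks.
    return w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true) + 1e-12)

def dbgd_sketch(rounds=2000, delta=1.0, alpha=0.1):
    """Relative-feedback loop: propose a perturbed ranker each round,
    keep a small step toward it only if it wins the duel."""
    w = rng.normal(size=5)
    for _ in range(rounds):
        u = rng.normal(size=5)
        u /= np.linalg.norm(u)      # random unit exploration direction
        w_cand = w + delta * u      # exploratory candidate ranker
        if utility(w_cand) > utility(w):   # candidate won the duel
            w = w + alpha * u              # move slightly toward it
    return w

w_learned = dbgd_sketch()
```

The key contrast with absolute-feedback (contextual bandit) methods is visible in the update rule: the learner never observes a reward value, only which of two rankers produced the better ranking.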