ABSTRACT
Given a repeatedly issued query and a document whose potential to satisfy the user's need is not yet confirmed, a search system should place that document at a high position in order to gather user feedback and obtain a more confident estimate of the document's utility. On the other hand, the main objective of the search system is to maximize expected user satisfaction over a rather long horizon, which requires showing the more relevant documents on average. State-of-the-art approaches to this exploration-exploitation dilemma rely on strongly simplified settings that make them infeasible in practice. We extend the most flexible and pragmatic of them to address two practical issues: utilizing prior information about queries and documents, and combining bandit-based learning with a default production ranking algorithm. We show experimentally that our framework significantly improves the ranking of a leading commercial search engine.
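The abstract does not specify the paper's algorithm, but the general idea it describes — an exploration bonus for documents with little feedback, smoothed by prior information such as a production model's relevance estimate — can be sketched with a minimal UCB-style score. All names and the prior-smoothing scheme below are illustrative assumptions, not the authors' method:

```python
import math

def ucb_scores(clicks, impressions, prior_ctr, prior_weight, total, c=1.0):
    """UCB-style exploration scores for candidate documents of one query.

    clicks, impressions: per-document feedback counts gathered so far.
    prior_ctr: a prior click-through estimate, e.g. derived from the
        production ranker's relevance score (hypothetical choice).
    prior_weight: pseudo-count controlling how strongly the prior is trusted.
    total: total impressions of the query, drives the exploration bonus.
    """
    scores = []
    for k, n in zip(clicks, impressions):
        # Observed feedback smoothed toward the prior (Beta-like posterior mean).
        mean = (k + prior_weight * prior_ctr) / (n + prior_weight)
        # Exploration bonus that shrinks as the document accumulates impressions.
        bonus = c * math.sqrt(math.log(max(total, 2)) / (n + prior_weight))
        scores.append(mean + bonus)
    return scores
```

Ranking documents by these scores in descending order places a never-shown document (large bonus) above a well-explored one with a mediocre click rate, which is the exploration behavior the abstract motivates.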
Index Terms
- Gathering Additional Feedback on Search Results by Multi-Armed Bandits with Respect to Production Ranking