ABSTRACT
Given a repeatedly issued query and a document whose potential to satisfy the user's need is not yet confirmed, a search system should place that document at a high position in order to gather user feedback and obtain a more confident estimate of the document's utility. On the other hand, the main objective of the search system is to maximize expected user satisfaction over a rather long horizon, which requires showing the more relevant documents on average. State-of-the-art approaches to this exploration-exploitation dilemma rely on strongly simplified settings that make them infeasible in practice. We extend the most flexible and pragmatic of them to address two practical issues: utilizing prior information about queries and documents, and combining bandit-based learning with a default production ranking algorithm. We show experimentally that our framework significantly improves the ranking of a leading commercial search engine.
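The abstract does not specify the paper's algorithm, but the general idea it describes — an exploration bonus for documents with little feedback, smoothed by prior information such as a production model's relevance estimate — can be sketched with a minimal UCB-style score. All names and the prior-smoothing scheme below are illustrative assumptions, not the authors' method:

```python
import math

def ucb_scores(clicks, impressions, prior_ctr, prior_weight, total, c=1.0):
    """UCB-style exploration scores for candidate documents of one query.

    clicks, impressions: per-document feedback counts gathered so far.
    prior_ctr: a prior click-through estimate, e.g. derived from the
        production ranker's relevance score (hypothetical choice).
    prior_weight: pseudo-count controlling how strongly the prior is trusted.
    total: total impressions of the query, drives the exploration bonus.
    """
    scores = []
    for k, n in zip(clicks, impressions):
        # Observed feedback smoothed toward the prior (Beta-like posterior mean).
        mean = (k + prior_weight * prior_ctr) / (n + prior_weight)
        # Exploration bonus that shrinks as the document accumulates impressions.
        bonus = c * math.sqrt(math.log(max(total, 2)) / (n + prior_weight))
        scores.append(mean + bonus)
    return scores
```

Ranking documents by these scores in descending order places a never-shown document (large bonus) above a well-explored one with a mediocre click rate, which is the exploration behavior the abstract motivates.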
Index Terms
- Gathering Additional Feedback on Search Results by Multi-Armed Bandits with Respect to Production Ranking