DOI: 10.1145/2736277.2741104

Gathering Additional Feedback on Search Results by Multi-Armed Bandits with Respect to Production Ranking

Published: 18 May 2015

ABSTRACT

Given a repeatedly issued query and a document whose potential to satisfy users' needs has not yet been confirmed, a search system should place this document at a high position in order to gather user feedback and obtain a more confident estimate of its utility. On the other hand, the main objective of the search system is to maximize expected user satisfaction over a rather long period, which requires showing more relevant documents on average. State-of-the-art approaches to this exploration-exploitation dilemma rely on strongly simplified settings that make them infeasible in practice. We improve the most flexible and pragmatic of them to handle two practical issues: the first is utilizing prior information about queries and documents, the second is combining bandit-based learning with a default production ranking algorithm. We show experimentally that our framework significantly improves the ranking of a leading commercial search engine.
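
The abstract describes balancing exploration (promoting documents with uncertain utility to collect feedback) against exploitation (showing the documents the production ranker already trusts). As a purely illustrative sketch of that idea, and not the paper's actual algorithm, the Python snippet below keeps a Beta posterior over click probability for each query-document pair, seeds it with pseudo-counts assumed to come from the production ranking score, and orders results by Thompson sampling so that uncertain but promising documents are occasionally shown higher to gather feedback. All names and the prior construction here are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class DocumentArm:
    # Hypothetical pseudo-counts derived from the production ranking score
    # (an assumption for illustration; the paper's prior construction differs).
    prior_clicks: float
    prior_skips: float
    clicks: int = 0
    skips: int = 0

    def sample_relevance(self) -> float:
        """Thompson sampling: draw from the Beta posterior over click probability."""
        return random.betavariate(self.prior_clicks + self.clicks,
                                  self.prior_skips + self.skips)


def rerank(arms):
    """Order documents for one impression by sampled relevance (explore and exploit)."""
    return sorted(arms, key=lambda doc_id: arms[doc_id].sample_relevance(), reverse=True)


def record_feedback(arm, clicked):
    """Fold observed user feedback back into the posterior."""
    if clicked:
        arm.clicks += 1
    else:
        arm.skips += 1


if __name__ == "__main__":
    # Two documents for one repeated query: doc_b has an uncertain but promising
    # prior, so Thompson sampling occasionally promotes it to gather feedback.
    arms = {"doc_a": DocumentArm(prior_clicks=8.0, prior_skips=2.0),
            "doc_b": DocumentArm(prior_clicks=2.0, prior_skips=1.0)}
    ranking = rerank(arms)
    record_feedback(arms[ranking[0]], clicked=True)  # pretend the top result was clicked
    print(ranking)
```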


• Published in

  WWW '15: Proceedings of the 24th International Conference on World Wide Web
  May 2015, 1460 pages
  ISBN: 9781450334693
  Copyright © 2015 is held by the International World Wide Web Conference Committee (IW3C2)

  Publisher

  International World Wide Web Conferences Steering Committee
  Republic and Canton of Geneva, Switzerland


        Qualifiers

        • research-article

        Acceptance Rates

WWW '15 Paper Acceptance Rate: 131 of 929 submissions, 14%; Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
