Research article · DOI: 10.1145/3132847.3132850

Beyond Success Rate: Utility as a Search Quality Metric for Online Experiments

Published: 06 November 2017

ABSTRACT

User satisfaction metrics are an integral part of search engine development, as they help system developers understand and evaluate the quality of the user experience. Research to date has mostly focused on predicting success or frustration as a proxy for satisfaction. However, users' search experience is more complex than merely being either successful or not, so using success rate as a measure of satisfaction can be limiting. In this work, we propose the use of utility as a measure of searcher satisfaction. This concept represents the fulfillment a user receives from consuming a service and explains how users aim to gain optimal overall satisfaction. Our utility metrics measure user satisfaction by aggregating all of their interactions with the search engine. These interactions are represented as a timeline of actions and their dwell times, where each action is classified as having a positive or negative effect on the user. We examine sessions mined from Bing logs, with multi-point-scale assessments of searcher satisfaction, and show that utility is a better proxy for satisfaction than success. Leveraging that data, we design metrics of searcher satisfaction that assess the overall utility accumulated by a user during her search session. We use real user traffic from millions of users in an A/B setting to compare utility metrics to success rate metrics. We show that utility is a better metric for evaluating searcher satisfaction with the search engine, and a more sensitive and accurate metric when compared to predicting success. These metrics are currently adopted as the top-level metric for evaluating the thousands of A/B experiments that are run on Bing each year.
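The abstract describes the general shape of the metric (a session timeline of actions, each labeled positive or negative and weighted by dwell time) but not the actual aggregation model. The Python sketch below is only an illustration of that idea under assumptions of our own: the action labels in POSITIVE_ACTIONS and NEGATIVE_ACTIONS, the Action fields, and the simple dwell-time weighting are hypothetical and are not the classification or weighting used in the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical classification of action types into positive and negative
# utility contributors; the paper's real classification is learned from
# labeled Bing sessions and is not specified in the abstract.
POSITIVE_ACTIONS = {"satisfied_click", "answer_view"}
NEGATIVE_ACTIONS = {"quick_back", "query_reformulation"}

@dataclass
class Action:
    kind: str              # e.g. "satisfied_click", "quick_back"
    dwell_seconds: float   # time the user spent on this action

def session_utility(actions: List[Action]) -> float:
    """Aggregate a session's action timeline into one utility score.

    Positive actions add their dwell time, negative actions subtract it;
    this linear sum is a toy stand-in for the dwell-time-weighted
    aggregation the abstract describes, not the paper's formula.
    """
    utility = 0.0
    for action in actions:
        if action.kind in POSITIVE_ACTIONS:
            utility += action.dwell_seconds
        elif action.kind in NEGATIVE_ACTIONS:
            utility -= action.dwell_seconds
        # actions of unknown type contribute nothing in this sketch
    return utility

# Example: one satisfying click followed by a quick back-navigation.
session = [Action("satisfied_click", 45.0), Action("quick_back", 3.0)]
print(session_utility(session))  # 42.0
```

In an A/B setting, a per-session score like this would be averaged over the sessions in each treatment group and the difference tested for significance; the paper's contribution is showing that such utility-based scores track graded satisfaction labels better than binary success prediction.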


Published in

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017, 2604 pages
ISBN: 9781450349185
DOI: 10.1145/3132847
Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
