Beyond Success Rate: Utility as a Search Quality Metric for Online Experiments

ABSTRACT
User satisfaction metrics are an integral part of search engine development, as they help system developers understand and evaluate the quality of the user experience. Research to date has mostly focused on predicting success or frustration as a proxy for satisfaction. However, users' search experience is more complex than merely being either successful or not, so using success rate as a measure of satisfaction can be limiting. In this work, we propose the use of utility as a measure of searcher satisfaction. This concept represents the fulfillment a user receives from consuming a service and explains how users aim to gain optimal overall satisfaction. Our utility metrics measure user satisfaction by aggregating all of the user's interactions with the search engine. These interactions are represented as a timeline of actions and their dwell times, where each action is classified as having a positive or negative effect on the user. We examine sessions mined from Bing logs, with multi-point-scale assessments of searcher satisfaction, and show that utility is a better proxy for satisfaction than success. Leveraging that data, we design metrics of searcher satisfaction that assess the overall utility accumulated by a user during her search session. We use real user traffic from millions of users in an A/B setting to compare utility metrics to success rate metrics. We show that utility is a better metric for evaluating searcher satisfaction with the search engine, and a more sensitive and accurate metric than predicting success. These metrics are currently adopted as the top-level metric for evaluating the thousands of A/B experiments that are run on Bing each year.
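The abstract describes a session as a timeline of actions with dwell times, each classified as having a positive or negative effect, which are then aggregated into an overall utility score. The following is a minimal illustrative sketch of that idea; the action types, the sign assignments, and the dwell-weighted sum are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of a session-utility metric: each action in a
# session timeline carries a sign (positive or negative effect on the
# user) and a dwell time; session utility is the signed, dwell-weighted
# aggregate. Action kinds, thresholds, and the aggregation rule are
# illustrative assumptions, not the paper's actual definitions.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "quickback", "reformulation"
    dwell_secs: float  # time spent on this action

def sign(action: Action) -> int:
    """Assumed classification: long-dwell clicks are positive,
    quick-backs and reformulations are negative, everything else neutral."""
    if action.kind == "click" and action.dwell_secs >= 30:
        return +1
    if action.kind in ("quickback", "reformulation"):
        return -1
    return 0

def session_utility(timeline: list[Action]) -> float:
    """Aggregate signed dwell time over the whole session."""
    return sum(sign(a) * a.dwell_secs for a in timeline)

session = [
    Action("click", 45.0),      # satisfying click
    Action("quickback", 5.0),   # bounced back quickly
    Action("click", 120.0),     # long, satisfying read
]
print(session_utility(session))  # 45 - 5 + 120 = 160.0
```

Under this sketch, a session with one frustrating quick-back can still score well overall, which is the key contrast with a binary success/failure label.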