A/B Testing at Scale: Accelerating Software Innovation

ABSTRACT
The Internet provides developers of connected software, including web sites, applications, and devices, with an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to back-end algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications presents many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and on mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
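At its core, analyzing an A/B test reduces to a two-sample hypothesis test on a metric collected from randomized control and treatment groups. As a minimal illustration (not part of the tutorial materials, and with hypothetical counts), here is a sketch of a standard two-proportion z-test on conversion rates:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test comparing conversion rates of control (A)
    and treatment (B). Returns the z statistic and a two-sided p-value."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that the rates are equal.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: 10,000 users per variant,
# 500 conversions in control vs. 560 in treatment.
z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.3f}, p = {p:.4f}")  # p is just above 0.05 here
```

In practice, experimentation platforms layer much more on top of this basic test, for example guarding against the multiple-testing and optional-stopping pitfalls discussed in the references below.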
REFERENCES
- R. Kohavi, "Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years," in Conference on Knowledge Discovery and Data Mining (KDD), 2009.
- A. Fabijan, P. Dmitriev, H. Holmstrom and J. Bosch, "The Evolution of Continuous Experimentation in Software Product Development," in International Conference on Software Engineering (ICSE), 2017.
- A. Deng and X. Shi, "Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned," in Conference on Knowledge Discovery and Data Mining (KDD), 2016.
- A. Deng, J. Lu and S. Chen, "Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing," in Conference on Data Science and Advanced Analytics (DSAA), 2016.
- P. Dmitriev and X. Wu, "Measuring Metrics," in Conference on Information and Knowledge Management (CIKM), 2016.
- W. Machmouchi and G. Buscher, "Principles for the Design of Online A/B Metrics," in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016.
- Z. Zhao, M. Chen, D. Matheson and M. Stone, "Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation," in Conference on Data Science and Advanced Analytics (DSAA), 2016.
- R. Kohavi, R. Longbotham and J. Quarto-vonTivadar, "Planning, Running, and Analyzing Controlled Experiments on the Web," tutorial at Conference on Knowledge Discovery and Data Mining (KDD), 2009.
- R. Kohavi, "Pitfalls in Online Controlled Experiments," in MIT Conference on Digital Experimentation (CODE), 2016.
- R. Kohavi, A. Deng, B. Frasca, R. Longbotham, T. Walker and Y. Xu, "Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained," in Conference on Knowledge Discovery and Data Mining (KDD), 2012.
- A. Deng, Y. Xu, R. Kohavi and T. Walker, "Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data," in Conference on Web Search and Data Mining (WSDM), 2013.
- R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu and N. Pohlmann, "Online Controlled Experiments at Large Scale," in Conference on Knowledge Discovery and Data Mining (KDD), 2013.
- A. Deng, "Objective Bayesian Two Sample Hypothesis Testing for Online Controlled Experiments," in World Wide Web Conference (WWW), 2015.
- A. Deng, P. Zhang, S. Chen, D. Kim and J. Lu, "Concise Summarization of Heterogeneous Treatment Effect Using Total Variation Regularized Regression," in submission, 2017.
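Several of the references above address improving experiment sensitivity; the CUPED technique of Deng et al. (WSDM 2013) reduces metric variance using a pre-experiment covariate that is unaffected by the treatment. The following is a minimal sketch on simulated data; the variable names and simulation are illustrative, not taken from the paper:

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED-style adjustment: subtract the part of metric y explained by
    a pre-experiment covariate x. Returns y - theta * (x - mean(x)),
    where theta = cov(x, y) / var(x). The mean of y is preserved while
    its variance shrinks when x is predictive of y."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# Hypothetical simulation: pre-experiment activity predicts the metric.
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=5_000)           # pre-experiment covariate
y = 0.8 * x + rng.normal(0, 1, size=5_000)  # in-experiment metric
y_adj = cuped_adjust(y, x)
print(f"variance before: {y.var():.3f}, after: {y_adj.var():.3f}")
```

Because the adjusted metric has the same mean but lower variance, the same treatment effect can be detected with fewer users or a shorter experiment.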