A/B Testing at Scale: Accelerating Software Innovation

ABSTRACT
The Internet provides developers of connected software, including web sites, applications, and devices, with an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to back-end algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications presents many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and on mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
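At its core, analyzing an A/B test reduces to a two-sample hypothesis test on a metric collected from randomized control and treatment groups. As a minimal illustration (not part of the tutorial materials, and with hypothetical counts), here is a sketch of a standard two-proportion z-test on conversion rates:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test comparing conversion rates of control (A)
    and treatment (B). Returns the z statistic and a two-sided p-value."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that the rates are equal.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: 10,000 users per variant,
# 500 conversions in control vs. 560 in treatment.
z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.3f}, p = {p:.4f}")  # p is just above 0.05 here
```

In practice, experimentation platforms layer much more on top of this basic test, for example guarding against the multiple-testing and optional-stopping pitfalls discussed in the references below.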
REFERENCES
- R. Kohavi, "Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years," in Conference on Knowledge Discovery and Data Mining (KDD), 2009.
- A. Fabijan, P. Dmitriev, H. Holmstrom and J. Bosch, "The Evolution of Continuous Experimentation in Software Product Development," in International Conference on Software Engineering (ICSE), 2017.
- A. Deng and X. Shi, "Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned," in Conference on Knowledge Discovery and Data Mining (KDD), 2016.
- A. Deng, J. Lu and S. Chen, "Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing," in Conference on Data Science and Advanced Analytics (DSAA), 2016.
- P. Dmitriev and X. Wu, "Measuring Metrics," in Conference on Information and Knowledge Management (CIKM), 2016.
- W. Machmouchi and G. Buscher, "Principles for the Design of Online A/B Metrics," in ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2016.
- Z. Zhao, M. Chen, D. Matheson and M. Stone, "Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation," in Conference on Data Science and Advanced Analytics (DSAA), 2016.
- R. Kohavi, R. Longbotham and J. Quarto-vonTivadar, "Planning, Running, and Analyzing Controlled Experiments on the Web," tutorial at Conference on Knowledge Discovery and Data Mining (KDD), 2009.
- R. Kohavi, "Pitfalls in Online Controlled Experiments," in MIT Conference on Digital Experimentation (CODE), 2016.
- R. Kohavi, A. Deng, B. Frasca, R. Longbotham, T. Walker and Y. Xu, "Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained," in Conference on Knowledge Discovery and Data Mining (KDD), 2012.
- A. Deng, Y. Xu, R. Kohavi and T. Walker, "Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data," in Conference on Web Search and Data Mining (WSDM), 2013.
- R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu and N. Pohlmann, "Online Controlled Experiments at Large Scale," in Conference on Knowledge Discovery and Data Mining (KDD), 2013.
- A. Deng, "Objective Bayesian Two Sample Hypothesis Testing for Online Controlled Experiments," in World Wide Web Conference (WWW), 2015.
- A. Deng, P. Zhang, S. Chen, D. Kim and J. Lu, "Concise Summarization of Heterogeneous Treatment Effect Using Total Variation Regularized Regression," in submission, 2017.
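Several of the references above address improving experiment sensitivity; the CUPED technique of Deng et al. (WSDM 2013) reduces metric variance using a pre-experiment covariate that is unaffected by the treatment. The following is a minimal sketch on simulated data; the variable names and simulation are illustrative, not taken from the paper:

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED-style adjustment: subtract the part of metric y explained by
    a pre-experiment covariate x. Returns y - theta * (x - mean(x)),
    where theta = cov(x, y) / var(x). The mean of y is preserved while
    its variance shrinks when x is predictive of y."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# Hypothetical simulation: pre-experiment activity predicts the metric.
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=5_000)           # pre-experiment covariate
y = 0.8 * x + rng.normal(0, 1, size=5_000)  # in-experiment metric
y_adj = cuped_adjust(y, x)
print(f"variance before: {y.var():.3f}, after: {y_adj.var():.3f}")
```

Because the adjusted metric has the same mean but lower variance, the same treatment effect can be detected with fewer users or a shorter experiment.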