
SQR: Balancing Speed, Quality and Risk in Online Experiments

Published: 19 July 2018

ABSTRACT

Controlled experimentation, also called A/B testing, is widely adopted to accelerate product innovation in the online world. However, how fast we innovate can be limited by how we run experiments. Most experiments go through a "ramp-up" process, where we gradually increase traffic to the new treatment toward 100%. We have seen huge inefficiency and risk in how experiments are ramped, and it gets in the way of innovation. This cuts both ways: ramp too slowly and much time and many resources are wasted; ramp too fast and suboptimal decisions are made. In this paper, we build a ramping framework that effectively balances Speed, Quality and Risk (SQR). We start by identifying the most common mistakes experimenters make, and then introduce the four SQR principles corresponding to the four ramp phases of an experiment. To truly scale SQR to all experiments, we develop a statistical algorithm, embedded into the process of running every experiment, that automatically recommends ramp decisions. Finally, to complete the picture, we briefly cover the auto-ramp engineering infrastructure that collects inputs and executes the recommendations in a timely and reliable manner.
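The abstract describes an algorithm that watches an experiment's metrics at each ramp stage and recommends whether to advance, hold, or abort. As an illustration only (the paper's actual SQR algorithm, phases, and thresholds are not reproduced here), a minimal ramp-decision rule can be sketched with a two-sample z-test on a key metric: abort on a statistically significant regression, advance on a clear win, otherwise hold and collect more data. The stage list and thresholds below are hypothetical.

```python
import math

# Hypothetical ramp stages (percent of traffic); the paper's actual
# SQR phases are not specified here.
RAMP_STAGES = [1, 5, 10, 25, 50, 100]

def two_sample_z(mean_t, mean_c, var_t, var_c, n_t, n_c):
    """Two-sample z-statistic for the treatment-vs-control delta."""
    se = math.sqrt(var_t / n_t + var_c / n_c)
    return (mean_t - mean_c) / se

def recommend_ramp(stage, mean_t, mean_c, var_t, var_c, n_t, n_c,
                   z_harm=-1.96, z_go=1.96):
    """Illustrative ramp decision: abort on a significant regression,
    advance on a clear win, otherwise hold for more data."""
    z = two_sample_z(mean_t, mean_c, var_t, var_c, n_t, n_c)
    if z < z_harm:          # significant regression -> limit risk, abort
        return "abort"
    if z > z_go:            # clear win -> ramp to the next traffic stage
        i = RAMP_STAGES.index(stage)
        return f"ramp to {RAMP_STAGES[i + 1]}%" if i + 1 < len(RAMP_STAGES) else "ship"
    return "hold"           # inconclusive -> keep collecting data
```

Note that repeatedly peeking at a z-test like this inflates the false-positive rate; the sequential-testing literature the paper draws on (e.g., always-valid inference) exists precisely to make such continuous monitoring statistically sound.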


Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018, 2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article

Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
