ABSTRACT
Controlled experimentation, also called A/B testing, is widely adopted to accelerate product innovation in the online world. However, how fast we innovate can be limited by how we run experiments. Most experiments go through a "ramp-up" process in which traffic to the new treatment is gradually increased to 100%. We have seen substantial inefficiency and risk in how experiments are ramped, and it gets in the way of innovation. This can go both ways: ramp too slowly, and time and resources are wasted; ramp too fast, and suboptimal decisions are made. In this paper, we build a ramping framework that effectively balances Speed, Quality and Risk (SQR). We start by identifying the most common mistakes experimenters make, and then introduce the four SQR principles corresponding to the four ramp phases of an experiment. To truly scale SQR to all experiments, we develop a statistical algorithm that is embedded into the process of running every experiment and automatically recommends ramp decisions. Finally, to complete the picture, we briefly cover the auto-ramp engineering infrastructure that collects inputs and executes the recommendations in a timely and reliable manner.
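To make the ramp-decision idea concrete, below is a minimal Python sketch of an automatic ramp-recommendation rule. It is an illustrative assumption, not the paper's actual SQR algorithm: the function name recommend_ramp, the harm_threshold parameter, and the fixed ramp schedule are all hypothetical. It encodes the three trade-offs in the simplest way: a risk gate (shut down if the key metric is confidently worse than a tolerable harm level), a quality gate (hold until the effect is measured precisely enough to call), and speed (advance to the next ramp step as soon as the test is conclusive, rather than on a fixed timetable).

```python
"""Minimal sketch of an auto-ramp recommendation rule (hypothetical;
not the paper's exact SQR algorithm). Uses a normal approximation for
the observed lift of the key metric."""
import math


def _norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def recommend_ramp(delta: float, se: float, current_pct: float,
                   alpha: float = 0.05, harm_threshold: float = -0.01,
                   ramp_schedule=(1, 5, 10, 25, 50, 100)) -> str:
    """Recommend the next ramp action for one experiment phase.

    delta: observed treatment-minus-control lift of the key metric
    se:    standard error of delta at the current traffic allocation
    """
    z = delta / se
    p_two_sided = 2.0 * (1.0 - _norm_cdf(abs(z)))

    # Risk gate: shut down if the probability that the true lift exceeds
    # the maximum tolerable harm is below alpha.
    if _norm_cdf((delta - harm_threshold) / se) < alpha:
        return "shut down"

    # Quality gate: hold at the current allocation until the effect is
    # statistically distinguishable from zero.
    if p_two_sided > alpha:
        return f"hold at {current_pct}% and collect more data"

    # Speed: the test is conclusive, so move to the next scheduled step.
    next_steps = [p for p in ramp_schedule if p > current_pct]
    return f"ramp to {next_steps[0]}%" if next_steps else "ramp complete (100%)"


# Example: a clearly positive lift measured at a 5% ramp.
print(recommend_ramp(delta=0.02, se=0.005, current_pct=5))  # -> ramp to 10%
```

In practice, the per-phase thresholds would differ (e.g., the early phases emphasize the risk gate, later phases the quality gate), and the decision would be recomputed continuously by the auto-ramp infrastructure rather than at fixed checkpoints; this sketch only illustrates the shape of such a rule.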