DOI: 10.1145/3447548.3467193

Trustworthy and Powerful Online Marketplace Experimentation with Budget-split Design

Published: 14 August 2021

Abstract

Online experimentation, also known as A/B testing, is the gold standard for measuring product impact and making business decisions in the tech industry. The validity and utility of experiments, however, hinge on unbiasedness and sufficient power. In two-sided online marketplaces, both requirements are called into question. Bernoulli randomized experiments are biased because treatment units interfere with control units through market competition, violating the stable unit treatment value assumption (SUTVA). Experimental power on at least one side of the market is often insufficient because of disparate sample sizes on the two sides. Despite the importance of online marketplaces to the online economy and the crucial role experimentation plays in product development, an effective and practical solution to the bias and low-power problems in marketplace experimentation has been lacking. In this paper we address this shortcoming by proposing the budget-split design, which is unbiased in any marketplace where buyers have a finite or infinite budget. We show that it is more powerful than all other unbiased designs in the literature. We then provide a generalizable system architecture for deploying this design in online marketplaces. Finally, we confirm the effectiveness of our proposal with empirical results from experiments run in two real-world online marketplaces, demonstrating over a 15x gain in experimental power and the removal of market-competition-induced bias, which can be as large as 230% of the treatment effect size.
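The core idea named in the abstract can be sketched in a few lines: rather than randomizing buyers into treatment and control (where both arms compete for the same budgets), each buyer's budget is split into two equal sub-budgets, and every request is randomly routed to one of two sub-marketplaces that each spend only from their own ledger. The sketch below is illustrative only, with hypothetical names (`SplitCampaign`, `serve_request`) and a trivial stand-in for the auction; it is not the authors' implementation, just a minimal demonstration of why split ledgers remove budget-mediated interference.

```python
import random

class SplitCampaign:
    """A campaign whose budget is split 50/50 between two sub-marketplaces."""

    def __init__(self, campaign_id, budget):
        self.campaign_id = campaign_id
        # Each arm gets its own ledger; spend on one arm cannot
        # deplete the budget available to the other arm.
        self.ledger = {"treatment": budget / 2, "control": budget / 2}

    def can_bid(self, arm):
        return self.ledger[arm] > 0

    def charge(self, arm, price):
        # Deduct spend from the routed arm's ledger only.
        self.ledger[arm] -= min(price, self.ledger[arm])

def serve_request(campaigns, rng):
    # Randomly route the request to one sub-marketplace.
    arm = "treatment" if rng.random() < 0.5 else "control"
    eligible = [c for c in campaigns if c.can_bid(arm)]
    if not eligible:
        return arm, None
    winner = rng.choice(eligible)  # stand-in for the real auction/ranking
    winner.charge(arm, price=1.0)
    return arm, winner.campaign_id

rng = random.Random(0)
campaigns = [SplitCampaign(i, budget=10.0) for i in range(3)]
for _ in range(40):
    serve_request(campaigns, rng)

# Invariant: each arm's remaining budget stays within its own half of the
# split, so heavy treatment spend never starves the control side (the
# SUTVA violation that biases Bernoulli-randomized marketplace tests).
for c in campaigns:
    assert 0 <= c.ledger["treatment"] <= 5.0
    assert 0 <= c.ledger["control"] <= 5.0
```

Because the two sub-marketplaces share no budget, each behaves like a scaled-down independent market, which is the intuition behind the design's unbiasedness under budget competition.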

Supplementary Material

MP4 File (trustworthy_and_powerful_online_marketplace-min_liu-jialiang_mao-38958162-h11g.mp4)
This is a 20-minute presentation video for "Trustworthy and Powerful Online Marketplace Experimentation with Budget-split Design", covering the importance and challenges of the problem and the insights behind the solution.

Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN:9781450383325
DOI:10.1145/3447548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. A/B testing
  2. algorithms
  3. causal inference
  4. controlled experiment
  5. experimentation
  6. online marketplaces

Qualifiers

  • Research-article

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • Improving Ego-Cluster for Network Effect Measurement. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2024), 5713–5722. DOI: 10.1145/3637528.3671557
  • Learning Links for Adaptable and Explainable Retrieval. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2024), 4046–4050. DOI: 10.1145/3627673.3679953
  • Statistical inference and A/B testing for first-price pacing equilibria. Proceedings of the 40th International Conference on Machine Learning (2023), 20868–20905. DOI: 10.5555/3618408.3619268
  • A Common Misassumption in Online Experiments with Machine Learning Models. ACM SIGIR Forum 57, 1 (2023), 1–9. DOI: 10.1145/3636341.3636358
  • Quantifying the Effectiveness of Advertising: A Bootstrap Proportion Test for Brand Lift Testing. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023), 1627–1636. DOI: 10.1145/3583780.3615021
  • All about Sample-Size Calculations for A/B Testing: Novel Extensions & Practical Guide. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (2023), 3574–3583. DOI: 10.1145/3583780.3614779
  • Detecting Interference in Online Controlled Experiments with Increasing Allocation. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2023), 661–672. DOI: 10.1145/3580305.3599308
  • Near-Optimal Experimental Design Under the Budget Constraint in Online Platforms. Proceedings of the ACM Web Conference 2023, 3603–3613. DOI: 10.1145/3543507.3583528
  • Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology. The American Statistician 78, 2 (2023), 135–149. DOI: 10.1080/00031305.2023.2257237
