DOI: 10.1145/3159652.3159699
research-article

Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments

Published: 02 February 2018

Abstract

We study ratio overall evaluation criteria (user behavior quality metrics) and, in particular, average values of non-user level metrics, which are widely used in A/B testing as an important part of modern Internet companies' evaluation instruments (e.g., abandonment rate, a user's absence time after a session).
We focus on the problem of sensitivity improvement of these criteria, since there is a large gap between the variety of sensitivity improvement techniques designed for user level metrics and the variety of such techniques for ratio criteria.
We propose a novel transformation of a ratio criterion to the average value of a user level (randomization-unit level, in general) metric. This creates an opportunity to directly use a wide range of sensitivity improvement techniques designed for the user level, which make A/B tests more efficient. We provide theoretical guarantees on the novel metric's consistency in terms of preservation of two crucial properties (directionality and significance level) w.r.t. the source ratio criteria.
The experimental evaluation of the approach is done on hundreds of large-scale real A/B tests run at one of the most popular global search engines. It reinforces the theoretical results and demonstrates up to +34% sensitivity rate improvement achieved by the transformation combined with the best known regression adjustment.
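
The following is a minimal sketch of the kind of transformation the abstract describes, not the authors' implementation. It assumes a linearized user-level metric of the form L_u = X_u - kappa * Y_u, where X_u and Y_u are a user's numerator and denominator contributions and kappa is the ratio measured in the control group. The abstract does not state this formula, so the form, the synthetic abandonment data, and all function names here are illustrative assumptions.

```python
import math
import random


def ratio(xs, ys):
    """Ratio metric: total numerator over total denominator across users."""
    return sum(xs) / sum(ys)


def mean(values):
    return sum(values) / len(values)


def linearize(xs, ys, kappa):
    """Per-user metric x - kappa * y (assumed form, not taken from the abstract)."""
    return [x - kappa * y for x, y in zip(xs, ys)]


def welch_t(a, b):
    """Welch t-statistic for mean(b) - mean(a) on user-level values."""
    ma, mb = mean(a), mean(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return (mb - ma) / math.sqrt(va / len(a) + vb / len(b))


def simulate_group(n_users, abandon_prob, rng):
    """Synthetic data: per user, (abandoned sessions, total sessions)."""
    xs, ys = [], []
    for _ in range(n_users):
        sessions = rng.randint(1, 10)
        abandoned = sum(rng.random() < abandon_prob for _ in range(sessions))
        xs.append(abandoned)
        ys.append(sessions)
    return xs, ys


rng = random.Random(0)
xa, ya = simulate_group(5000, 0.30, rng)  # control group (A)
xb, yb = simulate_group(5000, 0.28, rng)  # treatment group (B)

# kappa is the ratio metric computed on the control group only.
kappa = ratio(xa, ya)
lin_a = linearize(xa, ya, kappa)
lin_b = linearize(xb, yb, kappa)

ratio_diff = ratio(xb, yb) - ratio(xa, ya)  # change in the ratio criterion
lin_diff = mean(lin_b) - mean(lin_a)        # change in the user-level metric
t_stat = welch_t(lin_a, lin_b)              # ordinary user-level test

print("ratio difference:     ", ratio_diff)
print("linearized difference:", lin_diff)
print("Welch t-statistic:    ", t_stat)
```

Under this assumed form, the control mean of the linearized metric is zero and the treatment mean equals mean(yb) * ratio_diff, so the two differences share a sign whenever the denominators are positive. Because the linearized values are defined per user, they can be passed to user-level tests or variance reduction techniques without modification.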

Published In

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018
821 pages
ISBN: 9781450355810
DOI: 10.1145/3159652
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. a/b test
  2. delta method
  3. directionality
  4. linearization
  5. non-user level metric
  6. online controlled experiment
  7. ratio oec
  8. sensitivity

Qualifiers

  • Research-article

Conference

WSDM 2018

Acceptance Rates

WSDM '18 Paper Acceptance Rate 81 of 514 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Cited By

  • (2025) Relationships Between Genetic Parameters in the Component Traits of a Ratio Trait and the Distribution and Heritability of Such Ratio Trait. Animal Science Journal 96:1. DOI: 10.1111/asj.70031. Online publication date: 22-Jan-2025
  • (2024) Multi-Objective Recommendation via Multivariate Policy Learning. Proceedings of the 18th ACM Conference on Recommender Systems (pp. 712-721). DOI: 10.1145/3640457.3688132. Online publication date: 8-Oct-2024
  • (2024) Optimal Baseline Corrections for Off-Policy Contextual Bandits. Proceedings of the 18th ACM Conference on Recommender Systems (pp. 722-732). DOI: 10.1145/3640457.3688105. Online publication date: 8-Oct-2024
  • (2024) Powerful A/B-Testing Metrics and Where to Find Them. Proceedings of the 18th ACM Conference on Recommender Systems (pp. 816-818). DOI: 10.1145/3640457.3688036. Online publication date: 8-Oct-2024
  • (2024) Learning Metrics that Maximise Power for Accelerated A/B-Tests. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 5183-5193). DOI: 10.1145/3637528.3671512. Online publication date: 25-Aug-2024
  • (2024) A/B testing. Journal of Systems and Software 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024
  • (2024) Variance Reduction in Ratio Metrics for Efficient Online Experiments. Advances in Information Retrieval (pp. 292-297). DOI: 10.1007/978-3-031-56069-9_34. Online publication date: 24-Mar-2024
  • (2021) On Post-selection Inference in A/B Testing. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 2743-2752). DOI: 10.1145/3447548.3467129. Online publication date: 14-Aug-2021
  • (2020) Dealing with Ratio Metrics in A/B Testing at the Presence of Intra-user Correlation and Segments. Web Information Systems Engineering – WISE 2020 (pp. 563-577). DOI: 10.1007/978-3-030-62008-0_39. Online publication date: 21-Oct-2020
  • (2019) Effective Online Evaluation for Web Search. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1399-1400). DOI: 10.1145/3331184.3331378. Online publication date: 18-Jul-2019
