ABSTRACT
We present a method for shrinking treatment effect estimators, and hence improving their precision, via experiment splitting. Experiment splitting reduces shrinkage to a standard prediction problem. The method makes minimal distributional assumptions and allows the degree of shrinkage in one metric to depend on other metrics. Using a dataset of 226 Facebook News Feed A/B tests, we show that a lasso estimator based on repeated experiment splitting has a 44% lower mean squared predictive error than the conventional, unshrunk treatment effect estimator and an 18% lower mean squared predictive error than the James-Stein shrinkage estimator, and would lead to substantially improved launch decisions over both.
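The core idea, regressing the treatment effect estimate from one half of each experiment's sample on the estimate from the other half, can be sketched on simulated data. The sketch below is illustrative only, not the paper's estimator: all quantities (effect sizes, noise levels, sample counts) are made-up assumptions, and a single OLS fit on one split stands in for the lasso fit over repeated splits and multiple metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setting (illustrative numbers, not the paper's data):
# many experiments, each with a small true effect and noisy user-level outcomes.
n_exp = 2000
tau = 0.2            # sd of true treatment effects across experiments
sigma = 4.0          # sd of user-level outcome noise
n_half = 100         # users in each half of an experiment

theta = rng.normal(0.0, tau, n_exp)          # true effects (unobserved)
se = sigma / np.sqrt(n_half)                 # half-sample standard error
est_a = theta + rng.normal(0.0, se, n_exp)   # estimate from split A
est_b = theta + rng.normal(0.0, se, n_exp)   # estimate from split B

# Experiment splitting: est_b is an unbiased measurement of theta with noise
# independent of est_a, so regressing est_b on est_a learns a shrinkage map
# without ever observing theta. The fitted slope is below 1, pulling
# estimates toward the cross-experiment mean.
slope, intercept = np.polyfit(est_a, est_b, 1)
shrunk = intercept + slope * est_a

# Compare against the conventional full-sample (unshrunk) estimate.
full = (est_a + est_b) / 2.0
mse_unshrunk = np.mean((full - theta) ** 2)
mse_shrunk = np.mean((shrunk - theta) ** 2)
print(f"unshrunk MSE: {mse_unshrunk:.4f}  shrunk MSE: {mse_shrunk:.4f}")
```

In this low signal-to-noise regime the shrunk estimator beats the pooled full-sample estimate even though it discards half of each experiment's data; averaging over repeated random splits, as in the paper, recovers the discarded half's information as well.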
- Hirotugu Akaike. 1998. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike. Springer, 199-213.
- Michael L Anderson and Jeremy Magruder. 2017. Split-sample strategies for avoiding false discoveries. Technical Report. National Bureau of Economic Research.
- Susan Athey and Guido Imbens. 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences 113, 27 (2016), 7353-7360.
- Susan Athey, Julie Tibshirani, and Stefan Wager. 2016. Generalized random forests. arXiv preprint arXiv:1610.01271 (2016).
- Susan Athey and Stefan Wager. 2017. Efficient policy learning. arXiv preprint arXiv:1702.02896 (2017).
- Eduardo M Azevedo, Alex Deng, Jose Luis Montiel Olea, Justin Rao, and E Glen Weyl. 2018. The A/B testing problem. In Proceedings of the 2018 ACM Conference on Economics and Computation. ACM, 461-462.
- Gérard Biau. 2012. Analysis of a random forests model. Journal of Machine Learning Research 13, Apr (2012), 1063-1095.
- Thomas Blake and Dominic Coey. 2014. Why marketplace experimentation is harder than it seems: The role of test-control interference. In Proceedings of the Fifteenth ACM Conference on Economics and Computation. ACM, 567-582.
- Lawrence D Brown. 2008. In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies. The Annals of Applied Statistics (2008), 113-152.
- Lawrence D Brown and Eitan Greenshtein. 2009. Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means. The Annals of Statistics (2009), 1685-1704.
- Bradley P Carlin and Thomas A Louis. 2010. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC.
- George Casella. 1985. An introduction to empirical Bayes data analysis. The American Statistician 39, 2 (1985), 83-87.
- Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21, 1 (2018), C1-C68.
- Victor Chernozhukov, Whitney Newey, and James Robins. 2018. Double/de-biased machine learning using regularized Riesz representers. arXiv preprint arXiv:1802.08667 (2018).
- Alex Deng. 2015. Objective Bayesian two sample hypothesis testing for online controlled experiments. In Proceedings of the 24th International Conference on World Wide Web. ACM, 923-928.
- Bradley Efron. 2011. Tweedie's formula and selection bias. J. Amer. Statist. Assoc. 106, 496 (2011), 1602-1614.
- Bradley Efron. 2012. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press.
- Bradley Efron and Trevor Hastie. 2016. Computer Age Statistical Inference. Cambridge University Press.
- Bradley Efron and Carl Morris. 1973. Stein's estimation rule and its competitors: an empirical Bayes approach. J. Amer. Statist. Assoc. 68, 341 (1973), 117-130.
- Bradley Efron and Carl Morris. 1975. Data analysis using Stein's estimator and its generalizations. J. Amer. Statist. Assoc. 70, 350 (1975), 311-319.
- Bradley Efron and Carl Morris. 1976. Multivariate empirical Bayes and estimation of covariance matrices. The Annals of Statistics 4, 1 (1976), 22-32.
- Bradley Efron, Robert Tibshirani, John D Storey, and Virginia Tusher. 2001. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96, 456 (2001), 1151-1160.
- Marcel Fafchamps and Julien Labonne. 2017. Using split samples to improve inference on causal effects. Political Analysis 25, 4 (2017), 465-482.
- Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. Pathwise coordinate optimization. The Annals of Applied Statistics 1, 2 (2007), 302-332.
- Fumio Hayashi. 2011. Econometrics. Princeton University Press.
- William James and Charles Stein. 1961. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. 361-379.
- Ron Kohavi, Alex Deng, Brian Frasca, Toby Walker, Ya Xu, and Nils Pohlmann. 2013. Online controlled experiments at large scale. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1168-1176.
- Colin L Mallows. 1973. Some comments on Cp. Technometrics 15, 4 (1973), 661-675.
- Whitney K Newey and James R Robins. 2018. Cross-fitting and fast remainder rates for semiparametric estimation. arXiv preprint arXiv:1801.09138 (2018).
- Alexander Peysakhovich and Dean Eckles. 2018. Learning causal effects from many randomized experiments using regularized instrumental variables. In Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 699-707.
- Herbert Robbins. 1956. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. University of California Press.
- Charles M Stein. 1956. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. University of California Press, 197-206.
- Charles M Stein. 1962. Confidence sets for the mean of a multivariate normal distribution. Journal of the Royal Statistical Society, Series B (Methodological) (1962), 265-296.
- Charles M Stein. 1981. Estimation of the mean of a multivariate normal distribution. The Annals of Statistics (1981), 1135-1151.
- William E Strawderman. 1971. Proper Bayes minimax estimators of the multivariate normal mean. The Annals of Mathematical Statistics 42, 1 (1971), 385-388.
- Mark J van der Laan and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.
- Stefan Wager and Susan Athey. 2017. Estimation and inference of heterogeneous treatment effects using random forests. J. Amer. Statist. Assoc., just accepted (2017).
- Stefan Wager, Wenfei Du, Jonathan Taylor, and Robert J Tibshirani. 2016. High-dimensional regression adjustments in randomized experiments. Proceedings of the National Academy of Sciences 113, 45 (2016), 12673-12678.