Abstract
In the era of big data, high-dimensional data often arrive in streams, making timely and accurate decisions necessary. It has become particularly important to rapidly and sequentially identify individuals whose behavior deviates from the norm. Aiming to identify as many irregular behavioral patterns as possible, the authors develop a large-scale dynamic testing system in the framework of false discovery rate (FDR) control. By fully exploiting the sequential nature of data streams, the authors propose a screening-assisted procedure that first filters the streams and then, at each time point, tests only those streams that pass the filter. A data-driven optimal screening threshold is derived, giving the new method an edge over existing methods. Under some mild conditions on the dependence structure of the data streams, the FDR is shown to be strongly controlled, and the suggested approach for determining screening thresholds is shown to be asymptotically optimal. Simulation studies show that the proposed method is both accurate and powerful, and a real-data example is used for illustrative purposes.
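The filter-then-test idea at a single time point can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify the screening statistic, the test statistic, or the data-driven threshold, so the sketch assumes a fixed, user-supplied threshold and applies the standard Benjamini-Hochberg step-up rule to the streams that survive screening. The function names `benjamini_hochberg` and `screen_then_test` and all input values are hypothetical. Filtering before testing preserves FDR control only under additional conditions (for example, a screening statistic that is independent of the p-values under the null), which this sketch does not check.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule; returns a boolean rejection mask."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(pvals)
    # Compare the k-th smallest p-value with k * alpha / m.
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting the bound
        reject[order[: k + 1]] = True
    return reject


def screen_then_test(screen_stats, pvals, threshold, alpha=0.05):
    """Keep streams whose screening statistic reaches `threshold`,
    then run BH only on the kept streams (illustrative assumption)."""
    screen_stats = np.asarray(screen_stats, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    keep = screen_stats >= threshold
    reject = np.zeros(pvals.size, dtype=bool)
    reject[keep] = benjamini_hochberg(pvals[keep], alpha)
    return reject


# Hypothetical values for five streams at one time point.
pvals = [0.001, 0.008, 0.039, 0.041, 0.9]
screen_stats = [3.0, 0.1, 2.5, 2.0, 0.2]

no_screen = benjamini_hochberg(pvals, alpha=0.05)
with_screen = screen_then_test(screen_stats, pvals, threshold=1.0, alpha=0.05)
print(no_screen.tolist())
print(with_screen.tolist())
```

In this toy input, screening removes two streams, so the remaining three face a smaller multiplicity correction. A stream that is screened out can never be rejected, which is why the choice of screening threshold matters.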
Additional information
This research was supported by the National Natural Science Foundation of China under Grant Nos. 11771332, 11771220, 11671178, 11925106, and 11971247, the National Science Foundation of Tianjin under Grant Nos. 18JCJQJC46000 and 18ZXZNGX00140, and the 111 Project B20016. Mushtaq was also supported by the Fundamental Research Funds for the Central Universities.
About this article
Cite this article
Mushtaq, I., Zhou, Q. & Zi, X. Screening-Assisted Dynamic Multiple Testing with False Discovery Rate Control. J Syst Sci Complex 36, 716–754 (2023). https://doi.org/10.1007/s11424-023-1143-y
DOI: https://doi.org/10.1007/s11424-023-1143-y