Screening-Assisted Dynamic Multiple Testing with False Discovery Rate Control

Mushtaq, Iram; Zhou, Qin; Zi, Xuemin

doi:10.1007/s11424-023-1143-y

Published: 18 February 2023

Journal of Systems Science and Complexity, Volume 36, pages 716–754 (2023)

Iram Mushtaq¹,
Qin Zhou² &
Xuemin Zi³

Abstract

In the era of big data, high-dimensional data often arrive in streams, making timely and accurate decisions necessary. It has become particularly important to rapidly and sequentially identify individuals whose behavior deviates from the norm. To identify as many irregular behavioral patterns as possible, the authors develop a large-scale dynamic testing system in the framework of false discovery rate (FDR) control. By fully exploiting the sequential nature of datastreams, the authors propose a screening-assisted procedure that first filters the streams and then, at each time point, tests only the streams that pass the filter. A data-driven optimal screening threshold is derived, giving the new method an edge over existing methods. Under some mild conditions on the dependence structure of the datastreams, the FDR is shown to be strongly controlled, and the suggested approach for determining screening thresholds is shown to be asymptotically optimal. Simulation studies show that the proposed method is both accurate and powerful, and a real-data example is used for illustration.
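The screen-then-test idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure and does not reproduce their data-driven threshold. It assumes, purely for illustration, unit-variance Gaussian streams, a screening statistic built from past observations only, a one-sided z-test on the current observation, and the standard Benjamini-Hochberg step-up rule applied to the streams that survive screening. The function names, the fixed `screen_threshold`, and the toy data are all hypothetical.

```python
import math


def normal_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value is below its BH cutoff
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def screen_then_test(history, current, screen_threshold, alpha):
    """Filter streams on past data, then BH-test only the survivors.

    history: one list of past observations per stream (assumed N(mu, 1)).
    current: the newest observation of each stream.
    screen_threshold: hypothetical fixed cutoff (the paper derives a
        data-driven one, which is not reproduced here).
    """
    survivors = []
    for i, past in enumerate(history):
        # Screening statistic: standardized sum of past observations only,
        # so it does not reuse the observation that is tested below.
        z_past = sum(past) / math.sqrt(len(past))
        if z_past > screen_threshold:
            survivors.append(i)

    # One-sided p-values from the current observation of surviving streams.
    pvals = [normal_sf(current[i]) for i in survivors]
    rejected = [survivors[j] for j in benjamini_hochberg(pvals, alpha)]
    return survivors, rejected


if __name__ == "__main__":
    history = [[0.0, 0.0], [3.0, 3.0], [3.0, 3.0]]
    current = [5.0, 5.0, 0.0]
    print(screen_then_test(history, current, screen_threshold=1.0, alpha=0.05))
```

In this toy call, stream 0 is never tested because its past data fail the screen, even though its current observation is large. That is the trade-off a screening threshold has to balance: fewer hypotheses to correct for, against the risk of discarding a stream that has only just started to deviate.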

Author information

Authors and Affiliations

School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, 300071, China
Iram Mushtaq
School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, 221116, China
Qin Zhou
School of Science, Tianjin University of Technology and Education, Tianjin, 300350, China
Xuemin Zi

Corresponding authors

Correspondence to Iram Mushtaq, Qin Zhou or Xuemin Zi.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 11771332, 11771220, 11671178, 11925106, and 11971247; the National Science Foundation of Tianjin under Grant Nos. 18JCJQJC46000 and 18ZXZNGX00140; and the 111 Project B20016. Mushtaq was also supported by the Fundamental Research Funds for the Central Universities.

About this article

Cite this article

Mushtaq, I., Zhou, Q. & Zi, X. Screening-Assisted Dynamic Multiple Testing with False Discovery Rate Control. J Syst Sci Complex 36, 716–754 (2023). https://doi.org/10.1007/s11424-023-1143-y

Received: 06 May 2021
Revised: 23 August 2021
Published: 18 February 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11424-023-1143-y
