Screening-Assisted Dynamic Multiple Testing with False Discovery Rate Control

Journal of Systems Science and Complexity

Abstract

In the era of big data, high-dimensional data typically arrive in streams, making timely and accurate decisions necessary. It has become particularly important to rapidly and sequentially identify individuals whose behavior deviates from the norm. Aiming to identify as many irregular behavioral patterns as possible, the authors develop a large-scale dynamic testing system within the framework of false discovery rate (FDR) control. By fully exploiting the sequential nature of datastreams, the authors propose a screening-assisted procedure that, at each time point, filters the streams and then tests only those that pass the filter. A data-driven optimal screening threshold is derived, giving the new method an edge over existing methods. Under mild conditions on the dependence structure of the datastreams, the FDR is shown to be strongly controlled and the suggested approach for determining screening thresholds is shown to be asymptotically optimal. Simulation studies show that the proposed method is both accurate and powerful, and a real-data example is provided for illustration.
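The filter-then-test idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which is not reproduced on this page. The sketch assumes that each stream is standard normal under the null, that the screening statistic is a scaled mean of past observations, that the test uses only the current observation, and that the survivors are tested with the standard Benjamini-Hochberg step-up rule. The function names, the fixed `screen_threshold`, and the one-sided alternative are all hypothetical choices made for illustration; the paper derives a data-driven threshold instead.

```python
import math


def benjamini_hochberg(pvals, alpha):
    """Benjamini-Hochberg step-up rule; returns one rejection flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose sorted p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def upper_tail_pvalue(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def screen_then_test(history, current, screen_threshold, alpha=0.1):
    """Screen streams on past data, then test only the survivors.

    history: one list of past observations per stream (assumed N(0, 1) under the null)
    current: the newest observation of each stream
    screen_threshold: hypothetical fixed cut-off for the screening statistic
    """
    # Screening statistic: sqrt(n) * mean of the past observations of a stream.
    passed = [sum(past) / math.sqrt(len(past)) > screen_threshold
              for past in history]
    survivors = [i for i, ok in enumerate(passed) if ok]

    # Only streams that passed the filter enter the multiple-testing step.
    pvals = [upper_tail_pvalue(current[i]) for i in survivors]
    decisions = benjamini_hochberg(pvals, alpha)

    flagged = [False] * len(current)
    for i, rejected in zip(survivors, decisions):
        flagged[i] = rejected
    return passed, flagged
```

In this sketch the screening step uses only past data and the test uses only the current observation, so the two statistics are independent under the null when observations are independent over time. Screening reduces the number of hypotheses the step-up rule must correct for, which is where a power gain could come from.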

Author information

Corresponding authors

Correspondence to Iram Mushtaq, Qin Zhou or Xuemin Zi.

Additional information

This research was supported by the National Natural Science Foundation of China under Grant Nos. 11771332, 11771220, 11671178, 11925106, and 11971247; the National Science Foundation of Tianjin under Grant Nos. 18JCJQJC46000 and 18ZXZNGX00140; and the 111 Project B20016. Mushtaq was also supported by the Fundamental Research Funds for the Central Universities.

Cite this article

Mushtaq, I., Zhou, Q. & Zi, X. Screening-Assisted Dynamic Multiple Testing with False Discovery Rate Control. J Syst Sci Complex 36, 716–754 (2023). https://doi.org/10.1007/s11424-023-1143-y

  • DOI: https://doi.org/10.1007/s11424-023-1143-y
