Abstract
The beta-binomial sequential goodness-of-fit (BB-SGoF) method for multiple testing was recently proposed as a modification of the sequential goodness-of-fit (SGoF) multiple testing method for tests that are correlated in blocks. In this paper we investigate, for the first time, the power, the FDR and the conservativeness of BB-SGoF in an intensive Monte Carlo simulation study. Important features such as automatic selection of the number of blocks and preliminary testing for independence are explored. Our study reveals that (a) BB-SGoF roughly maintains the properties of the original SGoF in the dependent case, reporting a small value for the probability that the number of false positives exceeds the number of false negatives with p value below \(\gamma\); (b) BB-SGoF weakly controls the FDR even when the beta-binomial model is violated and the number of blocks \(k\) is unknown; and (c) the loss of power of the automatic selector for the number of blocks, relative to the benchmark method that uses the true \(k\), varies with the proportion and the type (strong, intermediate or weak) of the effects, and is also strongly influenced by the within-block correlation.
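To illustrate why blockwise correlation matters for a method built on the number of p values below \(\gamma\), the following minimal sketch simulates that count under the complete null. It is not the simulation design of the paper: the equicorrelated Gaussian blocks, the one-sided tests, the function name `count_small_pvalues` and all parameter values are assumptions chosen for illustration. It shows only that, with positive within-block correlation, the count keeps the binomial mean but is overdispersed relative to the binomial variance, which is the kind of departure a beta-binomial model is meant to absorb.

```python
from statistics import NormalDist

import numpy as np


def count_small_pvalues(n_tests=500, k_blocks=10, rho=0.5, gamma=0.05,
                        n_rep=2000, seed=0):
    """Simulate the number of one-sided p values below gamma under the null.

    Hypothetical setup (an assumption, not taken from the paper): the test
    statistics are standard normal, independent across blocks and
    equicorrelated (correlation rho) within each block.
    n_tests is assumed to be divisible by k_blocks.
    """
    rng = np.random.default_rng(seed)
    block_size = n_tests // k_blocks
    # z = sqrt(rho) * shared + sqrt(1 - rho) * own gives unit variance
    # and pairwise correlation rho inside a block.
    shared = rng.standard_normal((n_rep, k_blocks, 1))
    own = rng.standard_normal((n_rep, k_blocks, block_size))
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    # For a one-sided test, p < gamma is equivalent to z > z_{1 - gamma}.
    threshold = NormalDist().inv_cdf(1.0 - gamma)
    return (z > threshold).sum(axis=(1, 2))


if __name__ == "__main__":
    n, gamma = 500, 0.05
    binomial_var = n * gamma * (1.0 - gamma)
    for rho in (0.0, 0.5):
        counts = count_small_pvalues(n_tests=n, rho=rho, gamma=gamma)
        print(f"rho={rho}: mean={counts.mean():.2f}, "
              f"variance={counts.var():.2f}, binomial variance={binomial_var:.2f}")
```

With `rho=0.0` the simulated variance should sit close to the binomial value, while with `rho=0.5` it should be several times larger; a binomial reference distribution would then understate the variability of the count.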


Acknowledgments
Work was supported by the Grant MTM2011-23204 (FEDER support included) of the Spanish Ministry of Science and Innovation.
Cite this article
Castro-Conde, I., de Uña-Álvarez, J. Power, FDR and conservativeness of BB-SGoF method. Comput Stat 30, 1143–1161 (2015). https://doi.org/10.1007/s00180-015-0553-2
DOI: https://doi.org/10.1007/s00180-015-0553-2