Abstract
Principal among knowledge discovery tasks is the recognition of insightful patterns or features in data that can inform otherwise challenging decisions. For costly decisions, there is little room for error: features must provide substantial evidence to be robust for classification and dependable for important decisions. Here we seek statistical evidence for feature selection, namely that feature signals are of sufficient magnitude and frequency to generalize for classification. The Bayesian false discovery rate (bFDR) error control procedure is well suited to this task. In realistic situations often encountered in practice, however, the bFDR procedure is biased, yielding a greater-than-desired FDR; in other, less typical cases, the FDR is less than desired. We investigate the sources of bias in the bFDR procedure and predict the direction of bias. A new algorithm is developed to correct the bias in the bFDR control procedure. In simulation and real data mining examples, the new bFDR control algorithm shows promise. The strengths and limitations of the new approach are presented with examples and discussed.
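The bFDR procedure rests on the posterior probability that the null hypothesis holds for each test. As background, in the standard two-groups formulation (a common sketch of the setup, not necessarily the paper's exact prior specification, which involves μ, ω, and ν²), this posterior follows from Bayes' rule:

$$
U_0 = \Pr(H_0 \mid y) \;=\; \frac{(1-\pi)\, f_0(y)}{(1-\pi)\, f_0(y) + \pi\, f_1(y)},
$$

where $f_0$ and $f_1$ denote the densities of $y$ under the null and alternative hypotheses, and $\pi$ is the probability among the $M$ tests that the alternative is true. Bias in bFDR control arises when the plug-in estimates of these quantities are misspecified.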
Abbreviations
- *y*: Observed data, continuous response
- *θ*: Mean of *y*
- *σ*²: Variance of *y*
- *μ*: Assumed prior mean of *θ*
- *ω*, *ν*²: Additional parameters of the prior distributions of (*θ*, *σ*²)
- *H*₀/*H*₁: Null/alternative hypothesis
- *p*ⱼ: *p* value for the *j*th test
- *t*: *p* value threshold for rejecting or failing to reject *H*₀
- *U*₀: Posterior probability of *H*₀ given data *y*
- *α*: Rate at which error is controlled, or desired, i.e., FDR
- *f*(): Probability density function
- *F*(): Cumulative distribution function
- *M*: Number of attributes (features), i.e., tests
- *π*: Probability among *M* tests that the alternative hypothesis is true
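In terms of the quantities above, one common form of Bayesian FDR control rejects the hypotheses with the smallest posterior null probabilities *U*₀ such that their average (the estimated Bayesian FDR of the rejected set) stays at or below *α*. The sketch below illustrates that rule; the function name and array interface are illustrative, not taken from the paper:

```python
import numpy as np

def bayesian_fdr_select(u0, alpha):
    """Select tests by Bayesian FDR control.

    u0    : posterior null probabilities U0, one per test
    alpha : desired FDR level

    Rejects the hypotheses with the smallest U0 such that the average
    posterior null probability among the rejected set stays <= alpha.
    Returns a boolean mask over the original test order.
    """
    u0 = np.asarray(u0, dtype=float)
    order = np.argsort(u0)  # most significant tests first
    # Running mean of U0 over the k most significant tests, k = 1..M
    running_mean = np.cumsum(u0[order]) / np.arange(1, len(u0) + 1)
    # Largest rejection set whose estimated Bayesian FDR meets alpha
    k = int(np.sum(running_mean <= alpha))
    mask = np.zeros(len(u0), dtype=bool)
    mask[order[:k]] = True
    return mask
```

For example, with `u0 = [0.01, 0.02, 0.9, 0.05, 0.8]` and `alpha = 0.05`, the three tests with the smallest posterior null probabilities are selected. As the paper argues, the actual FDR achieved by this rule depends on how well the posterior probabilities are estimated, which is the source of the bias studied here.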
Electronic Supplementary Material
Below is the Electronic Supplementary Material.
About this article
Cite this article
Gold, D.L. Restoring coverage to the Bayesian false discovery rate control procedure. Knowl Inf Syst 33, 401–417 (2012). https://doi.org/10.1007/s10115-012-0503-z