Restoring coverage to the Bayesian false discovery rate control procedure

  • Regular Paper
Knowledge and Information Systems

Abstract

Principal among knowledge discovery tasks is the recognition of insightful patterns or features in data that can inform otherwise challenging decisions. For costly future decisions, there is little room for error. Features must provide substantial evidence to be robust for classification and dependable for important decisions. Here we seek statistical evidence for feature selection, namely that feature signals are of sufficient magnitude and frequency to generalize for classification. The Bayesian false discovery rate (bFDR) error control procedure is well suited to this task. In realistic situations often encountered in practice, however, the bFDR procedure is biased, yielding a greater than desired FDR. In other, less typical cases, the FDR is less than desired. We investigate the sources of bias in the bFDR procedure and predict the direction of the bias. A new algorithm is developed to correct the bias in the bFDR control procedure. In simulation and real data mining examples, the new bFDR control algorithm shows promise. The strengths and limitations of the new approach are presented with examples and discussed.


Abbreviations

y: Observed data continuous response for

θ: Mean of y

σ²: Variance of y

μ: Assumed prior mean of θ

ω, ν²: Additional parameters of the prior distributions of (θ, σ²)

H₀/H₁: Null/alternative hypothesis

pⱼ: p value for the jth test

t: p value threshold for rejecting or failing to reject H₀

U₀: Posterior probability of H₀ given data y

α: Desired rate at which error is controlled, i.e. the target FDR

f(): Probability density function

F(): Probability distribution function, i.e. cumulative distribution function

M: Number of attributes or features, i.e. tests

π: Probability among M tests that the alternative hypothesis is true
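
To show how the symbols above fit together, here is a minimal Python sketch of a standard plug-in Bayesian FDR estimate, (1 − π) · t / F(t), with F replaced by the empirical distribution of the M p values. It is an illustration under stated assumptions, not the corrected algorithm developed in the article: π is treated as known, the function names (`estimated_fdr`, `largest_threshold`, `simulate`) are hypothetical, and the simulated data (one-sided z tests with a fixed effect size) are invented for the example.

```python
import random
from statistics import NormalDist


def estimated_fdr(p_values, t, pi_alt):
    """Plug-in estimate (1 - pi) * t / F_hat(t).

    F_hat(t) is the empirical CDF of the p values at threshold t.
    pi_alt is the assumed probability that the alternative is true.
    """
    m = len(p_values)
    rejected = sum(1 for p in p_values if p <= t)
    if rejected == 0:
        return 0.0  # no discoveries, so no false discoveries
    f_hat = rejected / m
    return min(1.0, (1.0 - pi_alt) * t / f_hat)


def largest_threshold(p_values, alpha, pi_alt):
    """Largest observed p value whose estimated FDR is at most alpha (0.0 if none)."""
    best = 0.0
    for t in sorted(p_values):
        if estimated_fdr(p_values, t, pi_alt) <= alpha:
            best = t
    return best


def simulate(m=2000, pi_alt=0.2, effect=3.0, alpha=0.1, seed=0):
    """Simulate M one-sided z tests and report (t, discoveries, realized FDP).

    All settings here are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values, is_null = [], []
    for _ in range(m):
        null = rng.random() >= pi_alt
        z = rng.gauss(0.0 if null else effect, 1.0)
        p_values.append(1.0 - std_normal.cdf(z))
        is_null.append(null)
    t = largest_threshold(p_values, alpha, pi_alt)
    discoveries = [null for p, null in zip(p_values, is_null) if p <= t]
    fdp = sum(discoveries) / len(discoveries) if discoveries else 0.0
    return t, len(discoveries), fdp


if __name__ == "__main__":
    t, n_disc, fdp = simulate()
    print(f"threshold t = {t:.4f}, discoveries = {n_disc}, realized FDP = {fdp:.3f}")
```

Comparing the realized false discovery proportion from `simulate` with the target α over many seeds is one simple way to check whether a procedure of this kind attains the desired rate in a given setting.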

Author information

Corresponding author

Correspondence to David L. Gold.

Electronic Supplementary Material

Below is the Electronic Supplementary Material.

DOC 1 (DOC 46 kb)

About this article

Cite this article

Gold, D.L. Restoring coverage to the Bayesian false discovery rate control procedure. Knowl Inf Syst 33, 401–417 (2012). https://doi.org/10.1007/s10115-012-0503-z

  • DOI: https://doi.org/10.1007/s10115-012-0503-z
