Skip to main content
Log in

On the fusion of threshold classifiers for categorization and dimensionality reduction

  • Original Paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

We study ensembles of simple threshold classifiers for the categorization of high-dimensional data of low cardinality and give a compression bound on their prediction risk. Two approaches are utilized to produce such classifiers. One is based on univariate feature selection employing the area under the ROC curve as ranking criterion. The other approach uses a greedy selection strategy. The methods are applied to artificial data, published microarray expression profiles, and highly imbalanced data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12): 6745–6750

    Article  Google Scholar 

  • Anthony M, Biggs N (1992) Computational learning theory. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  • Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406(6795): 536–540

    Article  Google Scholar 

  • Blum A, Langford J (2003) PAC-MDL bounds. In: Schölkopf Bernhard, Warmuth Manfred K (eds) COLT. Springer, Berlin, pp 344–357

    Google Scholar 

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2): 123–140

    MathSciNet  MATH  Google Scholar 

  • Breiman L (1998) Arcing classifiers. Annal Stat 26(3): 801–824

    Article  MathSciNet  MATH  Google Scholar 

  • Breiman L (2001) Random forests. Mach Learn 45(1): 5–32

    Article  MATH  Google Scholar 

  • Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Publishing Company, Belmont

    MATH  Google Scholar 

  • Buchholz M, Kestler HA, Bauer A, Böck W, Rau B, Leder G, Kratzer W, Bommer M, Scarpa A, Schilling MK, Adler G, Hoheisel JD, Gress TM (2005) Specialized DNA arrays for the differentiation of pancreatic tumors. Clin Cancer Res 11(22): 8048–8054

    Article  Google Scholar 

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3): 273–297

    MATH  Google Scholar 

  • Cover TM (1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans Electron Comput 14(3): 326–334

    Article  MATH  Google Scholar 

  • Duch W (2004) Filter methods. In: Guyon I, Gunn S, Nikravesh M, Zadeh LA (eds) Feature extraction, foundations and applications. Springer, Berlin, pp 89–118

    Google Scholar 

  • Floyd S, Warmuth MK (1995) Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Mach Learn 21(3): 269–304

    Google Scholar 

  • Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121(2): 256–285

    Article  MathSciNet  MATH  Google Scholar 

  • Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10): 906–914

    Article  Google Scholar 

  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439): 531–537

    Article  Google Scholar 

  • Graepel T, Herbrich R, Shawe-Taylor J (2005) PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification. Mach Learn 59(1–2): 55–76

    Article  MATH  Google Scholar 

  • Hanley JA, Mcneil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1): 29–36

    Google Scholar 

  • Haussler D (1988) Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artif Intell 36(2): 177–221

    Article  MathSciNet  MATH  Google Scholar 

  • Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT, Cambridge

    Google Scholar 

  • Klivans AR, Servedio RA (2006) Toward attribute efficient learning of decision lists and parities. J Mach Learn Res 7: 587–602

    MathSciNet  Google Scholar 

  • Kuncheva LI, Whitaker CJ, Shipp CA (2003) Limits on the majority vote accuracy in classifier fusion. Pattern Anal Appl 6(1): 22–31

    Article  MathSciNet  MATH  Google Scholar 

  • Lai C, Reinders MJT, van’t Veer LJ, Wessels LFA (2006) A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets. BMC Bioinf 7: 235

    Article  Google Scholar 

  • Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6: 273–306

    MathSciNet  Google Scholar 

  • Laviolette F, Marchand M, Shah M (2005) Margin-sparsity trade-off for the set covering machine. In: Proceedings of the 16th European conference on machine learning. Springer, pp 206–217

  • Laviolette F, Marchand M, Shah M, Shanian S (2010) Learning the set covering machine by bound minimization and margin-sparsity trade-off. Mach Learn 78(1–2): 175–201

    Article  Google Scholar 

  • Littlestone N, Warmuth M (1986) Relating data compression and learnability. Unpublished manuscript

  • Marchand M, Shah M (2004) PAC-Bayes learning of conjunctions and classification of gene-expression data. In: Proceedings of the 18th annual conference on neural information processing systems. MIT, Cambridge, pp 881–888

  • Marchand M, Shawe-Taylor J (2002) The set covering machine. J Mach Learn Res 3(4–5): 723–746

    MathSciNet  Google Scholar 

  • McAllester DA (1999) PAC-Bayesian model averaging. In: COLT ’99: Proceedings of the twelfth annual conference on computational learning theory. ACM, pp 164–170

  • Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JYH, Goumnerova LC, Black PM, Lau C, Allen JC, Zagzag D, Olson JM, Curran T, Wetmore C, Biegel JA, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis DN, Mesirov JP, Lander ES, Golub TR (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870): 436–442

    Article  Google Scholar 

  • Ruschhaupt M, Huber W, Poustka A, Mansmann U (2004) A compendium to ensure computational reproducibility in high-dimensional classification tasks. Stat Appl Genet Mol Biol 3(1): 37

    MathSciNet  Google Scholar 

  • Vapnik V (1998) Statistical learning theory. Wiley, New York

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hans A. Kestler.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kestler, H.A., Lausser, L., Lindner, W. et al. On the fusion of threshold classifiers for categorization and dimensionality reduction. Comput Stat 26, 321–340 (2011). https://doi.org/10.1007/s00180-011-0243-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-011-0243-7

Keywords

Navigation