Differential prioritization in feature selection and classifier aggregation for multiclass microarray datasets

Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

doi:10.1007/s10618-006-0055-5

Published: 03 February 2007

Data Mining and Knowledge Discovery, volume 14, pages 329–366 (2007)

Abstract

The high dimensionality of microarray datasets makes multiclass tissue classification difficult. The main challenge is selecting features that are both relevant and non-redundant to form the predictor set for classifier training. It is also important to vary the emphasis placed on relevance versus redundancy, through the degree of differential prioritization (DDP), during the search for the predictor set. Furthermore, the feature selection (FS) problem can be decomposed in several ways: all-classes-at-once, one-vs.-all (OVA), or pairwise (PW). In multiclass problems, the type of classifier aggregation must also be considered: non-aggregated (a single machine) or aggregated (OVA or PW). We first propose a systematic approach to combining the distinct problems of FS and classification. Then, using eight well-known multiclass microarray datasets, we empirically demonstrate the effectiveness of the DDP in various combinations of FS decomposition types and classifier aggregation methods. Aided by the variable DDP, feature selection leads to classification performance that is better than that of rank-based or equal-priorities scoring methods, and to accuracies higher than previously reported for benchmark datasets with a large number of classes. Finally, based on several criteria, we make general recommendations on the optimal choice of the combination of FS decomposition type and classifier aggregation method for multiclass microarray datasets.
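To make the terminology in the abstract concrete, the following Python sketch shows how a multiclass label vector might be decomposed into OVA and PW binary sub-problems, and one possible way a single parameter could shift priority between relevance and redundancy. The abstract does not give the DDP formula, so `prioritized_score` (a weighted geometric mean with exponent `alpha`) is an illustrative assumption, not the authors' definition; all function names and the toy labels are likewise hypothetical.

```python
from itertools import combinations


def ova_problems(labels):
    """One binary problem per class: that class (1) versus all others (0)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def pw_problems(labels):
    """One binary problem per unordered class pair, using only the samples
    that belong to either class of the pair."""
    classes = sorted(set(labels))
    problems = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        problems[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return problems


def prioritized_score(relevance, antiredundancy, alpha):
    """ASSUMPTION (not taken from the abstract): combine the two criteria as a
    weighted geometric mean. alpha = 1 uses relevance only, alpha = 0 uses
    antiredundancy only, and alpha = 0.5 gives both equal priority."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (relevance ** alpha) * (antiredundancy ** (1.0 - alpha))


# Toy label vector with K = 4 classes (made-up data for illustration).
labels = ["A", "B", "C", "A", "C", "B", "D"]
ova = ova_problems(labels)
pw = pw_problems(labels)
print(len(ova), "OVA sub-problems;", len(pw), "PW sub-problems")
```

For K classes, the OVA decomposition yields K binary sub-problems and the PW decomposition yields K(K-1)/2, which is one reason the choice of decomposition affects computational cost as the number of classes grows.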

Author information

Authors and Affiliations

Gippsland School of Information Technology, Monash University, Churchill, VIC, 3842, Australia
Chia Huey Ooi, Madhu Chetty & Shyh Wei Teng

Corresponding author

Correspondence to Chia Huey Ooi.

Additional information

Responsible editor: Pierre Baldi.

About this article

Cite this article

Ooi, C.H., Chetty, M. & Teng, S.W. Differential prioritization in feature selection and classifier aggregation for multiclass microarray datasets. Data Min Knowl Disc 14, 329–366 (2007). https://doi.org/10.1007/s10618-006-0055-5

Received: 16 February 2006
Accepted: 10 July 2006
Published: 03 February 2007
Issue Date: June 2007
DOI: https://doi.org/10.1007/s10618-006-0055-5
