Meta-classifiers for high-dimensional, small sample classification for gene expression analysis

Kim, Kyung-Joong; Cho, Sung-Bae

doi:10.1007/s10044-014-0369-7

Theoretical Advances
Published: 06 May 2014

Volume 18, pages 553–569, (2015)
Pattern Analysis and Applications

Kyung-Joong Kim¹ &
Sung-Bae Cho²

Abstract

Classifying high-dimensional data from a small number of samples is a challenging problem in both machine learning and medicine, because a wide variety of modeling approaches are possible and it is not always clear which is optimal for a given prediction task. The modeling choices include feature selection (dimensionality reduction), the classification algorithm, and ensemble selection, and these can be combined in many ways. Previous work has shown that evolutionary computation is useful for building an ensemble from pairs of feature selection and classification algorithms. However, evolutionary computation has several parameters that must be determined, and the optimization takes additional computation time. In this paper, we attempt to improve on that approach by adopting meta-classification with the farthest-first clustering algorithm. The effectiveness and accuracy of our method are validated by experiments on four publicly available real microarray datasets (colon, breast, prostate, and lymphoma cancers). The results confirm that the proposed method outperforms individual classifiers and other alternatives (a standard genetic algorithm and methods from the literature).
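
The abstract names farthest-first clustering without describing it. As a rough illustration only, the general farthest-first traversal heuristic for the k-center problem can be sketched as follows. This is not the authors' implementation or pipeline: the function names, the use of Euclidean distance, the choice of starting point, and the toy points are all assumptions made for the example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def farthest_first(points: List[Point], k: int, start: int = 0) -> List[int]:
    """Pick k center indices by farthest-first traversal.

    Each new center is the point farthest from all centers chosen so far.
    The starting index is an assumption; implementations often pick it at random.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [start]
    # nearest[i] holds the distance from point i to its closest chosen center.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return centers


def assign(points: List[Point], centers: List[int]) -> List[int]:
    """Label each point with the position of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, points[centers[c]]))
        for p in points
    ]


# Hypothetical toy data: three well-separated groups in 2-D.
toy = [(0, 0), (1, 0), (10, 0), (10, 1), (0, 10)]
toy_centers = farthest_first(toy, k=3)
toy_labels = assign(toy, toy_centers)
```

Because each step only needs the distance to the nearest existing center, the traversal has no tunable parameters beyond k and the starting point, which is consistent with the abstract's motivation of avoiding the parameter tuning that evolutionary search requires.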

Abbreviations

AVG: Average
CC: Cosine coefficient
CF: Classification
DCGA: Deterministic crowding genetic algorithm
DLDA: Diagonal linear discriminant analysis
ED: Euclidean distance
F1–F4: Fitness functions
FS: Feature selection
G: The number of genes
G1–G2: Global ranking feature selection methods
GA: Genetic algorithm
IG: Information gain
IV: Ideal vector
KNN: K-nearest neighbor
KNNC: KNN with cosine coefficient
KNNE: KNN with Euclidean distance
KNNP: KNN with Pearson correlation
KNNS: KNN with Spearman correlation
LOOCV: Leave-one-out cross-validation
M: The number of classification algorithms
MDL: Minimum description length
MI: Mutual information
MLP: Multi-layer perceptron
N: The number of feature selection methods
NNGE: Non-nested generalized exemplars
P: The number of training samples
PAM: Prediction analysis with microarray
PC: Pearson correlation
PCP: Pattern classification program
SNR: Signal-to-noise ratio
SP: Spearman correlation
SPEGASOS: Stochastic variant of primal estimated sub-gradient solver for SVM
SVM: Support vector machine
SVML: Linear SVM
TS: Training sample

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (2013 R1A2A2A01016589, 2010-0018950, 2010-0018948).

Author information

Authors and Affiliations

Department of Computer Engineering, Sejong University, Seoul, 143-747, South Korea
Kyung-Joong Kim
Department of Computer Science, Yonsei University, Seoul, South Korea
Sung-Bae Cho

Corresponding author

Correspondence to Kyung-Joong Kim.

About this article

Cite this article

Kim, KJ., Cho, SB. Meta-classifiers for high-dimensional, small sample classification for gene expression analysis. Pattern Anal Applic 18, 553–569 (2015). https://doi.org/10.1007/s10044-014-0369-7

Received: 12 August 2012
Accepted: 15 April 2014
Published: 06 May 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10044-014-0369-7
