Abstract
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes but only a few hundred samples. This asymmetry between the number of genes and the number of samples presents several challenges to conventional clustering and classification methods. In this paper, a novel ensemble method based on correlation analysis is proposed. First, to extract useful features and reduce dimensionality, different correlation-based feature selection methods are used to form different feature subsets. Then a pool of candidate base classifiers is generated, each trained on a subset re-sampled from these feature subsets. Finally, appropriate classifiers are selected from the pool to construct the classification committee using an Estimation of Distribution Algorithm (EDA). Experiments show that the proposed method produces the best recognition rates on two benchmark databases.
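The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: Pearson correlation with the class label stands in for the unspecified correlation-based selection methods, a nearest-centroid rule stands in for the unspecified base classifiers, the EDA is the simple univariate variant (UMDA), and the data are synthetic. All function names and parameter values are hypothetical.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def top_features(X, y, k):
    """Rank genes by |correlation with the class label| and keep the top k."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]

def fit_centroids(X, y, feats):
    """Nearest-centroid base classifier restricted to the given features."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [statistics.fmean(r[j] for r in rows) for j in feats]
    return feats, cents

def predict(model, x):
    feats, cents = model
    return min(cents, key=lambda c: sum((x[j] - m) ** 2
                                        for j, m in zip(feats, cents[c])))

def vote(models, mask, x):
    """Majority vote of the classifiers switched on in mask."""
    votes = [predict(m, x) for m, on in zip(models, mask) if on]
    return max(set(votes), key=votes.count) if votes else None

def accuracy(models, mask, X, y):
    return sum(vote(models, mask, x) == t for x, t in zip(X, y)) / len(y)

def umda_select(models, X, y, rng, pop=20, gens=15):
    """UMDA: sample bit masks from per-bit probabilities, then re-estimate
    the probabilities from the better half of the sampled masks."""
    n = len(models)
    p = [0.5] * n
    best_mask, best_fit = [1] * n, accuracy(models, [1] * n, X, y)
    for _ in range(gens):
        masks = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        masks.sort(key=lambda m: accuracy(models, m, X, y), reverse=True)
        elite = masks[: pop // 2]
        # Clamp probabilities so no classifier is fixed on or off too early.
        p = [min(0.95, max(0.05, sum(m[i] for m in elite) / len(elite)))
             for i in range(n)]
        fit = accuracy(models, masks[0], X, y)
        if fit > best_fit:
            best_mask, best_fit = masks[0], fit
    return best_mask, best_fit

# Synthetic stand-in for microarray data: 40 samples, 30 "genes",
# of which only the first 3 carry class signal (an assumption).
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(2.0 * t if j < 3 else 0.0, 1.0) for j in range(30)] for t in y]

# Pool of base classifiers, each trained on a bootstrap re-sample
# with its own correlation-selected feature subset.
models = []
for _ in range(8):
    idx = [rng.randrange(len(y)) for _ in y]
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    models.append(fit_centroids(Xb, yb, top_features(Xb, yb, k=5)))

# Fitness is measured on the training data for brevity only.
mask, fit = umda_select(models, X, y, rng)
print("selected classifiers:", mask, "committee accuracy:", fit)
```

In this sketch the committee is encoded as a bit mask over the classifier pool, which is what lets a distribution over bit strings drive the selection. A realistic experiment would score candidate committees on held-out or cross-validated data rather than on the training set.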
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Zhao, Y., Chen, Y., Zhang, X. (2007). A Novel Ensemble Approach for Cancer Data Classification. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds) Advances in Neural Networks – ISNN 2007. ISNN 2007. Lecture Notes in Computer Science, vol 4492. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72393-6_143
DOI: https://doi.org/10.1007/978-3-540-72393-6_143
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72392-9
Online ISBN: 978-3-540-72393-6
eBook Packages: Computer Science, Computer Science (R0)