Abstract
Gene expression data gathered from tissue samples are expected to significantly advance the development of efficient tumor diagnosis. For accurate tumor classification, extracting discriminant components from thousands of genes is an important problem, and it is challenging because the data combine a large number of genes with a small sample size. We propose a novel approach that combines gene ranking with independent component analysis, a recently developed technique, to further improve the classification performance of support vector machines on gene expression data. Two gene expression datasets (colon and leukemia) are examined to confirm that the proposed approach can extract a small number of independent components, drastically reducing the dimensionality of the original gene expression data while retaining a high recognition rate. A cross-validation accuracy of 100% was achieved by extracting only 3 independent components from the leukemia dataset, and 93.55% was achieved for the colon dataset.
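The abstract names three stages (gene ranking, independent component analysis, and a support vector machine evaluated by cross-validation) without giving implementation details. The following is a minimal sketch of how such a pipeline could be assembled with scikit-learn; it is not the authors' implementation. The ANOVA F-score ranking, the FastICA algorithm, the linear kernel, the leave-one-out scheme, the number of ranked genes, and the synthetic data are all assumptions made for illustration, and the helper names `make_synthetic_expression` and `build_pipeline` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def make_synthetic_expression(n_samples=40, n_genes=500, n_informative=20, seed=0):
    """Hypothetical stand-in for a microarray matrix (rows: samples, columns: genes)."""
    rng = np.random.default_rng(seed)
    y = np.array([0, 1] * (n_samples // 2))
    X = rng.normal(size=(n_samples, n_genes))
    # Shift the first n_informative genes in class 1 so they carry class signal.
    X[y == 1, :n_informative] += 2.0
    return X, y


def build_pipeline(n_ranked_genes=50, n_components=3):
    """Rank genes, project onto a few independent components, then classify."""
    return make_pipeline(
        # Assumed ranking criterion: keep the genes with the highest ANOVA F-score.
        SelectKBest(f_classif, k=n_ranked_genes),
        # Assumed ICA variant: FastICA, reducing the ranked genes to n_components.
        FastICA(n_components=n_components, max_iter=1000, random_state=0),
        # Assumed classifier settings: linear-kernel SVM with default C.
        SVC(kernel="linear", C=1.0),
    )


X, y = make_synthetic_expression()
# Leave-one-out cross-validation: one score (0 or 1) per held-out sample.
scores = cross_val_score(build_pipeline(), X, y, cv=LeaveOneOut())
accuracy = scores.mean()
print(f"leave-one-out accuracy: {accuracy:.3f}")
```

Because the ranking and ICA steps sit inside the pipeline, they are refitted on the training portion of every fold, so the held-out sample never influences which genes are selected. With so few samples, ranking genes on the full dataset before cross-validation would tend to inflate the reported accuracy.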
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, H., Wang, J., Zhang, D., Li, S. (2007). Molecular Diagnosis of Tumor Based on Independent Component Analysis and Support Vector Machines. In: Wang, Y., Cheung, Ym., Liu, H. (eds) Computational Intelligence and Security. CIS 2006. Lecture Notes in Computer Science, vol 4456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74377-4_6
DOI: https://doi.org/10.1007/978-3-540-74377-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74376-7
Online ISBN: 978-3-540-74377-4
eBook Packages: Computer Science, Computer Science (R0)