Abstract
This paper describes a novel method for improving the classification performance of support vector machines (SVM) with recursive feature selection (SVM-RFE) when applied to cancer classification with gene expression data. The method uses pairs of support vectors of a linear SVM-RFE classifier to generate a sequence of new SVM classifiers, called local support classifiers. This sequence is used in two Bayesian learning techniques: as an ensemble of classifiers in Optimal Bayes, and as attributes in Naive Bayes. The resulting classifiers are applied to four publicly available gene expression datasets from leukemia, ovarian, lymphoma, and colon cancer, respectively. The results indicate that the proposed approach significantly improves the predictive performance, stability, and robustness of the baseline SVM classifier, with satisfactory results on all datasets. In particular, perfect classification is achieved on the leukemia and ovarian cancer datasets.
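The two ways of using a sequence of classifiers named in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the local support classifiers are built from support-vector pairs or how the Optimal Bayes weights are chosen, so the sketch starts from already-computed +1/-1 outputs of three hypothetical base classifiers. The names `vote` and `OutputNaiveBayes`, the toy data, and the hand-set weights are all hypothetical.

```python
from math import log


def vote(predictions, weights=None):
    """Weighted vote over the +1/-1 outputs of several classifiers for one sample.

    With weights=None every classifier counts equally (assumption: the paper's
    Optimal Bayes weighting may differ). Ties are resolved in favour of +1.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1


class OutputNaiveBayes:
    """Naive Bayes whose attributes are the +1/-1 outputs of base classifiers."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.prior = {c: labels.count(c) / n for c in self.classes}
        n_attrs = len(rows[0])
        # cond[c][j][v] estimates P(output j == v | class c), Laplace-smoothed.
        self.cond = {}
        for c in self.classes:
            members = [r for r, y in zip(rows, labels) if y == c]
            self.cond[c] = []
            for j in range(n_attrs):
                pos = sum(1 for r in members if r[j] == 1)
                p_pos = (pos + 1) / (len(members) + 2)
                self.cond[c].append({1: p_pos, -1: 1.0 - p_pos})
        return self

    def predict(self, row):
        def log_posterior(c):
            return log(self.prior[c]) + sum(
                log(self.cond[c][j][v]) for j, v in enumerate(row)
            )
        return max(self.classes, key=log_posterior)


# Toy data (hypothetical): each row holds the outputs of three base
# classifiers on one training sample; labels are the true classes.
rows = [
    [1, 1, -1],
    [1, 1, 1],
    [1, -1, 1],
    [-1, -1, 1],
    [-1, -1, -1],
    [-1, 1, -1],
]
labels = [1, 1, 1, -1, -1, -1]

nb = OutputNaiveBayes().fit(rows, labels)
ensemble_label = vote([1, 1, -1])        # classifiers used as an ensemble
attribute_label = nb.predict([1, 1, -1])  # classifier outputs used as attributes
```

In the first use the classifier outputs are combined directly by a vote; in the second they are treated as features, and a separate Naive Bayes model learns how reliable each output is for each class.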
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marchiori, E., Sebag, M. (2005). Bayesian Learning with Local Support Vector Machines for Cancer Classification with Gene Expression Data. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2005. Lecture Notes in Computer Science, vol 3449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32003-6_8
DOI: https://doi.org/10.1007/978-3-540-32003-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25396-9
Online ISBN: 978-3-540-32003-6
eBook Packages: Computer Science, Computer Science (R0)