Abstract
DNA microarrays provide rich profiles that are used in cancer prediction by considering the gene expression levels across a collection of samples. Support Vector Machines (SVMs) have been applied to the classification of cancer samples with encouraging results. However, they are usually based on Euclidean distances, which fail to reflect the sample proximities accurately. Moreover, SVM classifiers based on non-Euclidean dissimilarities fail to reduce the errors significantly. In this paper, we propose an ensemble of SVM classifiers in order to reduce the errors. Diversity among the classifiers is induced by considering a set of complementary dissimilarities and kernels. The experimental results suggest that our algorithm improves on classifiers based on a single dissimilarity and on a combination strategy such as Bagging.
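The combination idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not say which dissimilarities, kernels, or combination rule are used, so the three dissimilarities, the majority vote, and the toy data below are all assumptions. To keep the example dependency-free, a 1-nearest-neighbour rule stands in for each SVM base classifier; only the ensemble structure (one classifier per dissimilarity, then a vote) is meant to carry over.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (L1) distance between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_dissimilarity(x, y):
    """One minus the cosine similarity; assumes non-zero profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def nearest_label(dissimilarity, train_x, train_y, sample):
    """Base classifier: 1-NN under one dissimilarity.

    This is a stand-in (assumption) for an SVM trained on a kernel
    derived from the same dissimilarity.
    """
    best = min(range(len(train_x)),
               key=lambda i: dissimilarity(train_x[i], sample))
    return train_y[best]


def ensemble_predict(dissimilarities, train_x, train_y, sample):
    """Majority vote over one base classifier per dissimilarity."""
    votes = [nearest_label(d, train_x, train_y, sample)
             for d in dissimilarities]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy profiles: three genes, four labelled samples.
train_x = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2],
           [0.1, 0.8, 1.0], [0.2, 0.9, 0.8]]
train_y = ["tumor", "tumor", "normal", "normal"]

# An odd number of members avoids ties in a two-class vote.
DISSIMILARITIES = [euclidean, manhattan, cosine_dissimilarity]

prediction = ensemble_predict(DISSIMILARITIES, train_x, train_y,
                              [0.8, 0.25, 0.15])
print(prediction)
```

Each dissimilarity emphasises a different aspect of the profiles (magnitude versus direction, for example), which is one way the ensemble members could disagree and so contribute diversity to the vote.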
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Blanco, Á., Martín-Merino, M., Rivas, J.D.L. (2007). Ensemble of Support Vector Machines to Improve the Cancer Class Prediction Based on the Gene Expression Profiles. In: Corchado, E., Corchado, J.M., Abraham, A. (eds) Innovations in Hybrid Intelligent Systems. Advances in Soft Computing, vol 44. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74972-1_51
DOI: https://doi.org/10.1007/978-3-540-74972-1_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74971-4
Online ISBN: 978-3-540-74972-1
eBook Packages: Engineering, Engineering (R0)