Abstract
Support Vector Machines (SVMs) are powerful machine learning techniques that are able to deal with high-dimensional and noisy data. They have been successfully applied to a wide range of problems, particularly to the analysis of gene expression data.
However, SVM algorithms usually rely on the Euclidean distance, which often fails to reflect the proximities between objects. Several versions of the SVM have been proposed that incorporate non-Euclidean dissimilarities. Nevertheless, different dissimilarities reflect complementary features of the data, and none can be considered superior to the others. In this paper, we present an ensemble of SVM classifiers that reduces the misclassification error by combining different dissimilarities. The proposed method has been applied to identify cancerous tissues using microarray gene expression data, with remarkable results.
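The idea of combining classifiers built on different dissimilarities can be illustrated with a minimal sketch. This is not the method of the paper: the abstract does not specify which dissimilarities are used or how the classifiers are combined, so the three dissimilarities below, the majority vote, and every function name are assumptions chosen for illustration. To keep the example dependency-free, a 1-nearest-neighbour rule stands in for the SVM base classifier, and the toy "expression profiles" are invented.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance: sensitive to differences in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance: also magnitude-based, less dominated by large gaps."""
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation_dissimilarity(a, b):
    """1 - Pearson correlation: compares profile shape, ignoring magnitude."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # correlation is undefined for a constant profile
    return 1.0 - cov / (norm_a * norm_b)


# Assumed set of dissimilarities; the paper may use a different set.
DISSIMILARITIES = [euclidean, manhattan, correlation_dissimilarity]


def nearest_neighbor_label(x, train_X, train_y, dissimilarity):
    """Stand-in base classifier (NOT an SVM): label of the closest training sample."""
    best = min(range(len(train_X)), key=lambda i: dissimilarity(x, train_X[i]))
    return train_y[best]


def ensemble_predict(x, train_X, train_y, dissimilarities=DISSIMILARITIES):
    """Train-free ensemble: one vote per dissimilarity, combined by majority."""
    votes = [nearest_neighbor_label(x, train_X, train_y, d) for d in dissimilarities]
    label = Counter(votes).most_common(1)[0][0]
    return label, votes


# Invented toy profiles: "tumor" samples rise at low magnitude,
# "normal" samples fall at high magnitude.
train_X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [8.0, 7.0, 6.0, 5.0],
    [9.0, 8.0, 7.0, 6.0],
]
train_y = ["tumor", "tumor", "normal", "normal"]

# A rising profile at high magnitude: shape and magnitude point to different classes.
query = [6.0, 6.5, 7.0, 7.5]

if __name__ == "__main__":
    label, votes = ensemble_predict(query, train_X, train_y)
    for d, vote in zip(DISSIMILARITIES, votes):
        print(f"{d.__name__}: {vote}")
    print("ensemble:", label)
```

On this toy query the magnitude-based and shape-based dissimilarities disagree, which is the kind of complementary information an ensemble can exploit. A closer approximation of the approach described in the abstract would replace the nearest-neighbour rule with an SVM trained on each dissimilarity.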
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blanco, Á., Martín-Merino, M. (2007). Boosting Support Vector Machines Using Multiple Dissimilarities. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74819-9_18
DOI: https://doi.org/10.1007/978-3-540-74819-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74817-5
Online ISBN: 978-3-540-74819-9
eBook Packages: Computer Science, Computer Science (R0)