Abstract
This paper presents an experimental study of different projection strategies and techniques for improving the performance of Support Vector Machine (SVM) ensembles. The study covers 62 UCI datasets, using Principal Component Analysis (PCA) and three types of Random Projections (RP), taking into account the size of the projected space and using linear SVMs as base classifiers. Random Projections are also combined with the sparse-matrix strategy used by Rotation Forests, a method that is likewise based on projections. The experiments show that, for SVM ensembles, (i) the sparse-matrix strategy leads to the best results, (ii) results improve when the dimension of the projected space is larger than that of the original space, and (iii) Random Projections also improve the results when used instead of PCA. Finally, randomly projected SVMs are tested as base classifiers for several state-of-the-art ensembles, improving their performance.
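As a rough illustration of one of the projection types studied (not the authors' code), a "database-friendly" random projection in the style of Achlioptas can be sketched in pure Python: the projection matrix has entries +1, 0, and -1 with probabilities 1/6, 2/3, and 1/6, scaled by sqrt(3/k), so most entries are zero and the projection is cheap to compute. All names below are illustrative assumptions.

```python
import math
import random

def achlioptas_matrix(d, k, rng):
    """Sparse d x k random projection matrix (Achlioptas, 2003).

    Entries take the values +sqrt(3/k), 0, -sqrt(3/k) with
    probabilities 1/6, 2/3, 1/6 respectively, so roughly two thirds
    of the entries are zero.
    """
    scale = math.sqrt(3.0 / k)

    def entry():
        u = rng.random()
        if u < 1.0 / 6.0:
            return scale
        if u < 2.0 / 6.0:
            return -scale
        return 0.0

    return [[entry() for _ in range(k)] for _ in range(d)]

def project(x, matrix):
    """Project a d-dimensional point x into k dimensions via matrix."""
    k = len(matrix[0])
    return [sum(x[i] * matrix[i][j] for i in range(len(x)))
            for j in range(k)]

# Project a 5-dimensional point into 3 dimensions.
rng = random.Random(42)
R = achlioptas_matrix(5, 3, rng)
z = project([1.0, 2.0, 3.0, 4.0, 5.0], R)
print(len(z))  # the projected point has 3 components
```

In an ensemble setting, each base classifier would be trained on the data projected through its own independently drawn matrix, so the diversity of the ensemble comes from the randomness of the projections; the paper additionally considers projected dimensions larger than the original one.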
Cite this article
Maudes, J., Rodríguez, J.J., García-Osorio, C. et al. Random projections for linear SVM ensembles. Appl Intell 34, 347–359 (2011). https://doi.org/10.1007/s10489-011-0283-2