Abstract
The new boosting of Least-Squares SVM (LS-SVM), Proximal SVM (PSVM), Newton SVM (NSVM) algorithms aim at classifying very large datasets on standard personal computers (PCs). We extend the PSVM, LS-SVM and NSVM in several ways to efficiently classify large datasets. We developed a row incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we developed new algorihms to process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting including AdaBoost and Arcx4 to these algorithms, we developed classification algorithms for massive, very-high-dimensional datasets. Numerical test results on large datasets from the UCI repository showed that our algorithms are often significantly faster and/or more accurate than state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Guyon, I.: Web page on svm applications (1999). http://www.clopinet.com/isabelle/Projects/-SVM/app-list.html
Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the kdd-03 panel - data mining: the next 10 years. SIGKDD Explor. 5(2), 191–196 (2004)
Lyman, P., Varian, H.R., Swearingen, K., Charles, P., Good, N., Jordan, L., Pal, J.: How much information (2003). http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
Boser, B., Guyon, I., Vapnik, V.: An training algorithm for optimal margin classifiers. In: Proceedings of 5th ACM Annual Workshop on Computational Learning Theoryof 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Gile, L., Morgan, N., Wilson, E., (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. Adv. Neural Inf. Process. Syst. 13, 409–415 (2001)
Do, T.N., Poulet, F.: Incremental svm and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003)
Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of psvm and visualization tools. In: Proceedings of 6th International Conference on Entreprise Information Systems, pp. 36–41 (2004)
Do, T.N., Poulet, F.: Classifying one billion data with a new distributed svm algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006)
Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002)
Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M., et al. (eds.) Enterprise Information Systems V, pp. 177–184. Kluwer Academic Publishers, Dordrecht (2004)
Do, T.N., Le-Thi, H.A.: Classifying large datasets with svm. In: Proceedings of 4th International Conference on Computational Management Science (2007)
Syed, N., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the ACM SIGKDD International Conference on KDD. ACM (1999)
Do, T.N., Poulet, F.: Mining very large datasets with svm and visualization. In: Proceedings of 7th International Conference on Entreprise Information Systems, pp. 127–134 (2005)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000)
Suykens, J., Vandewalle, J.: Least squares support vector machines classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001)
Mangasarian, O.: A finite newton method for classification problems. Technical report 01–11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001)
Tikhonov, A.N.: On the stability of inverse problems. Dokl Akad. Nauk SSSR 39(5), 195–198 (1943)
Golub, G., Loan, C.V.: Matrix Computations, 3rd edn. John Hopkins University Press, Baltimore (1996)
Rao, C.: Linear Statistical Inference and Its applications. Wiley, New York (1965)
Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999)
Breiman, L.: Arcing classifiers. Ann. Stat. 26(3), 801–849 (1998)
Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)
Frank, A., Asuncion, A.: UCI machine learning repository (2010). http://www.ics.uci.edu/~mlearn/MLRepository.html
Lewis, D.: Reuters-21578 text classification test collection (1997). http://www.david-dlewis.com/resources/testcollections/reuters21578/
Chang, C.C., Lin, C.J.: LIBSVM - a library for support vector machines (2001). http://www.csie.ntu.edu.tw/~cjlin/libsvm
Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)
Joachims, T.: Training linear svms in linear time. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 217–226. ACM (2006)
Yu, H., Yang, J., Han, J.: Classifying large data sets using svms with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
Reyzin, L., Schapire, R.: How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 753–760. ACM (2006)
Dongarra, J., Pozo, R., Walker, D.: LAPACK++: a design overview of object-oriented extensions for high performance linear algebra. In: Proceedings of Supercomputing, pp. 162–171 (1993)
McCallum, A.: Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering (1998). http://www-2.cs.cmu.edu/~mccallum/bow
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Do, T.N., Le Thi, H.A. (2015). Massive Classification with Support Vector Machines. In: Nguyen, N. (eds) Transactions on Computational Collective Intelligence XVIII. Lecture Notes in Computer Science(), vol 9240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48145-5_8
Download citation
DOI: https://doi.org/10.1007/978-3-662-48145-5_8
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48144-8
Online ISBN: 978-3-662-48145-5
eBook Packages: Computer ScienceComputer Science (R0)