DOI: 10.1145/2020408.2020420

Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification

Published: 21 August 2011 Publication History

Abstract

Support Vector Machines (SVMs) are among the most popular and successful classification algorithms. Kernel SVMs often reach state-of-the-art accuracy, but suffer from the curse of kernelization: on noisy data the model grows linearly with the size of the training set. Linear SVMs can efficiently learn from truly large data, but their low representational power limits the domains to which they apply. To fill the representability and scalability gap between linear and nonlinear SVMs, we propose the Adaptive Multi-hyperplane Machine (AMM) algorithm, which achieves fast training and prediction and has the capability to solve nonlinear classification problems. The AMM model consists of a set of hyperplanes (weights), each assigned to one of the classes, and predicts the class associated with the weight that provides the largest prediction. The number of weights is determined automatically through an iterative algorithm based on stochastic gradient descent that is guaranteed to converge to a local optimum. Since the generalization bound decreases with the number of weights, a weight pruning mechanism is proposed and analyzed. Experiments on several large data sets show that AMM is nearly as fast during training and prediction as a state-of-the-art linear SVM solver, and that it can be orders of magnitude faster than kernel SVMs. In accuracy, AMM falls between linear and kernel SVMs. For example, on an OCR task with 8 million high-dimensional training examples, AMM trained in 300 seconds on a single-core processor reached a 0.54% error rate, significantly lower than the 2.03% error rate of a linear SVM trained in the same time and comparable to the 0.43% error rate of a kernel SVM trained in 2 days on 512 processors. The results indicate that AMM could be an attractive option for solving large-scale classification problems. The software is available at www.dabi.temple.edu/~vucetic/AMM.html.
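The prediction rule described above — each class owns a set of weights, and the predicted class is the one whose best-scoring weight gives the largest value — can be sketched in a few lines of NumPy. This is a minimal illustration of the rule only, not the paper's released software; the names `amm_predict` and `weights` are hypothetical.

```python
import numpy as np

def amm_predict(x, weights):
    """Predict a class label under the AMM decision rule.

    weights: dict mapping class label -> 2-D array of shape (b_i, d),
    where each row is one hyperplane (weight) assigned to that class
    and b_i may differ per class.

    Prediction:  argmax_i  max_j  w_{i,j} . x
    """
    # For each class, score x against all of its weights and keep the best;
    # the class with the overall largest score wins.
    return max(weights, key=lambda c: weights[c].dot(x).max())

# Toy example: two classes in 2-D; class 1 is modeled with two hyperplanes,
# letting it cover a nonlinear (non-convex) region that one hyperplane cannot.
weights = {
    0: np.array([[1.0, 0.0]]),
    1: np.array([[-1.0, 0.0],
                 [0.0, 1.0]]),
}
print(amm_predict(np.array([0.2, 2.0]), weights))  # class 1 wins via its second weight
```

Because each class may hold several weights, the induced decision regions are unions of convex polyhedral cells, which is what lets AMM represent nonlinear boundaries while each individual score stays a cheap linear dot product.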



      Published In

      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. large-scale learning
      2. nonlinear classification
      3. stochastic gradient descent
      4. support vector machines

      Qualifiers

      • Research-article

      Conference

      KDD '11

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

