ABSTRACT
This paper explores the use of Tensor Engines to accelerate both linear and nonlinear SVM training. The Support Vector Machine (SVM) is a classical machine learning model for classification and regression, and it remains state-of-the-art for some tasks such as text classification and bioinformatics. Large-scale SVM training, however, is still challenging because of its high computational complexity, and this is especially severe for nonlinear SVMs with kernel tricks. Meanwhile, the surging importance of neural networks has fueled the emergence of specialized processors called Tensor Engines (e.g., Tensor Cores in NVIDIA GPUs and Google's Tensor Processing Unit), which are characterized by extreme efficiency together with very limited precision and range. This paper proposes a Tensor Core GPU based SVM algorithm and software system that is faster and more scalable than state-of-the-art SVM solvers. It combines a fast, accurate low-rank Gram matrix approximation that effectively utilizes the Tensor Cores with a primal-dual interior-point method that solves the resulting quadratic program at a fast and predictable convergence rate. The random-projection-based Gram matrix approximation, in particular, can be substantially accelerated by Tensor Cores.
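To make the random-projection idea concrete, here is a minimal NumPy sketch of a randomized low-rank approximation of an RBF Gram matrix; this is an illustration of the general technique (a randomized range finder), not the paper's implementation, and the function names (`rbf_gram`, `randomized_lowrank`) and all parameter values are choices made for this sketch. The dominant cost is the dense products `K @ omega` and `Q.T @ K`, which is exactly the GEMM workload that maps onto tensor engines.

```python
import numpy as np

def rbf_gram(X, gamma=0.01):
    """Dense RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def randomized_lowrank(K, rank, oversample=10, rng=None):
    """Randomized range finder: K ~= Q @ B with Q orthonormal, B small."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    omega = rng.standard_normal((n, rank + oversample))
    Y = K @ omega              # random projection (large GEMM, tensor-engine friendly)
    Q, _ = np.linalg.qr(Y)     # orthonormal basis for the sampled range of K
    B = Q.T @ K                # (rank + oversample) x n factor
    return Q, B

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
K = rbf_gram(X, gamma=0.01)
Q, B = randomized_lowrank(K, rank=50, rng=1)
err = np.linalg.norm(K - Q @ B) / np.linalg.norm(K)  # relative Frobenius error
```

Because the smooth RBF kernel has rapidly decaying spectrum, a modest rank already yields a small relative error, which is what makes the low-rank route practical for SVM training.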
This exploration culminates in a tale of randomized numerical linear algebra, convex optimization, and high-performance computing on Tensor Engines. In particular, this paper suggests that the emerging randomized numerical linear algebra algorithms and Tensor Engines are synergistic, opening up exciting new application areas that include statistical machine learning and scientific and engineering computing more broadly.
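The convex-optimization side rests on a primal-dual interior-point method. As a hedged, generic illustration (not the paper's solver, which must handle the full SVM dual with an equality constraint and box constraints), the sketch below solves the simplest box-constrained QP, min ½xᵀQx + qᵀx subject to x ≥ 0, by Newton steps on the perturbed KKT conditions; the function name and the constants (centering parameter 0.1, step fraction 0.99) are choices made for this sketch.

```python
import numpy as np

def ipm_box_qp(Q, q, tol=1e-8, max_iter=50):
    """Minimal primal-dual interior-point sketch for
    min 1/2 x'Qx + q'x  s.t. x >= 0,  with Q symmetric positive definite.
    KKT conditions: Qx + q - z = 0,  x >= 0,  z >= 0,  x_i z_i = 0."""
    Q = np.asarray(Q, dtype=float)
    q = np.asarray(q, dtype=float)
    n = q.size
    x, z = np.ones(n), np.ones(n)          # strictly feasible interior start
    for _ in range(max_iter):
        r_d = Q @ x + q - z                 # dual residual
        mu = x @ z / n                      # duality measure
        if np.linalg.norm(r_d) < tol and mu < tol:
            break
        r_c = 0.1 * mu - x * z              # centering target for x_i z_i
        # eliminate dz: (Q + diag(z/x)) dx = -r_d + r_c / x
        dx = np.linalg.solve(Q + np.diag(z / x), -r_d + r_c / x)
        dz = (r_c - z * dx) / x
        # fraction-to-the-boundary rule keeps (x, z) strictly positive
        def max_step(v, dv):
            neg = dv < 0
            return min(1.0, 0.99 * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0
        alpha = min(max_step(x, dx), max_step(z, dz))
        x += alpha * dx
        z += alpha * dz
    return x
```

The appeal for this setting is the predictable iteration count: each iteration shrinks the duality measure by a roughly constant factor, and each iteration is dominated by one dense linear solve, again a workload that vectorizes well on accelerators.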