
TensorSVM: accelerating kernel machines with tensor engine

Published: 29 June 2020 · DOI: 10.1145/3392717.3392770

ABSTRACT

This paper explores the use of Tensor Engines to accelerate nonlinear and linear SVM training. The Support Vector Machine (SVM) is a classical machine learning model for classification and regression, and it remains the state of the art for tasks such as text classification and bioinformatics. However, large-scale SVM training is still challenging because of its high computational complexity, and the difficulty is especially severe for nonlinear SVMs that use kernel tricks. Meanwhile, the surging importance of neural networks has fueled the emergence of specialized processors called Tensor Units (TensorCores in NVIDIA GPUs and Google's Tensor Processing Unit), which are characterized by extreme efficiency but very limited precision and range. This paper proposes a TensorCore-based GPU SVM algorithm and software system that is faster and more scalable than state-of-the-art SVM solvers. It combines a fast, accurate low-rank Gram matrix approximation that effectively utilizes the GPU's TensorCores with a primal-dual interior-point method that solves the resulting quadratic program at a fast and predictable convergence rate. The random-projection-based Gram matrix approximation, in particular, can be substantially accelerated by TensorCores.
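To make the random-projection idea concrete, here is a minimal NumPy sketch of a randomized low-rank approximation of an RBF Gram matrix, in the style of the Halko/Martinsson randomized range finder. TensorCore arithmetic is emulated by rounding GEMM operands to FP16 while accumulating in FP32, mirroring the mixed-precision mode TensorCores provide; the function names and parameters here are illustrative assumptions, not TensorSVM's actual API.

```python
import numpy as np

def rbf_gram(X, gamma):
    # RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fp16_gemm(A, B):
    # Emulate a TensorCore GEMM: operands rounded to FP16, accumulation in FP32.
    return A.astype(np.float16).astype(np.float32) @ B.astype(np.float16).astype(np.float32)

def randomized_lowrank(K, rank, oversample=10, seed=None):
    # Randomized range finder: sample the range of K with a Gaussian test
    # matrix, orthonormalize, and project, giving K ~= Q @ B.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    Omega = rng.standard_normal((n, rank + oversample)).astype(np.float32)
    Y = fp16_gemm(K, Omega)          # range sampling: the first FP16-heavy GEMM
    Q, _ = np.linalg.qr(Y)           # orthonormal basis, kept in FP32
    B = fp16_gemm(Q.T, K)            # projection: the second FP16-heavy GEMM
    return Q, B

# Tiny demo: the approximation error decays as the chosen rank grows.
X = np.random.default_rng(0).standard_normal((500, 20)).astype(np.float32)
K = rbf_gram(X, gamma=0.1)
Q, B = randomized_lowrank(K, rank=50)
print(np.linalg.norm(K - Q @ B) / np.linalg.norm(K))
```

The two GEMMs dominate the cost at O(n^2 * rank) flops, which is exactly the shape of computation TensorCores accelerate; the small QR stays in higher precision, one plausible way to keep the approximation accurate despite FP16's narrow range.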

This exploration culminates in a tale of randomized numerical linear algebra, convex optimization, and high-performance computing on Tensor Engines. In particular, this paper suggests that emerging randomized numerical linear algebra algorithms and Tensor Engines are synergistic, opening up exciting new application areas that include statistical machine learning and broader scientific and engineering computing.
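The abstract's second ingredient, the interior-point solver, can be sketched just as compactly. The snippet below is a basic primal-dual interior-point method for the box-constrained QP min_a 0.5*a'Qa - 1'a subject to 0 <= a <= C, i.e., the SVM dual with the bias equality constraint dropped for brevity (Q here is the label-signed Gram matrix, Q_ij = y_i * y_j * k(x_i, x_j)). It uses a fixed centering parameter rather than the Mehrotra-style predictor-corrector a production solver would use, and it is an assumption-laden illustration of the method class, not TensorSVM's implementation.

```python
import numpy as np

def ipm_box_qp(Q, C, tol=1e-8, max_iter=50, sigma=0.1):
    """Primal-dual IPM for: min 0.5*a'Qa - 1'a  s.t.  0 <= a <= C.
    lam, mu are multipliers for a >= 0 and a <= C; s = C - a is the slack."""
    n = Q.shape[0]
    a = np.full(n, 0.5 * C)                   # strictly interior start
    lam, mu = np.ones(n), np.ones(n)
    for _ in range(max_iter):
        s = C - a
        r_d = Q @ a - 1.0 - lam + mu          # stationarity residual
        nu = (lam @ a + mu @ s) / (2 * n)     # duality measure
        if np.abs(r_d).max() < tol and nu < tol:
            break
        # Eliminate dlam, dmu from the perturbed KKT Newton system,
        # leaving a single positive-definite solve in da.
        rhs = -r_d + (sigma * nu - lam * a) / a - (sigma * nu - mu * s) / s
        da = np.linalg.solve(Q + np.diag(lam / a + mu / s), rhs)
        dlam = (sigma * nu - lam * a) / a - (lam / a) * da
        dmu = (sigma * nu - mu * s) / s + (mu / s) * da
        # Fraction-to-boundary rule keeps all iterates strictly feasible.
        step = 1.0
        for v, dv in ((a, da), (s, -da), (lam, dlam), (mu, dmu)):
            neg = dv < 0
            if neg.any():
                step = min(step, 0.95 * np.min(v[neg] / -dv[neg]))
        a, lam, mu = a + step * da, lam + step * dlam, mu + step * dmu
    return a

# Demo on a random positive-definite matrix standing in for Q.
rng = np.random.default_rng(0)
M = rng.standard_normal((80, 40))
alpha = ipm_box_qp(M @ M.T + 1e-3 * np.eye(80), C=1.0)
print(alpha.min(), alpha.max())               # iterates stay inside [0, C]
```

Each iteration costs one linear solve, and interior-point iteration counts grow very slowly with problem size, which matches the "fast and predictable convergence rate" the abstract claims. With the low-rank factorization K ~= QB from the sketch above, that solve can be reduced to the small rank via the Sherman-Morrison-Woodbury identity, which is presumably where the two ingredients meet.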

Published in

                  ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing
                  June 2020
                  499 pages
ISBN: 9781450379830
DOI: 10.1145/3392717
General Chairs: Eduard Ayguadé, Wen-mei Hwu
Program Chairs: Rosa M. Badia, H. Peter Hofstee

                  Copyright © 2020 ACM


                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States



                  Qualifiers

                  • research-article

                  Acceptance Rates

Overall Acceptance Rate: 584 of 2,055 submissions, 28%
