
TensorSVM: accelerating kernel machines with tensor engine

Published: 29 June 2020 · DOI: 10.1145/3392717.3392770

ABSTRACT

This paper explores the use of Tensor Engines to accelerate nonlinear and linear SVM training. The Support Vector Machine (SVM) is a classical machine learning model for classification and regression, and it remains the state of the art for tasks such as text classification and bioinformatics. However, large-scale SVM training is still challenging because of its high computational complexity, and the difficulty is especially severe for nonlinear SVMs that use kernel tricks. Meanwhile, the surging importance of neural networks has fueled the emergence of specialized processors called Tensor Units (TensorCores in NVIDIA GPUs and Google's Tensor Processing Unit), which are characterized by extreme efficiency but very limited precision and range. This paper proposes a TensorCore-based GPU SVM algorithm and software system that is faster and more scalable than state-of-the-art SVM solvers. It combines a fast, accurate low-rank Gram matrix approximation that effectively utilizes the GPU's TensorCores with a primal-dual interior-point method that solves the resulting quadratic program at a fast and predictable convergence rate. The random-projection-based Gram matrix approximation, in particular, can be substantially accelerated by TensorCores.
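To make the random-projection idea concrete, here is a minimal NumPy sketch of a randomized low-rank approximation of an RBF Gram matrix, in the style of the Halko/Martinsson randomized range finder. TensorCore arithmetic is emulated by rounding GEMM operands to FP16 while accumulating in FP32, mirroring the mixed-precision mode TensorCores provide; the function names and parameters here are illustrative assumptions, not TensorSVM's actual API.

```python
import numpy as np

def rbf_gram(X, gamma):
    # RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fp16_gemm(A, B):
    # Emulate a TensorCore GEMM: operands rounded to FP16, accumulation in FP32.
    return A.astype(np.float16).astype(np.float32) @ B.astype(np.float16).astype(np.float32)

def randomized_lowrank(K, rank, oversample=10, seed=None):
    # Randomized range finder: sample the range of K with a Gaussian test
    # matrix, orthonormalize, and project, giving K ~= Q @ B.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    Omega = rng.standard_normal((n, rank + oversample)).astype(np.float32)
    Y = fp16_gemm(K, Omega)          # range sampling: the first FP16-heavy GEMM
    Q, _ = np.linalg.qr(Y)           # orthonormal basis, kept in FP32
    B = fp16_gemm(Q.T, K)            # projection: the second FP16-heavy GEMM
    return Q, B

# Tiny demo: the approximation error decays as the chosen rank grows.
X = np.random.default_rng(0).standard_normal((500, 20)).astype(np.float32)
K = rbf_gram(X, gamma=0.1)
Q, B = randomized_lowrank(K, rank=50)
print(np.linalg.norm(K - Q @ B) / np.linalg.norm(K))
```

The two GEMMs dominate the cost at O(n^2 * rank) flops, which is exactly the shape of computation TensorCores accelerate; the small QR stays in higher precision, one plausible way to keep the approximation accurate despite FP16's narrow range.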

This exploration culminates in a tale of randomized numerical linear algebra, convex optimization, and high-performance computing on Tensor Engines. In particular, this paper suggests that emerging randomized numerical linear algebra algorithms and Tensor Engines are synergistic, opening up exciting new application areas that include statistical machine learning and broader scientific and engineering computing.
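The abstract's second ingredient, the interior-point solver, can be sketched just as compactly. The snippet below is a basic primal-dual interior-point method for the box-constrained QP min_a 0.5*a'Qa - 1'a subject to 0 <= a <= C, i.e., the SVM dual with the bias equality constraint dropped for brevity (Q here is the label-signed Gram matrix, Q_ij = y_i * y_j * k(x_i, x_j)). It uses a fixed centering parameter rather than the Mehrotra-style predictor-corrector a production solver would use, and it is an assumption-laden illustration of the method class, not TensorSVM's implementation.

```python
import numpy as np

def ipm_box_qp(Q, C, tol=1e-8, max_iter=50, sigma=0.1):
    """Primal-dual IPM for: min 0.5*a'Qa - 1'a  s.t.  0 <= a <= C.
    lam, mu are multipliers for a >= 0 and a <= C; s = C - a is the slack."""
    n = Q.shape[0]
    a = np.full(n, 0.5 * C)                   # strictly interior start
    lam, mu = np.ones(n), np.ones(n)
    for _ in range(max_iter):
        s = C - a
        r_d = Q @ a - 1.0 - lam + mu          # stationarity residual
        nu = (lam @ a + mu @ s) / (2 * n)     # duality measure
        if np.abs(r_d).max() < tol and nu < tol:
            break
        # Eliminate dlam, dmu from the perturbed KKT Newton system,
        # leaving a single positive-definite solve in da.
        rhs = -r_d + (sigma * nu - lam * a) / a - (sigma * nu - mu * s) / s
        da = np.linalg.solve(Q + np.diag(lam / a + mu / s), rhs)
        dlam = (sigma * nu - lam * a) / a - (lam / a) * da
        dmu = (sigma * nu - mu * s) / s + (mu / s) * da
        # Fraction-to-boundary rule keeps all iterates strictly feasible.
        step = 1.0
        for v, dv in ((a, da), (s, -da), (lam, dlam), (mu, dmu)):
            neg = dv < 0
            if neg.any():
                step = min(step, 0.95 * np.min(v[neg] / -dv[neg]))
        a, lam, mu = a + step * da, lam + step * dlam, mu + step * dmu
    return a

# Demo on a random positive-definite matrix standing in for Q.
rng = np.random.default_rng(0)
M = rng.standard_normal((80, 40))
alpha = ipm_box_qp(M @ M.T + 1e-3 * np.eye(80), C=1.0)
print(alpha.min(), alpha.max())               # iterates stay inside [0, C]
```

Each iteration costs one linear solve, and interior-point iteration counts grow very slowly with problem size, which matches the "fast and predictable convergence rate" the abstract claims. With the low-rank factorization K ~= QB from the sketch above, that solve can be reduced to the small rank via the Sherman-Morrison-Woodbury identity, which is presumably where the two ingredients meet.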

Published in

                  ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing
                  June 2020
                  499 pages
ISBN: 9781450379830
DOI: 10.1145/3392717
General Chairs: Eduard Ayguadé, Wen-mei Hwu
Program Chairs: Rosa M. Badia, H. Peter Hofstee

                  Copyright © 2020 ACM


                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States



                  Qualifiers

                  • research-article

                  Acceptance Rates

Overall Acceptance Rate: 584 of 2,055 submissions, 28%
