DOI: 10.1145/3075564.3075569
Research article

Data Analytics with NVLink: An SpMV Case Study

Published: 15 May 2017

Abstract

A recent advancement in heterogeneous computing, the NVLink interconnect enables high-speed communication between GPUs and CPUs and among GPUs. In this paper we show how NVLink changes the role GPUs can play in graph analytics and, more generally, in data analytics. With the technology preceding NVLink, GPUs could be used efficiently only on data sets that fit into their local memory.
The increased bandwidth that NVLink provides calls for a reassessment of many algorithms, including those used in data analytics, that in the past could not exploit GPUs efficiently because of the limited bandwidth toward host memory.
Our contributions are an introduction to the basic properties of one of the first systems equipped with NVLink, and a description of how one of the most pervasive data analytics kernels, SpMV, can be tailored to that system. We evaluate the resulting SpMV implementation on a variety of data sets, and it compares favorably with the best results available in the literature.
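
To make the data-set-size argument concrete, the sketch below shows a minimal one-thread-per-row CSR SpMV in CUDA that keeps its arrays in managed (unified) memory, one mechanism through which an NVLink-attached GPU can reach operands resident in host memory at much higher bandwidth than PCIe allows. This is only an illustration under those assumptions, not the kernel or data layout used in the paper; the names (spmv_csr, row_ptr, col_idx) are hypothetical.

```cuda
// Minimal CSR SpMV sketch (one thread per row). NOT the paper's kernel:
// it only illustrates that, with managed memory on an NVLink-attached GPU,
// a matrix larger than device memory can still be read by the kernel,
// with pages migrated from host memory over the interconnect.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
                         const double *vals, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main() {
    // Tiny 2x2 example matrix [[1 2], [0 3]] in CSR form.
    const int n = 2, nnz = 3;
    int *row_ptr, *col_idx;
    double *vals, *x, *y;
    cudaMallocManaged(&row_ptr, (n + 1) * sizeof(int));
    cudaMallocManaged(&col_idx, nnz * sizeof(int));
    cudaMallocManaged(&vals, nnz * sizeof(double));
    cudaMallocManaged(&x, n * sizeof(double));
    cudaMallocManaged(&y, n * sizeof(double));
    int rp[] = {0, 2, 3}, ci[] = {0, 1, 1};
    double v[] = {1.0, 2.0, 3.0};
    for (int i = 0; i <= n; ++i) row_ptr[i] = rp[i];
    for (int i = 0; i < nnz; ++i) { col_idx[i] = ci[i]; vals[i] = v[i]; }
    x[0] = 1.0; x[1] = 1.0;
    spmv_csr<<<(n + 255) / 256, 256>>>(n, row_ptr, col_idx, vals, x, y);
    cudaDeviceSynchronize();
    printf("y = [%f, %f]\n", y[0], y[1]);  // expect [3, 3]
    return 0;
}
```

A tuned SpMV would normally assign a warp or a configurable group of threads per row and overlap transfers with compute; the point of the sketch is simply that the matrix arrays and the input vector can reside in host memory and still be accessed directly from the kernel.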

Published In

CF'17: Proceedings of the Computing Frontiers Conference
May 2017
450 pages
ISBN: 9781450344876
DOI: 10.1145/3075564

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CF '17: Computing Frontiers Conference
May 15 - 17, 2017
Siena, Italy

Acceptance Rates

CF'17 paper acceptance rate: 43 of 87 submissions (49%)
Overall acceptance rate: 273 of 785 submissions (35%)
