DOI: 10.1145/3075564.3075569
research-article

Data Analytics with NVLink: An SpMV Case Study

Published: 15 May 2017

ABSTRACT

A recent advancement in heterogeneous computing, the NVLink interconnect enables high-speed communication between GPUs and CPUs and among GPUs. In this paper we show how NVLink changes the role GPUs can play in graph analytics and, more generally, in data analytics. With the technology preceding NVLink, GPUs process data efficiently only when the data set fits into their local memory.

The increased bandwidth provided by NVLink calls for a reassessment of many algorithms, including those used in data analytics, that in the past could not exploit GPUs efficiently because of the limited bandwidth between GPUs and host memory.

Our contributions are an introduction to the basic properties of one of the first systems using NVLink, and a description of how SpMV (sparse matrix-vector multiplication), one of the most pervasive data analytics kernels, can be tailored to that system. We evaluate the resulting SpMV implementation on a variety of data sets, and it compares favorably to the best results available in the literature.
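For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV, not the implementation evaluated in the paper. It assumes the common CSR (compressed sparse row) storage format, which the abstract does not specify, and the function names `spmv_csr` and `spmv_csr_blocked` are hypothetical. The blocked variant only illustrates the general idea of processing a matrix that is too large for GPU memory one chunk of rows at a time; on real hardware each chunk would be transferred over the interconnect rather than sliced from a Python list.

```python
def spmv_csr(row_ptr, col_idx, values, x, row_begin=0, row_end=None):
    """Return y[row_begin:row_end] for y = A @ x, with A stored in CSR form."""
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for row in range(row_begin, row_end):
        acc = 0.0
        # Nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmv_csr_blocked(row_ptr, col_idx, values, x, rows_per_block):
    """Compute y = A @ x one block of rows at a time.

    Assumption for illustration only: each block stands in for a chunk of
    the matrix that would be streamed from host memory to the GPU.
    """
    n_rows = len(row_ptr) - 1
    y = []
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        y.extend(spmv_csr(row_ptr, col_idx, values, x, start, stop))
    return y


# Small 3x3 example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
ROW_PTR = [0, 2, 3, 5]
COL_IDX = [0, 2, 1, 0, 2]
VALUES = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [1.0, 2.0, 3.0]

Y_FULL = spmv_csr(ROW_PTR, COL_IDX, VALUES, X)
Y_BLOCKED = spmv_csr_blocked(ROW_PTR, COL_IDX, VALUES, X, rows_per_block=2)
```

Each nonzero is read once and used in a single multiply-add, so the kernel performs little arithmetic per byte of matrix data. This is why the bandwidth between the GPU and the memory holding the matrix matters so much for this kernel.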


  • Published in

    CF'17: Proceedings of the Computing Frontiers Conference
    May 2017
    450 pages
    ISBN:9781450344876
    DOI:10.1145/3075564

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    CF'17 Paper Acceptance Rate: 43 of 87 submissions, 49%
    Overall Acceptance Rate: 240 of 680 submissions, 35%
