DOI: 10.1145/3437801.3441592
research-article

Efficiently running SpMV on long vector architectures

Published: 17 February 2021

Abstract

Sparse Matrix-Vector multiplication (SpMV) is an essential kernel for parallel numerical applications. SpMV displays sparse and irregular data accesses, which complicate its vectorization. As a result, SpMV frequently performs suboptimally on long vector ISAs that exploit SIMD parallelism. In this context, developing new optimizations is fundamental to enabling high-performance SpMV on emerging long vector architectures. In this paper, we improve the state-of-the-art SELL-C-σ sparse matrix format by proposing several new optimizations for SpMV. We target aggressive long vector architectures like the NEC Vector Engine. By combining several optimizations, we obtain an average 12% improvement over SELL-C-σ on a heterogeneous set of 24 matrices. Our optimizations boost performance on long vector architectures because they expose a high degree of SIMD parallelism.
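
For readers unfamiliar with the baseline format named in the abstract, the following is a minimal sketch of a generic SELL-C-σ layout and its SpMV loop. It is not the paper's optimized kernel. It assumes the usual definition of the format: rows are sorted by length inside windows of σ rows, grouped into chunks of C rows, padded to the longest row of each chunk, and stored column-major within the chunk. The function names `csr_to_sell` and `sell_spmv` and the small test matrix are hypothetical, chosen only for illustration.

```python
def csr_to_sell(row_ptr, col_idx, vals, C, sigma):
    """Build a SELL-C-sigma layout from CSR arrays (illustrative helper)."""
    n = len(row_ptr) - 1
    row_len = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]
    # Sort rows by descending length inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = range(start, min(start + sigma, n))
        perm.extend(sorted(window, key=lambda r: -row_len[r]))
    chunk_ptr, chunk_width, s_cols, s_vals = [0], [], [], []
    for start in range(0, n, C):
        rows = perm[start:start + C]
        width = max(row_len[r] for r in rows)
        # Column-major inside the chunk: entry j of every lane is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < row_len[rows[lane]]:
                    k = row_ptr[rows[lane]] + j
                    s_cols.append(col_idx[k])
                    s_vals.append(vals[k])
                else:
                    s_cols.append(0)    # padding: any valid column index
                    s_vals.append(0.0)  # padding: contributes nothing
        chunk_width.append(width)
        chunk_ptr.append(len(s_vals))
    return perm, chunk_ptr, chunk_width, s_cols, s_vals


def sell_spmv(perm, chunk_ptr, chunk_width, s_cols, s_vals, x, C):
    """Compute y = A @ x from the SELL-C-sigma arrays built above."""
    n = len(perm)
    y = [0.0] * n
    for c, width in enumerate(chunk_width):
        acc = [0.0] * C  # one accumulator per SIMD lane
        base = chunk_ptr[c]
        for j in range(width):
            # On vector hardware this lane loop would be one gather + FMA.
            for lane in range(C):
                k = base + j * C + lane
                acc[lane] += s_vals[k] * x[s_cols[k]]
        for lane in range(C):
            row = c * C + lane
            if row < n:
                y[perm[row]] = acc[lane]  # undo the sorting permutation
    return y


# Hypothetical 4x4 matrix in CSR form with row lengths 1, 3, 1, 2.
row_ptr = [0, 1, 4, 5, 7]
col_idx = [0, 0, 1, 2, 2, 1, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

sorted_fmt = csr_to_sell(row_ptr, col_idx, vals, C=2, sigma=4)
unsorted_fmt = csr_to_sell(row_ptr, col_idx, vals, C=2, sigma=1)
y_sorted = sell_spmv(*sorted_fmt, x, 2)
y_unsorted = sell_spmv(*unsorted_fmt, x, 2)
```

In this toy case, sorting with σ = 4 places the two longest rows in the same chunk, so fewer padded entries are stored than with σ = 1. The chunk height C plays the role of the vector length, which is why long vector machines favor large C and, in turn, make padding overhead more sensitive to the choice of σ.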

Published In

PPoPP '21: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2021
507 pages
ISBN:9781450382946
DOI:10.1145/3437801
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NEC vector engine
  2. SpMV
  3. long-vector architectures
  4. performance optimization

Qualifiers

  • Research-article

Conference

PPoPP '21

Acceptance Rates

PPoPP '21 Paper Acceptance Rate 31 of 150 submissions, 21%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 153
  • Downloads (last 6 weeks): 16
Reflects downloads up to 12 Feb 2025

Cited By

  • (2024) Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. DOI: 10.23919/DATE58400.2024.10546797. Online publication date: 25-Mar-2024
  • (2024) Co-designing ab initio electronic structure methods on a RISC-V vector architecture. Open Research Europe 4 (165). DOI: 10.12688/openreseurope.18321.3. Online publication date: 14-Nov-2024
  • (2024) Co-designing ab initio electronic structure methods on a RISC-V vector architecture. Open Research Europe 4 (165). DOI: 10.12688/openreseurope.18321.2. Online publication date: 28-Oct-2024
  • (2024) Co-designing ab initio electronic structure methods on a RISC-V vector architecture. Open Research Europe 4 (165). DOI: 10.12688/openreseurope.18321.1. Online publication date: 5-Aug-2024
  • (2024) Machine Learning-Based Kernel Selector for SpMV Optimization in Graph Analysis. ACM Transactions on Parallel Computing 11:2, 1-25. DOI: 10.1145/3652579. Online publication date: 8-Jun-2024
  • (2024) AmgT: Algebraic Multigrid Solver on Tensor Cores. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-16. DOI: 10.1109/SC41406.2024.00058. Online publication date: 17-Nov-2024
  • (2024) Graph Computing on Long Vector Architectures (Yes, It Works!). 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 986-995. DOI: 10.1109/IPDPSW63119.2024.00169. Online publication date: 27-May-2024
  • (2024) Exploiting long vectors with a CFD code: a co-design show case. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 453-464. DOI: 10.1109/IPDPS57955.2024.00047. Online publication date: 27-May-2024
  • (2024) VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 14-25. DOI: 10.1109/IPDPS57955.2024.00011. Online publication date: 27-May-2024
  • (2024) Optimal Scheduling for the Performance Optimization of SpMV Computation using Machine Learning Techniques. 2024 7th International Conference on Information and Computer Technologies (ICICT), 99-104. DOI: 10.1109/ICICT62343.2024.00022. Online publication date: 15-Mar-2024
