Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems

Mpakos, Panagiotis; Tasou, Ioanna; Alverti, Chloe; Miliadis, Panagiotis; Malakonakis, Pavlos; Theodoropoulos, Dimitris; Goumas, Georgios; Pnevmatikatos, Dionisios N.; Koziris, Nectarios

doi:10.1007/978-3-031-55673-9_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14553)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing

Abstract

The Sparse Matrix-Vector (SpMV) multiplication kernel is a key component of many high-performance computing applications, but it is also one of the most challenging to optimize, primarily because of its low flop-per-byte ratio and irregular memory accesses. Modern FPGAs combined with High-Bandwidth Memory (HBM) modules are therefore much better suited to the memory-bound nature of this kernel than general-purpose CPUs. Current FPGA-based SpMV approaches support only single-precision floating-point arithmetic. Moreover, they target highly streamed implementations that, although they enhance performance, rely on custom matrix storage formats, which (i) can increase the matrix footprint by up to 3x and (ii) shift the burden of transforming the input matrix onto developers. To widen the spectrum of FPGA-supported floating-point formats for sparse algebra, this paper presents a first set of effective optimizations for double-precision SpMV hardware kernels using High-Level Synthesis (HLS) tools on HBM-equipped FPGAs. Results show that our work delivers on average 52.4x better performance than state-of-practice double-precision SpMV implementations on FPGAs for applications with volatile matrices, and up to 5.1x better performance-per-Watt than server-class CPUs.
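To make the "low flop-per-byte ratio and irregular memory accesses" concrete, the following is a minimal software sketch of SpMV, not the accelerator described in the paper. It assumes the common CSR storage format (the abstract does not name a format), 8-byte double-precision values, 4-byte indices, and no reuse of the input vector between nonzeros. The names `spmv_csr` and `flops_per_byte` are hypothetical and introduced only for illustration.

```python
from typing import List


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx: this indirect gather is the
            # irregular memory access pattern.
            acc += values[j] * x[col_idx[j]]
        y[i] = acc
    return y


def flops_per_byte(n_rows: int, nnz: int,
                   value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Rough arithmetic intensity, assuming x is fetched once per nonzero."""
    flops = 2 * nnz  # one multiply and one add per nonzero
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    vector_bytes = nnz * value_bytes + n_rows * value_bytes  # x reads + y writes
    return flops / (matrix_bytes + vector_bytes)


# 3x3 example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]], x = [1, 2, 3]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
ratio = flops_per_byte(n_rows=3, nnz=5)
print(y, ratio)
```

Under these assumptions every nonzero contributes two floating-point operations but requires well over ten bytes of memory traffic, which is why the kernel is limited by memory bandwidth rather than by compute.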

Acknowledgment

This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955739 (project OPTIMA). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Greece, Germany, Italy, the Netherlands, Spain, and Switzerland.

Author information

Authors and Affiliations

Computing Systems Laboratory, National Technical University of Athens, Athens, Greece
Panagiotis Mpakos, Ioanna Tasou, Panagiotis Miliadis, Dimitris Theodoropoulos, Georgios Goumas, Dionisios N. Pnevmatikatos & Nectarios Koziris
Technical University of Crete, Chania, Greece
Pavlos Malakonakis
University of Illinois at Urbana-Champaign, Champaign, USA
Chloe Alverti

Corresponding author

Correspondence to Panagiotis Mpakos.

Editor information

Editors and Affiliations

University of Aveiro, Aveiro, Portugal
Iouliia Skliarova
University of Seville, Sevilla, Spain
Piedad Brox Jiménez
Instituto Superior de Engenharia de Lisb, Lisbon, Portugal
Mário Véstias
University of Porto, Porto, Portugal
Pedro C. Diniz

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

About this paper

Cite this paper

Mpakos, P. et al. (2024). Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems. In: Skliarova, I., Brox Jiménez, P., Véstias, M., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2024. Lecture Notes in Computer Science, vol 14553. Springer, Cham. https://doi.org/10.1007/978-3-031-55673-9_2

DOI: https://doi.org/10.1007/978-3-031-55673-9_2
Published: 10 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55672-2
Online ISBN: 978-3-031-55673-9
eBook Packages: Computer Science (R0)
