Abstract
The Sparse Matrix-Vector (SpMV) multiplication kernel is a key component of many high-performance computing applications, but also one of the most challenging to optimize, primarily due to its low flop-per-byte ratio and irregular memory accesses. As such, modern FPGAs combined with High-Bandwidth Memory (HBM) modules are much better suited to the memory-bound nature of this kernel than general-purpose CPUs. Current FPGA-based approaches to SpMV support only single-precision floating-point arithmetic. Moreover, they target highly streamed implementations that, although they enhance performance, rely on custom matrix storage formats, which (i) can increase the matrix footprint by up to 3x, and (ii) shift the burden of input matrix transformation onto developers. Towards widening the spectrum of FPGA-supported floating-point formats for sparse algebra, this paper presents a first set of effective optimizations for double-precision SpMV hardware kernels using High-Level Synthesis (HLS) tools on HBM-equipped FPGAs. Results show that our work can provide on average 52.4x better performance than state-of-practice double-precision SpMV implementations on FPGAs for applications with volatile matrices, and up to 5.1x better performance-per-Watt than server-class CPUs.
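To make the "low flop-per-byte ratio and irregular memory accesses" concrete, the following is a minimal software sketch of double-precision SpMV. It assumes the widely used CSR (Compressed Sparse Row) layout with 32-bit indices, which the abstract does not name; the `CsrMatrix` struct, `spmv_csr`, and `flops_per_byte_estimate` are hypothetical names chosen for illustration and are not the authors' hardware kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container (illustrative names, not taken from the paper).
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::uint32_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::uint32_t> col_idx;  // column index of each nonzero
    std::vector<double> values;          // value of each nonzero
};

// Computes y = A * x in double precision.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double acc = 0.0;
        for (std::uint32_t j = a.row_ptr[i]; j < a.row_ptr[i + 1]; ++j) {
            // x is read through col_idx: an indirect, data-dependent access
            // whose locality depends entirely on the sparsity pattern.
            acc += a.values[j] * x[a.col_idx[j]];
        }
        y[i] = acc;
    }
    return y;
}

// Rough per-nonzero estimate, assuming no reuse of x from on-chip memory:
// 2 flops (multiply + add) per 8-byte value, 4-byte index, and 8-byte x read.
constexpr double flops_per_byte_estimate() {
    return 2.0 / (sizeof(double) + sizeof(std::uint32_t) + sizeof(double));
}
```

Under these assumptions each nonzero moves about 20 bytes for 2 floating-point operations, roughly 0.1 flop per byte, which is why memory bandwidth rather than arithmetic throughput tends to bound the kernel.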
Acknowledgment
This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955739 (project OPTIMA). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Greece, Germany, Italy, Netherlands, Spain, Switzerland.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mpakos, P. et al. (2024). Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems. In: Skliarova, I., Brox Jiménez, P., Véstias, M., Diniz, P.C. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2024. Lecture Notes in Computer Science, vol 14553. Springer, Cham. https://doi.org/10.1007/978-3-031-55673-9_2
DOI: https://doi.org/10.1007/978-3-031-55673-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55672-2
Online ISBN: 978-3-031-55673-9
eBook Packages: Computer Science, Computer Science (R0)