Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMV

Rodrigues, Alexandre; Sousa, Leonel; Ilic, Aleksandar

doi:10.1007/978-3-031-40843-4_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13999)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

The growing need for inference on edge devices calls for efficient hardware optimized for particular computational kernels, such as Sparse Matrix-Vector Multiplication (SpMV). Because the RISC-V Instruction Set Architecture (ISA) gives hardware designers unprecedented freedom, there is now a greater opportunity to tailor microarchitectures both to the application requirements and to the data they are expected to process. In this paper, we demonstrate how the insights provided by the Cache-Aware Roofline Model (CARM) can be used in the hardware design process, optimizing a RISC-V architecture for efficient and performant execution of SpMV. Specifically, we assess the effect that architectural parameters associated with the processor’s cache and floating-point unit have on the architecture and on SpMV performance. Following a reparameterization closely guided by the CARM, we demonstrate a \(2.04\times\) improvement in performance and a significant decrease in underused computational resources.
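
To make the two central notions of the abstract concrete, the following is a minimal Python sketch (not the authors' code) of an SpMV kernel in the common CSR storage format, together with a roofline-style bound of the form min(peak FP throughput, arithmetic intensity × bandwidth). The CSR format, the byte-counting model in `csr_arithmetic_intensity`, the function names, and the peak and bandwidth figures are all assumptions chosen for illustration; none of them are taken from the paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_arithmetic_intensity(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Flops per byte under a simple, assumed traffic model.

    Assumption: every non-zero loads one matrix value, one column index
    and one element of x; row_ptr is read once and y is written once.
    """
    flops = 2 * nnz  # one multiply and one add per non-zero
    traffic = (nnz * (2 * value_bytes + index_bytes)
               + (n_rows + 1) * index_bytes
               + n_rows * value_bytes)
    return flops / traffic


def roofline_bound(ai, peak_flops, bandwidth):
    """Attainable performance: min(compute roof, memory roof)."""
    return min(peak_flops, ai * bandwidth)


if __name__ == "__main__":
    # 3x3 example matrix: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    x = [1.0, 2.0, 3.0]
    print(spmv_csr(row_ptr, col_idx, values, x))  # [7.0, 6.0, 19.0]

    ai = csr_arithmetic_intensity(nnz=5, n_rows=3)
    # Hypothetical roofs: 4 GFLOP/s peak, 16 GB/s bandwidth of one memory level.
    print(roofline_bound(ai, peak_flops=4.0, bandwidth=16.0))
```

Under this assumed traffic model the arithmetic intensity stays well below one flop per byte, so the bound falls on the memory roof rather than the compute roof. That is the kind of observation a roofline analysis uses to decide whether cache parameters or floating-point resources deserve attention.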

Acknowledgement

This project has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 and Specific Grant Agreement No 101036168 (EPI SGA2) and Grant agreement No 956213 (SparCity). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and Turkey. It also received funding from FCT (Fundação para a Ciência e a Tecnologia, Portugal), through the UIDB/50021/2020 project.

Author information

Authors and Affiliations

INESC-ID, Instituto Superior Tecnico, Universidade de Lisboa, Rua Alves Redol, 9, 1000-029, Lisbon, Portugal
Alexandre Rodrigues, Leonel Sousa & Aleksandar Ilic

Corresponding author

Correspondence to Alexandre Rodrigues.

Editor information

Editors and Affiliations

University of New Mexico, Albuquerque, NM, USA
Amanda Bienz
University of Edinburgh, Edinburgh, UK
Michèle Weiland
Université Paris-Saclay, Gif sur Yvette, France
Marc Baboulin
CERFACS, Toulouse, France
Carola Kruse

About this paper

Cite this paper

Rodrigues, A., Sousa, L., Ilic, A. (2023). Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMV. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_36

DOI: https://doi.org/10.1007/978-3-031-40843-4_36
Published: 25 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer Science, Computer Science (R0)
