DOI: 10.1145/2966986.2966987
research-article

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication kernel

Published: 07 November 2016

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. To improve its performance, software libraries dedicated to SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by the hardware platforms, because the efficiency of the SpMV kernel is severely constrained by limited memory bandwidth and irregular data access patterns. In this work, we propose a data locality-aware design framework for FPGA-based SpMV acceleration. We first incorporate the hardware constraints into sparse matrix compression at the software level to regularize memory allocation and accesses. Moreover, a distributed architecture composed of processing elements is developed to improve computation parallelism. We implement the reconfigurable SpMV kernel on the Convey HC-2ex and evaluate it using the University of Florida sparse matrix collection. The experiments demonstrate an average computational efficiency of 48.2%, which is higher than that of the CPU and GPU implementations. Our FPGA-based kernel has a runtime comparable to that of the GPU and achieves a 2.1× runtime reduction over the CPU. Moreover, our design obtains substantial savings in energy consumption: 9.3× and 5.6× better than the implementations on CPU and GPU, respectively.


Published In

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2016
946 pages

Publisher

IEEE Press

Cited By

  • (2024) Power and Delay Efficient Approximate Sparse Matrix Vector Multiplication on FPGA using HLS. 2024 3rd International Conference for Innovation in Technology (INOCON), pp. 1-6. DOI: 10.1109/INOCON60754.2024.10512183. Online publication date: 1-Mar-2024
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56:1, pp. 1-38. DOI: 10.1145/3604606. Online publication date: 28-Aug-2023
  • (2023) An Efficient Gustavson-Based Sparse Matrix–Matrix Multiplication Accelerator on Embedded FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12, pp. 4671-4680. DOI: 10.1109/TCAD.2023.3281719. Online publication date: Dec-2023
  • (2023) Efficient FPGA-Based Sparse Matrix–Vector Multiplication With Data Reuse-Aware Compression. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:12, pp. 4606-4617. DOI: 10.1109/TCAD.2023.3281715. Online publication date: Dec-2023
  • (2023) ReMCOO: An Efficient Representation of Sparse Matrix-Vector Multiplication. 2023 IEEE Guwahati Subsection Conference (GCON), pp. 01-06. DOI: 10.1109/GCON58516.2023.10183488. Online publication date: 23-Jun-2023
  • (2021) Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs. 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), pp. 1-9. DOI: 10.1109/ICCAD51958.2021.9643453. Online publication date: 1-Nov-2021
  • (2020) A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:6, pp. 1272-1285. DOI: 10.1109/TCAD.2019.2912923. Online publication date: Jun-2020
  • (2020) A Domain-Specific Architecture for Accelerating Sparse Matrix Vector Multiplication on FPGAs. 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), pp. 127-132. DOI: 10.1109/FPL50879.2020.00031. Online publication date: Aug-2020
