Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods

Published: 22 August 2011

Abstract

Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [Morris et al. 2006; Zhang et al. 2008; Zhuo and Prasanna 2006]. One class of algorithms for solving these systems, iterative methods, has drawn particular interest, with recent literature showing large performance improvements over General-Purpose Processors (GPPs) [Lopes and Constantinides 2008]. In several iterative methods, this performance gain is largely a result of parallelizing the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [Zhuo and Prasanna 2005], the nature of iterative methods allows on-chip memory buffers to be used to increase the available bandwidth, providing the potential for significantly more parallelism [deLorimier and DeHon 2005]. Unfortunately, existing approaches have generally either been capable of solving systems with large matrices but with limited improvement over GPPs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006; deLorimier and DeHon 2005], or achieved high performance only for relatively small matrices [Lopes and Constantinides 2008; Boland and Constantinides 2008]. This article proposes hardware designs that take advantage of symmetric and banded matrix structure, as well as methods to optimize RAM use, in order to both increase performance and retain that performance for larger-order matrices.
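To illustrate why symmetric and banded structure reduces the memory traffic of a matrix-vector product, the following is a minimal software sketch. It is not the article's hardware design. The storage layout (`band[d][i]` holding the entry in row `i`, column `i + d`) and the function names `to_band`, `sym_band_matvec`, and `dense_matvec` are assumptions made for this example only. Each stored off-diagonal value is read once and used twice, once for the upper triangle and once for its mirror in the lower triangle.

```python
def to_band(A, b):
    """Extract the main diagonal and the b upper diagonals of a dense matrix A.

    Hypothetical layout for this sketch: band[d][i] == A[i][i + d].
    """
    n = len(A)
    return [[A[i][i + d] for i in range(n - d)] for d in range(b + 1)]


def sym_band_matvec(band, x):
    """Compute y = A x for a symmetric banded A stored as upper diagonals only."""
    n = len(x)
    y = [0.0] * n
    # Main diagonal: each entry contributes to exactly one output element.
    for i, a in enumerate(band[0]):
        y[i] += a * x[i]
    # Off-diagonals: each stored value a = A[i][j] is also A[j][i] by symmetry,
    # so one memory read feeds two multiply-accumulate operations.
    for d in range(1, len(band)):
        for i, a in enumerate(band[d]):
            j = i + d
            y[i] += a * x[j]
            y[j] += a * x[i]
    return y


def dense_matvec(A, x):
    """Reference dense product, used only to check the banded version."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


if __name__ == "__main__":
    # Symmetric tridiagonal test matrix (bandwidth b = 1).
    A = [[2, -1, 0, 0],
         [-1, 2, -1, 0],
         [0, -1, 2, -1],
         [0, 0, -1, 2]]
    x = [1, 2, 3, 4]
    band = to_band(A, 1)
    print(sym_band_matvec(band, x))
    print(sum(len(diag) for diag in band), "stored values instead of", len(A) ** 2)
```

For an order-n matrix with b upper diagonals, this layout stores roughly n(b + 1) values rather than n squared, which is the kind of saving that makes on-chip buffering of the matrix more plausible as n grows.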

References

[1] Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Ed. SIAM, Philadelphia, PA.
[2] Boland, D. and Constantinides, G. 2008. An FPGA-based implementation of the MINRES algorithm. In Proceedings of the International Conference on Field Programmable Logic and Applications. 379–384.
[3] Boland, D. and Constantinides, G. 2010. Optimising memory bandwidth use for matrix-vector multiplication in iterative methods. In Proceedings of the International Symposium on Applied Reconfigurable Computing. 169–181.
[4] deLorimier, M. and DeHon, A. 2005. Floating-point sparse matrix-vector multiply for FPGAs. In Proceedings of the ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays. ACM, New York, 75–85.
[5] El-Kurdi, Y., Gross, W. J., and Giannacopoulos, D. 2006. Sparse matrix-vector multiplication for finite element method matrices on FPGAs. In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines. 293–294.
[6] Golub, G. H. and Van Loan, C. F. 1996. Matrix Computations, 3rd Ed. Johns Hopkins University Press, Baltimore, MD.
[7] Heath, M. T. 2001. Scientific Computing. McGraw-Hill Higher Education.
[8] Hoekstra, A. G., Sloot, P., Hoffmann, W., and Hertzberger, L. 1992. Time complexity of a parallel conjugate gradient solver for light scattering simulations: Theory and SPMD implementation. Tech. rep., University of Amsterdam.
[9] Ilog, Inc. 2009. Solver CPLEX. http://www.ilog.fr/products/cplex/.
[10] Lopes, A., Constantinides, G., and Kerrigan, E. C. 2008. A floating-point solver for band structured linear equations. In Proceedings of the International Conference on Field Programmable Technology. 353–356.
[11] Lopes, A. R. and Constantinides, G. A. 2008. A high throughput FPGA-based floating point conjugate gradient implementation. In Proceedings of Applied Reconfigurable Computing. 75–86.
[12] Morris, G. R., Prasanna, V. K., and Anderson, R. D. 2006. A hybrid approach for mapping conjugate gradient onto an FPGA-augmented reconfigurable supercomputer. In Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines. 3–12.
[13] Sewell, G. 1988. The Numerical Solution of Ordinary and Partial Differential Equations. Academic Press Professional, San Diego, CA.
[14] Winston, W. L. 2003. Introduction to Mathematical Programming: Applications and Algorithms. Duxbury Resource Center.
[15] Xilinx. 2010. Virtex-5 FPGA User Guide. http://www.xilinx.com/support/documentation/user-guides/ug190.pdf.
[16] Zhang, W., Betz, V., and Rose, J. 2008. Portable and scalable FPGA-based acceleration of a direct linear system solver. In Proceedings of the International Conference on Field Programmable Technology. 17–24.
[17] Zhuo, L., Morris, G. R., and Prasanna, V. K. 2007. High-performance reduction circuits using deeply pipelined operators on FPGAs. IEEE Trans. Parall. Distrib. Syst. 18, 10, 1377–1392.
[18] Zhuo, L. and Prasanna, V. K. 2005. Sparse matrix-vector multiplication on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. ACM, New York, 63–74.
[19] Zhuo, L. and Prasanna, V. K. 2006. High-performance and parameterized matrix factorization on FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications. 1–6.

    Published In

    ACM Transactions on Reconfigurable Technology and Systems  Volume 4, Issue 3
    August 2011
    204 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/2000832

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 August 2011
    Accepted: 01 April 2011
    Revised: 01 March 2011
    Received: 01 September 2010
    Published in TRETS Volume 4, Issue 3

    Author Tags

    1. Iterative methods
    2. integer linear programming

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 17 Jan 2025.

    Cited By

    • (2024) Hybrid CPU-GPU solution to regularized divergence-free curl-curl equations for electromagnetic inversion problems. Computers & Geosciences 184:C. DOI: 10.1016/j.cageo.2024.105518. Online publication date: 1-Feb-2024.
    • (2022) Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 31-40. DOI: 10.1109/SBAC-PAD55451.2022.00014. Online publication date: Nov-2022.
    • (2020) Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs). Electronics 9:10 (1675). DOI: 10.3390/electronics9101675. Online publication date: 13-Oct-2020.
    • (2018) Nonlinear predictive control on a heterogeneous computing platform. Control Engineering Practice 78, 105-115. DOI: 10.1016/j.conengprac.2018.06.016. Online publication date: Sep-2018.
    • (2017) Dynamic bitwidth assignment for efficient dot products. 2017 27th International Conference on Field Programmable Logic and Applications (FPL), 1-8. DOI: 10.23919/FPL.2017.8056829. Online publication date: Sep-2017.
    • (2017) Nonlinear predictive control on a heterogeneous computing platform. IFAC-PapersOnLine 50:1, 11877-11882. DOI: 10.1016/j.ifacol.2017.08.1413. Online publication date: Jul-2017.
    • (2015) Reconfigurable Computing Architectures. Proceedings of the IEEE 103:3, 332-354. DOI: 10.1109/JPROC.2014.2386883. Online publication date: Mar-2015.
    • (2014) A scalable and compact systolic architecture for linear solvers. 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, 186-187. DOI: 10.1109/ASAP.2014.6868658. Online publication date: Jun-2014.
    • (2013) Revisiting the reduction circuit: A case study for simultaneous architecture and precision optimisation. 2013 International Conference on Field-Programmable Technology (FPT), 410-413. DOI: 10.1109/FPT.2013.6718401. Online publication date: Dec-2013.
    • (2012) A scalable approach for automated precision analysis. Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, 185-194. DOI: 10.1145/2145694.2145726. Online publication date: 22-Feb-2012.
