Abstract
Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [1, 2, 3]. One class of algorithms for solving these systems, iterative methods, has drawn particular interest, with recent literature reporting large performance improvements over general-purpose processors (GPPs). In several iterative methods, this performance gain is largely a result of parallelising the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [4, 5]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [4], the nature of iterative methods allows the use of on-chip memory buffers to increase the available bandwidth, providing the potential for significantly more parallelism [6]. Unfortunately, existing approaches have generally either solved large matrices with only limited improvement over GPPs [4, 5, 6], or achieved high performance only for relatively small matrices [2, 3]. This paper proposes hardware designs that take advantage of symmetric and banded matrix structure, as well as methods to optimise RAM use, in order both to increase performance and to retain this performance for larger-order matrices.
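To make the structural idea concrete, the sketch below is a minimal software reference model, in C, of a symmetric banded matrix-vector product. It assumes a hypothetical row-major band layout in which only the diagonal and the b super-diagonals of each row are stored, so the buffer holds n*(b+1) values instead of n*n; this illustrates the kind of structure exploitation the paper targets, not the authors' actual hardware design.

```c
#include <stdio.h>

/* Software reference model of y = A*x for an n x n symmetric matrix A
 * with half-bandwidth b.  Only the diagonal and the b super-diagonals
 * are stored: band[j*(b+1) + k] holds A[j][j+k] for k = 0..b.
 * The guard j + k < n keeps accesses inside the matrix, so any padding
 * past the edge is never read.  (Illustrative layout, not the paper's RTL.) */
void symm_banded_mv(int n, int b, const double *band,
                    const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;

    for (int j = 0; j < n; ++j) {
        /* Diagonal element contributes once. */
        y[j] += band[j * (b + 1)] * x[j];

        /* Each stored off-diagonal A[j][j+k] contributes to two rows,
         * since A[j+k][j] == A[j][j+k] by symmetry. */
        for (int k = 1; k <= b && j + k < n; ++k) {
            double a = band[j * (b + 1) + k];
            y[j]     += a * x[j + k];
            y[j + k] += a * x[j];
        }
    }
}

int main(void)
{
    enum { N = 4, B = 1 };
    /* 1-D Laplacian: 2 on the diagonal, -1 on the first super-diagonal. */
    double band[N * (B + 1)] = { 2, -1,  2, -1,  2, -1,  2, 0 };
    double x[N] = { 1, 1, 1, 1 };
    double y[N];

    symm_banded_mv(N, B, band, x, y);
    for (int i = 0; i < N; ++i)
        printf("y[%d] = %g\n", i, y[i]);   /* expect 1 0 0 1 */
    return 0;
}
```

Because each stored off-diagonal element is used twice, this kind of layout roughly halves both the storage and the memory traffic per multiplication for a symmetric band, which is the sort of saving that allows more of the matrix to be held in on-chip RAM.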
References
Morris, G.R., Prasanna, V.K., Anderson, R.D.: A hybrid approach for mapping conjugate gradient onto an FPGA-augmented reconfigurable supercomputer. In: Proc. 14th IEEE Symp. Field-Programmable Custom Computing Machines, pp. 3–12 (2006)
Lopes, A.R., Constantinides, G.A.: A high throughput FPGA-based floating point conjugate gradient implementation. In: Proc. Applied Reconfigurable Computing, pp. 75–86 (2008)
Boland, D., Constantinides, G.A.: An FPGA-based implementation of the MINRES algorithm. In: Proc. Int. Conf. Field Programmable Logic and Applications, September 2008, pp. 379–384 (2008)
Zhuo, L., Prasanna, V.K.: Sparse matrix-vector multiplication on FPGAs. In: Proc. ACM/SIGDA 13th Int. Symp. on Field-Programmable Gate Arrays, pp. 63–74. ACM, New York (2005)
El-Kurdi, Y., Gross, W.J., Giannacopoulos, D.: Sparse matrix-vector multiplication for finite element method matrices on FPGAs. In: Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 293–294 (2006)
de Lorimier, M., DeHon, A.: Floating-point sparse matrix-vector multiply for FPGAs. In: Proc. ACM/SIGDA 13th Int. Symp. on Field-Programmable Gate Arrays, pp. 75–85. ACM, New York (2005)
Heath, M.T.: Scientific Computing. McGraw-Hill Higher Education, New York (2001)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Hoekstra, A.G., Sloot, P., Hoffmann, W., Hertzberger, L.: Time complexity of a parallel conjugate gradient solver for light scattering simulations: Theory and SPMD implementation. Tech. Rep. (1992)
Sewell, G.: The numerical solution of ordinary and partial differential equations. Academic Press Professional, Inc., San Diego (1988)
Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., van der Vorst, H.A.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)
Zhuo, L., Morris, G.R., Prasanna, V.K.: High-performance reduction circuits using deeply pipelined operators on FPGAs. IEEE Trans. Parallel Distrib. Syst. 18(10), 1377–1392 (2007)
Lopes, A., Constantinides, G., Kerrigan, E.C.: A floating-point solver for band structured linear equations. In: Proc. Int. Conf. Field Programmable Technology, pp. 353–356 (2008)
Xilinx, Inc.: Virtex-5 FPGA User Guide
ILOG, Inc.: CPLEX solver (2009), http://www.ilog.fr/products/cplex/ (accessed November 2, 2009)
© 2010 Springer-Verlag Berlin Heidelberg

Boland, D., Constantinides, G.A. (2010). Optimising Memory Bandwidth Use for Matrix-Vector Multiplication in Iterative Methods. In: Sirisuk, P., Morgan, F., El-Ghazawi, T., Amano, H. (eds.) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2010. Lecture Notes in Computer Science, vol. 5992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12133-3_17 (Print ISBN 978-3-642-12132-6; Online ISBN 978-3-642-12133-3)