Abstract
Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [1, 2, 3]. One class of algorithms for solving these systems, iterative methods, has drawn particular interest, with recent literature reporting large performance improvements over general-purpose processors (GPPs). In several iterative methods, this performance gain is largely a result of parallelising the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [4, 5]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [4], the nature of iterative methods allows the use of on-chip memory buffers to increase the available bandwidth, providing the potential for significantly more parallelism [6]. Unfortunately, existing approaches have generally either solved large matrices with only limited improvement over GPPs [4, 5, 6], or achieved high performance only for relatively small matrices [2, 3]. This paper proposes hardware designs that take advantage of symmetric and banded matrix structure, as well as methods to optimise RAM use, in order both to increase performance and to retain this performance for larger-order matrices.
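To make the structural idea concrete, the sketch below is a minimal software reference model, in C, of a symmetric banded matrix-vector product. It assumes a hypothetical row-major band layout in which only the diagonal and the b super-diagonals of each row are stored, so the buffer holds n*(b+1) values instead of n*n; this illustrates the kind of structure exploitation the paper targets, not the authors' actual hardware design.

```c
#include <stdio.h>

/* Software reference model of y = A*x for an n x n symmetric matrix A
 * with half-bandwidth b.  Only the diagonal and the b super-diagonals
 * are stored: band[j*(b+1) + k] holds A[j][j+k] for k = 0..b.
 * The guard j + k < n keeps accesses inside the matrix, so any padding
 * past the edge is never read.  (Illustrative layout, not the paper's RTL.) */
void symm_banded_mv(int n, int b, const double *band,
                    const double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = 0.0;

    for (int j = 0; j < n; ++j) {
        /* Diagonal element contributes once. */
        y[j] += band[j * (b + 1)] * x[j];

        /* Each stored off-diagonal A[j][j+k] contributes to two rows,
         * since A[j+k][j] == A[j][j+k] by symmetry. */
        for (int k = 1; k <= b && j + k < n; ++k) {
            double a = band[j * (b + 1) + k];
            y[j]     += a * x[j + k];
            y[j + k] += a * x[j];
        }
    }
}

int main(void)
{
    enum { N = 4, B = 1 };
    /* 1-D Laplacian: 2 on the diagonal, -1 on the first super-diagonal. */
    double band[N * (B + 1)] = { 2, -1,  2, -1,  2, -1,  2, 0 };
    double x[N] = { 1, 1, 1, 1 };
    double y[N];

    symm_banded_mv(N, B, band, x, y);
    for (int i = 0; i < N; ++i)
        printf("y[%d] = %g\n", i, y[i]);   /* expect 1 0 0 1 */
    return 0;
}
```

Because each stored off-diagonal element is used twice, this kind of layout roughly halves both the storage and the memory traffic per multiplication for a symmetric band, which is the sort of saving that allows more of the matrix to be held in on-chip RAM.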
References
Morris, G.R., Prasanna, V.K., Anderson, R.D.: A hybrid approach for mapping conjugate gradient onto an FPGA-augmented reconfigurable supercomputer. In: Proc. 14th IEEE Symp. Field-Programmable Custom Computing Machines, pp. 3–12 (2006)
Lopes, A.R., Constantinides, G.A.: A high throughput FPGA-based floating point conjugate gradient implementation. In: Proc. Applied Reconfigurable Computing, pp. 75–86 (2008)
Boland, D., Constantinides, G.A.: An FPGA-based implementation of the MINRES algorithm. In: Proc. Int. Conf. Field Programmable Logic and Applications, September 2008, pp. 379–384 (2008)
Zhuo, L., Prasanna, V.K.: Sparse matrix-vector multiplication on FPGAs. In: Proc. ACM/SIGDA 13th Int. Symp. on Field-Programmable Gate Arrays, pp. 63–74. ACM, New York (2005)
El-Kurdi, Y., Gross, W.J., Giannacopoulos, D.: Sparse matrix-vector multiplication for finite element method matrices on FPGAs. In: Proc. IEEE Symp. Field-Programmable Custom Computing Machines, pp. 293–294 (2006)
de Lorimier, M., DeHon, A.: Floating-point sparse matrix-vector multiply for FPGAs. In: Proc. ACM/SIGDA 13th Int. Symp. on Field-Programmable Gate Arrays, pp. 75–85. ACM, New York (2005)
Heath, M.T.: Scientific Computing. McGraw-Hill Higher Education, New York (2001)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Hoekstra, A.G., Sloot, P., Hoffmann, W., Hertzberger, L.: Time complexity of a parallel conjugate gradient solver for light scattering simulations: Theory and SPMD implementation. Tech. Rep. (1992)
Sewell, G.: The numerical solution of ordinary and partial differential equations. Academic Press Professional, Inc., San Diego (1988)
Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., van der Vorst, H.A.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. SIAM, Philadelphia (1994)
Zhuo, L., Morris, G.R., Prasanna, V.K.: High-performance reduction circuits using deeply pipelined operators on FPGAs. IEEE Trans. Parallel Distrib. Syst. 18(10), 1377–1392 (2007)
Lopes, A., Constantinides, G., Kerrigan, E.C.: A floating-point solver for band structured linear equations. In: Proc. Int. Conf. Field Programmable Technology, pp. 353–356 (2008)
Xilinx, Inc.: Virtex-5 FPGA User Guide
ILOG, Inc.: CPLEX solver (2009), http://www.ilog.fr/products/cplex/ (accessed November 2, 2009)
© 2010 Springer-Verlag Berlin Heidelberg

Boland, D., Constantinides, G.A. (2010). Optimising Memory Bandwidth Use for Matrix-Vector Multiplication in Iterative Methods. In: Sirisuk, P., Morgan, F., El-Ghazawi, T., Amano, H. (eds.) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2010. Lecture Notes in Computer Science, vol. 5992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12133-3_17 (Print ISBN 978-3-642-12132-6; Online ISBN 978-3-642-12133-3)