DOI: 10.1145/2847263.2847281
research-article

Reducing Memory Requirements for High-Performance and Numerically Stable Gaussian Elimination

Published: 21 February 2016

Abstract

Gaussian elimination is a well-known technique for solving a system of linear equations, and boosting its performance is highly desirable. While straightforward parallel techniques are limited by either I/O or on-chip memory bandwidth, block-based algorithms offer the potential to bridge this gap by interleaving I/O with computation. However, these algorithms require the amount of on-chip memory to be at least the square of the number of processing elements available. On the latest generation of Altera FPGAs with hardened floating-point units, this requirement is no longer met. It follows that the amount of on-chip memory limits performance, a problem that is likely only to worsen unless on-chip memory comes to dominate FPGA architecture. In addition to this limitation, existing FPGA implementations of block-based Gaussian elimination sacrifice either numerical stability or efficiency. The former limits the usefulness of these implementations to a small class of matrices; the latter limits their performance.
This paper presents a high-performance and numerically stable method to perform Gaussian elimination on an FPGA. This modified algorithm makes use of a deep pipeline to store the matrix and ensures that the peak performance is once again limited by the number of floating-point units that can fit on the FPGA. When applied to large matrices, this technique can obtain a sustained performance of up to 256 GFLOPs on an Arria 10, beginning to tap into the full potential of these devices. This performance is comparable to the peak that could be achieved using a simple block-based algorithm, with the performance on a Stratix 10 predicted to be superior. This is in spite of the fact that the underlying algorithm for the implementation in this paper, Gaussian elimination with pairwise pivoting, is more complex and applicable to a wider range of practical problems.
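As a rough software illustration of the pairwise pivoting idea named above, the sketch below solves a small dense system by eliminating each column with neighbouring row pairs, swapping a pair whenever the lower row has the larger leading entry. It is not the paper's FPGA pipeline: the function name `solve_pairwise`, the bottom-up ordering of the row pairs, and the use of plain Python lists are all assumptions made for this example.

```python
def solve_pairwise(a, b):
    """Solve a x = b by Gaussian elimination with pairwise (neighbour) pivoting.

    Illustrative sketch only: `a` is a list of n rows of n numbers and
    `b` is a list of n numbers. Inputs are copied, not modified.
    """
    n = len(a)
    # Build the augmented matrix [a | b] as floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for k in range(n - 1):
        # Walk up column k, zeroing row i against its neighbour row i-1.
        for i in range(n - 1, k, -1):
            # Pairwise pivot: keep the larger leading entry in the upper row.
            if abs(m[i][k]) > abs(m[i - 1][k]):
                m[i - 1], m[i] = m[i], m[i - 1]
            if m[i - 1][k] == 0.0:
                continue  # both entries are zero, nothing to eliminate
            factor = m[i][k] / m[i - 1][k]  # |factor| <= 1 after the swap
            for j in range(k, n + 1):
                m[i][j] -= factor * m[i - 1][j]
            m[i][k] = 0.0  # clear rounding residue

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if m[i][i] == 0.0:
            raise ValueError("matrix is singular")
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


if __name__ == "__main__":
    print(solve_pairwise([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3]))
```

Because each elimination step involves only two adjacent rows, the pivot choice is a local comparison, unlike partial pivoting, which searches the whole column for its largest entry.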

Published In

FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2016
298 pages
ISBN: 9781450338561
DOI: 10.1145/2847263

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fpga
  2. gaussian elimination
  3. linear algebra
  4. pairwise pivoting

Qualifiers

  • Research-article

Conference

FPGA'16

Acceptance Rates

FPGA '16 Paper Acceptance Rate 20 of 111 submissions, 18%;
Overall Acceptance Rate 125 of 627 submissions, 20%
