ABSTRACT
Recently, supercomputer manufacturers have made use of FPGAs to accelerate scientific applications [16][17]. Traditionally, FPGAs were used only in non-scientific applications. The main reasons for this are: the complexity of floating-point computation; the insufficient number of FPGA logic cells for implementing scientific cores; and the complexity of those cores, which prevented them from operating at high frequencies.
Nowadays, the growing availability of specialized blocks for complex operations, such as adder and multiplier blocks implemented directly in the FPGA fabric, together with the increase in internal RAM blocks (BRAMs), has made possible high-performance systems that use the FPGA as a processing element for scientific computation [2].
These devices are used as co-processors that execute computation-intensive kernels. The emphasis of these architectures is on exploiting the parallelism present in scientific computation and on data reuse.
In most of these applications, the scientific computation operates on large dense floating-point matrices, whose elements are typically processed by multiply-accumulate (MAC) units.
In this work, we describe the architecture of a double-precision floating-point multiplier-accumulator (MAC) compliant with the IEEE-754 standard, and we propose a matrix multiplier architecture that instantiates the developed MACs and exploits data reuse through the BRAMs (RAM blocks internal to the FPGA) of a Xilinx Virtex-4 LX200 FPGA. Synthesis results show that the implemented MAC can reach a performance of 4 GFLOPS.
REFERENCES
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 18, no. 4, pp. 433--448, April 2007.
- K. D. Underwood and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the 2004 IEEE Symposium on Field-Programmable Custom Computing Machines, California, USA, April 2004.
- A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt, K. D. Underwood, and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the 2004 IEEE Symposium on Field-Programmable Custom Computing Machines, California, USA, April 2004.
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs. In Proc. of the 18th International Parallel & Distributed Processing Symposium, New Mexico, USA, April 2004.
- L. Zhuo and V. K. Prasanna. Scalable Hybrid Designs for Linear Algebra on Reconfigurable Computing Systems. Submitted to IEEE Transactions on Computers.
- R. Scrofano and V. K. Prasanna. Computing Lennard-Jones Potentials and Forces with Reconfigurable Hardware. In Proc. of the Int'l Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA'04), pp. 284--290, June 2004.
- K. D. Underwood and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'04), April 2004.
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs. In Proc. of the 18th Int'l Parallel & Distributed Processing Symposium (IPDPS'04), New Mexico, USA, April 2004.
- IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985.
- C. Babb, J. Blank, I. Castellanos, and J. Moskal. Floating Point Multiplier. Final Project, ECE 587.
- P. Karlström, A. Ehliar, and D. Liu. High Performance, Low Latency FPGA-based Floating Point Adder and Multiplier Units in a Virtex 4.
- E. Mark. Free Floating-Point Madness.
- Y. Dou, S. Vassiliadis, G. K. Kuzmanov, and G. N. Gaydadjiev. 64-bit Floating-Point FPGA Matrix Multiplication.
- MinGW 5.1.3. http://www.mingw.org/
- GSL - GNU Scientific Library.
- Cray Inc. Cray XD1 FPGA Development. http://www.cray.com/
- SGI Inc. http://www.sgi.com/
- ModelSim. http://www.model.com/
Index Terms
- Implementation of a double-precision multiplier accumulator with exception treatment to a dense matrix multiplier module in FPGA