ABSTRACT
Recently, supercomputer manufacturers have made use of FPGAs to accelerate scientific applications [16][17]. Traditionally, FPGAs were used only in non-scientific applications. The main reasons for this are: the complexity of floating-point computation; the insufficient number of FPGA logic cells for implementing scientific cores; and the complexity of those cores, which prevented them from operating at high frequencies.
Nowadays, the growing availability of specialized blocks for complex operations, such as adder and multiplier blocks implemented directly in the FPGA fabric, together with the increase in internal RAM blocks (BRAMs), has made possible high-performance systems that use the FPGA as a processing element for scientific computation [2].
These devices are used as co-processors that execute computation-intensive kernels. The emphasis of these architectures is on exploiting the parallelism present in scientific computation and on data reuse.
In most of these applications, the scientific computation operates on large dense floating-point matrices, whose elements are typically processed by multiply-accumulate (MAC) units.
In this work, we describe the architecture of a double-precision floating-point multiplier-accumulator (MAC) compliant with the IEEE-754 standard, and we propose a matrix multiplier architecture that instantiates the developed MACs and exploits data reuse through the BRAMs (RAM blocks internal to the FPGA) of a Xilinx Virtex-4 LX200 FPGA. Synthesis results show that the implemented MAC can reach a performance of 4 GFLOPS.
REFERENCES
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 18, no. 4, pp. 433--448, April 2007.
- K. D. Underwood and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the 2004 IEEE Symposium on Field-Programmable Custom Computing Machines, California, USA, April 2004.
- A. Chtchelkanova, J. Gunnels, G. Morrow, J. Overfelt, K. D. Underwood, and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the 2004 IEEE Symposium on Field-Programmable Custom Computing Machines, California, USA, April 2004.
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs. In Proc. of the 18th International Parallel & Distributed Processing Symposium, New Mexico, USA, April 2004.
- L. Zhuo and V. K. Prasanna. Scalable Hybrid Designs for Linear Algebra on Reconfigurable Computing Systems. Submitted to IEEE Transactions on Computers.
- R. Scrofano and V. K. Prasanna. Computing Lennard-Jones Potentials and Forces with Reconfigurable Hardware. In Proc. of the Int'l Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA'04), pp. 284--290, June 2004.
- K. D. Underwood and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In Proc. of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'04), April 2004.
- L. Zhuo and V. K. Prasanna. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on FPGAs. In Proc. of the 18th Int'l Parallel & Distributed Processing Symposium (IPDPS'04), New Mexico, USA, April 2004.
- IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985.
- C. Babb, J. Blank, I. Castellanos, and J. Moskal. Floating Point Multiplier. Final Project, ECE 587.
- P. Karlström, A. Ehliar, and D. Liu. High Performance, Low Latency FPGA-based Floating Point Adder and Multiplier Units in a Virtex 4.
- E. Mark. Free Floating-Point Madness.
- Y. Dou, S. Vassiliadis, G. K. Kuzmanov, and G. N. Gaydadjiev. 64-bit Floating-Point FPGA Matrix Multiplication.
- MinGW 5.1.3. http://www.mingw.org/
- GSL - GNU Scientific Library.
- Cray Inc. Cray XD1 FPGA Development. http://www.cray.com/
- SGI Inc. http://www.sgi.com/
- ModelSim. http://www.model.com/
Index Terms
- Implementation of a double-precision multiplier accumulator with exception treatment to a dense matrix multiplier module in FPGA