DOI: 10.1145/3289602.3293912

Math Doesn't Have to be Hard: Logic Block Architectures to Enhance Low-Precision Multiply-Accumulate on FPGAs

Published: 20 February 2019

Abstract

Recent work has shown that using low-precision arithmetic for Deep Neural Network (DNN) inference acceleration can yield large efficiency gains, with little or no accuracy degradation compared to half- or single-precision floating point, by enabling more multiply-accumulate (MAC) operations per unit area. The most efficient precision is a complex function of the DNN application, structure, and required accuracy, which makes the variable-precision capabilities of FPGAs very valuable. We propose three logic block architecture enhancements that increase the density and reduce the delay of MAC operations implemented in the soft fabric. Adding another level of carry chain to the ALM (the extra carry chain architecture) yields a 1.5x increase in MAC density while keeping the impact on general designs small: it adds only 2.6% FPGA tile area and increases the representative critical path delay by 0.8%. At the other end of the spectrum, our highest-impact option, which combines our 4-bit adder architecture with a 9-bit shadow multiplier, increases MAC density by 6.1x, at the cost of larger tile area and representative critical path delay overheads of 16.7% and 9.8%, respectively.
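
To make the headline figures above concrete, here is a minimal Python sketch (not from the paper) that normalizes each reported MAC-density gain by its reported tile-area overhead, giving a rough "MACs per unit of FPGA area" estimate. The area_normalized_gain helper and the simple division model are illustrative assumptions; only the 1.5x / 2.6% and 6.1x / 16.7% values come from the abstract.

```python
# Illustrative normalization of the abstract's reported numbers (assumed model,
# not the paper's methodology): divide each MAC-density gain by the relative
# growth in tile area to estimate MAC throughput per unit of FPGA area.

def area_normalized_gain(mac_density_gain: float, tile_area_overhead: float) -> float:
    """MAC density gain divided by the (1 + overhead) growth in tile area."""
    return mac_density_gain / (1.0 + tile_area_overhead)

# Figures quoted in the abstract: (density gain, tile-area overhead).
proposals = {
    "extra carry chain":                     (1.5, 0.026),
    "4-bit adder + 9-bit shadow multiplier": (6.1, 0.167),
}

for name, (gain, overhead) in proposals.items():
    print(f"{name}: ~{area_normalized_gain(gain, overhead):.2f}x MACs per unit area")
```

Under this naive model, the extra carry chain still nets roughly 1.46x more MACs per unit area, while the 4-bit adder combined with the 9-bit shadow multiplier nets roughly 5.2x.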

      Published In

      FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2019
      360 pages
      ISBN:9781450361378
      DOI:10.1145/3289602

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning
      2. logic block architecture
      3. low-precision
      4. soft multipliers

      Qualifiers

      • Research-article

      Conference

      FPGA '19

      Acceptance Rates

      Overall Acceptance Rate 125 of 627 submissions, 20%


      Cited By

      • (2024) A Stacked FPGA utilizing 3D-SRAM with Latency Optimization. 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 400-406. DOI: 10.1109/MCSoC64144.2024.00072. Online publication date: 16-Dec-2024.
      • (2024) Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), 156-163. DOI: 10.1109/FPL64840.2024.00030. Online publication date: 2-Sep-2024.
      • (2024) Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 54-65. DOI: 10.1109/FCCM60383.2024.00015. Online publication date: 5-May-2024.
      • (2024) A Configurable Accelerator for CNN-Based Remote Sensing Object Detection on FPGAs. IET Computers & Digital Techniques, 2024(1). DOI: 10.1049/2024/4415342. Online publication date: 20-Jun-2024.
      • (2023) Convolutional Neural Network Models and Optimization Design for Edge Computation. Highlights in Science, Engineering and Technology, 62, 36-41. DOI: 10.54097/hset.v62i.10421. Online publication date: 27-Jul-2023.
      • (2023) Experimental analysis of the symmetry of approximate adder designs in FPGA and ASIC. 2023 XIII Brazilian Symposium on Computing Systems Engineering (SBESC), 1-6. DOI: 10.1109/SBESC60926.2023.10324275. Online publication date: 21-Nov-2023.
      • (2022) FPGA Architecture Exploration for DNN Acceleration. ACM Transactions on Reconfigurable Technology and Systems, 15(3), 1-37. DOI: 10.1145/3503465. Online publication date: 10-May-2022.
      • (2022) Multi-Precision Deep Neural Network Acceleration on FPGAs. 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 454-459. DOI: 10.1109/ASP-DAC52403.2022.9712485. Online publication date: 17-Jan-2022.
      • (2021) Rethinking Embedded Blocks for Machine Learning Applications. ACM Transactions on Reconfigurable Technology and Systems, 15(1), 1-30. DOI: 10.1145/3491234. Online publication date: 30-Nov-2021.
      • (2020) LUXOR. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 161-171. DOI: 10.1145/3373087.3375303. Online publication date: 23-Feb-2020.
