DOI: 10.1145/3289602.3293912

Math Doesn't Have to be Hard: Logic Block Architectures to Enhance Low-Precision Multiply-Accumulate on FPGAs

Published: 20 February 2019

Abstract

Recent work has shown that using low-precision arithmetic for Deep Neural Network (DNN) inference acceleration can yield large efficiency gains, with little or no accuracy degradation compared to half- or single-precision floating point, by enabling more multiply-accumulate (MAC) operations per unit area. The most efficient precision is a complex function of the DNN application, structure, and required accuracy, which makes the variable-precision capabilities of FPGAs very valuable. We propose three logic block architecture enhancements that increase the density and reduce the delay of MAC operations implemented in the soft fabric. Adding another level of carry chain to the ALM (the extra carry chain architecture) yields a 1.5x increase in MAC density while keeping the impact on general designs small: it adds only 2.6% FPGA tile area and increases the representative critical path delay by 0.8%. At the other end of the spectrum, our highest-impact option, which combines our 4-bit adder architecture with a 9-bit shadow multiplier, increases MAC density by 6.1x, at the cost of larger tile area and representative critical path delay overheads of 16.7% and 9.8%, respectively.
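
To make the headline figures above concrete, here is a minimal Python sketch (not from the paper) that normalizes each reported MAC-density gain by its reported tile-area overhead, giving a rough "MACs per unit of FPGA area" estimate. The area_normalized_gain helper and the simple division model are illustrative assumptions; only the 1.5x / 2.6% and 6.1x / 16.7% values come from the abstract.

```python
# Illustrative normalization of the abstract's reported numbers (assumed model,
# not the paper's methodology): divide each MAC-density gain by the relative
# growth in tile area to estimate MAC throughput per unit of FPGA area.

def area_normalized_gain(mac_density_gain: float, tile_area_overhead: float) -> float:
    """MAC density gain divided by the (1 + overhead) growth in tile area."""
    return mac_density_gain / (1.0 + tile_area_overhead)

# Figures quoted in the abstract: (density gain, tile-area overhead).
proposals = {
    "extra carry chain":                     (1.5, 0.026),
    "4-bit adder + 9-bit shadow multiplier": (6.1, 0.167),
}

for name, (gain, overhead) in proposals.items():
    print(f"{name}: ~{area_normalized_gain(gain, overhead):.2f}x MACs per unit area")
```

Under this naive model, the extra carry chain still nets roughly 1.46x more MACs per unit area, while the 4-bit adder combined with the 9-bit shadow multiplier nets roughly 5.2x.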

      Published In

      FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2019
      360 pages
      ISBN:9781450361378
      DOI:10.1145/3289602

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning
      2. logic block architecture
      3. low-precision
      4. soft multipliers

      Qualifiers

      • Research-article

      Conference

      FPGA '19

      Acceptance Rates

      Overall Acceptance Rate 125 of 627 submissions, 20%


      Cited By

      • (2024) A Stacked FPGA utilizing 3D-SRAM with Latency Optimization. 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 400-406. DOI: 10.1109/MCSoC64144.2024.00072. Online publication date: 16-Dec-2024.
      • (2024) Kratos: An FPGA Benchmark for Unrolled DNNs with Fine-Grained Sparsity and Mixed Precision. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), 156-163. DOI: 10.1109/FPL64840.2024.00030. Online publication date: 2-Sep-2024.
      • (2024) Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 54-65. DOI: 10.1109/FCCM60383.2024.00015. Online publication date: 5-May-2024.
      • (2024) A Configurable Accelerator for CNN-Based Remote Sensing Object Detection on FPGAs. IET Computers & Digital Techniques, 2024(1). DOI: 10.1049/2024/4415342. Online publication date: 20-Jun-2024.
      • (2023) Convolutional Neural Network Models and Optimization Design for Edge Computation. Highlights in Science, Engineering and Technology, 62, 36-41. DOI: 10.54097/hset.v62i.10421. Online publication date: 27-Jul-2023.
      • (2023) Experimental analysis of the symmetry of approximate adder designs in FPGA and ASIC. 2023 XIII Brazilian Symposium on Computing Systems Engineering (SBESC), 1-6. DOI: 10.1109/SBESC60926.2023.10324275. Online publication date: 21-Nov-2023.
      • (2022) FPGA Architecture Exploration for DNN Acceleration. ACM Transactions on Reconfigurable Technology and Systems, 15(3), 1-37. DOI: 10.1145/3503465. Online publication date: 10-May-2022.
      • (2022) Multi-Precision Deep Neural Network Acceleration on FPGAs. 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 454-459. DOI: 10.1109/ASP-DAC52403.2022.9712485. Online publication date: 17-Jan-2022.
      • (2021) Rethinking Embedded Blocks for Machine Learning Applications. ACM Transactions on Reconfigurable Technology and Systems, 15(1), 1-30. DOI: 10.1145/3491234. Online publication date: 30-Nov-2021.
      • (2020) LUXOR. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 161-171. DOI: 10.1145/3373087.3375303. Online publication date: 23-Feb-2020.
