DOI: 10.1145/3177540.3177561

Flexibility: FPGAs and CAD in Deep Learning Acceleration

Published: 25 March 2018

Abstract

Deep learning inference has become the key workload to accelerate in our AI-powered world. FPGAs are an ideal platform for accelerating deep learning inference, combining low-latency performance, power efficiency, and flexibility. This paper examines the flexibility aspect and its impact on FPGA design methodology, physical design tools, and CAD. We describe the degrees of flexibility required for creating efficient deep learning accelerators. We quantify the varying effects of precision, vectorization, and buffering on both performance and accuracy, and show how the FPGA can yield superior performance through architecture customization tuned for a specific neural network. We describe the need for abstraction and propose solutions in modern FPGA design flows to enable the rapid creation of these customized accelerator architectures for deep learning inference acceleration. Finally, we examine the implications for physical design tools and CAD.
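To make the precision/vectorization trade-off mentioned above concrete, the sketch below sweeps a few candidate design points for a single convolution layer and reports which ones fit a hypothetical device and at what estimated throughput. It is a minimal illustration only: the layer shape, clock frequency, DSP and on-chip buffer budgets, and the multipliers-per-DSP cost model are assumptions made for this example, not figures from the paper.

# Hypothetical design-space sweep: estimate per-layer throughput of a CNN
# accelerator as a function of numeric precision and vector (SIMD) width.
# All device budgets and layer shapes below are illustrative assumptions.

from dataclasses import dataclass
from itertools import product

@dataclass
class ConvLayer:
    name: str
    macs: int               # multiply-accumulate operations per input image
    weight_bytes_fp32: int  # weight footprint at 32-bit precision

# Toy layer roughly shaped like an early CNN convolution (assumption).
LAYER = ConvLayer("conv1", macs=118_013_952, weight_bytes_fp32=37_632)

CLOCK_HZ = 300e6           # assumed achievable FPGA clock
DSP_BUDGET = 1_500         # assumed DSP-block budget
ONCHIP_BYTES = 6_000_000   # assumed on-chip buffer capacity

# Assumed cost model: lower precision packs more multipliers per DSP block.
MULTS_PER_DSP = {32: 0.5, 16: 1, 8: 2}

def evaluate(precision_bits: int, vector_width: int, layer: ConvLayer):
    """Return (dsp_used, fits, images_per_second) for one design point."""
    dsp_used = vector_width / MULTS_PER_DSP[precision_bits]
    weight_bytes = layer.weight_bytes_fp32 * precision_bits // 32
    fits = weight_bytes <= ONCHIP_BYTES and dsp_used <= DSP_BUDGET
    cycles = layer.macs / vector_width  # ideal, fully pipelined dot products
    return dsp_used, fits, (CLOCK_HZ / cycles if fits else 0.0)

for bits, vec in product((32, 16, 8), (64, 256, 1024)):
    dsp, fits, ips = evaluate(bits, vec, LAYER)
    status = f"{ips:8.1f} img/s" if fits else "   infeasible"
    print(f"precision={bits:2d}b  vector={vec:4d}  DSPs={dsp:6.0f}  {status}")

In this toy model only some (precision, vector width) pairs fit the assumed DSP budget, which is the same pressure that motivates tuning the accelerator architecture to a specific network rather than fixing one configuration for all of them.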


Published In

ISPD '18: Proceedings of the 2018 International Symposium on Physical Design
March 2018
178 pages
ISBN:9781450356268
DOI:10.1145/3177540
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGAs
  2. deep learning
  3. high-level design
  4. physical design

Qualifiers

  • Research-article

Conference

ISPD '18: International Symposium on Physical Design
March 25 - 28, 2018
Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 62 of 172 submissions, 36%

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 3
Reflects downloads up to 09 Jan 2025

Cited By

  • (2023) Review of Energy-Efficient Embedded System Acceleration of Convolution Neural Networks for Organic Weeding Robots. Agriculture 13(11): 2103. DOI: 10.3390/agriculture13112103. Online publication date: 6-Nov-2023.
  • (2023) Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1002-1015. DOI: 10.1145/3613424.3614249. Online publication date: 28-Oct-2023.
  • (2023) P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs. IEEE Transactions on Parallel and Distributed Systems 34(8): 2311-2324. DOI: 10.1109/TPDS.2023.3279255. Online publication date: Aug-2023.
  • (2021) Low Precision Networks for Efficient Inference on FPGAs. 2021 International Conference on Field-Programmable Technology (ICFPT), 1-5. DOI: 10.1109/ICFPT52863.2021.9609837. Online publication date: 6-Dec-2021.
  • (2020) Sparse Persistent GEMM Accelerator using OpenCL for Intel FPGAs. 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 1-6. DOI: 10.1109/ISCAS45731.2020.9181281. Online publication date: Oct-2020.
  • (2020) High Density 8-Bit Multiplier Systolic Arrays for FPGA. 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 84-92. DOI: 10.1109/FCCM48280.2020.00021. Online publication date: May-2020.
  • (2019) Accelerating Generalized Linear Models with MLWeaving. Proceedings of the VLDB Endowment 12(7): 807-821. DOI: 10.14778/3317315.3317322. Online publication date: 1-Mar-2019.
  • (2018) Harnessing Numerical Flexibility for Deep Learning on FPGAs. Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 1-3. DOI: 10.1145/3241793.3241794. Online publication date: 20-Jun-2018.
