DOI: 10.1145/3177540.3177561

Flexibility: FPGAs and CAD in Deep Learning Acceleration

Published: 25 March 2018

Abstract

Deep learning inference has become the key workload to accelerate in our AI-powered world. FPGAs are an ideal platform for accelerating deep learning inference, combining low-latency performance, power efficiency, and flexibility. This paper examines the flexibility aspect and its impact on FPGA design methodology, physical design tools, and CAD. We describe the degrees of flexibility required for creating efficient deep learning accelerators. We quantify the varying effects of precision, vectorization, and buffering on both performance and accuracy, and show how the FPGA can yield superior performance through architecture customization tuned for a specific neural network. We describe the need for abstraction and propose solutions in modern FPGA design flows to enable the rapid creation of these customized accelerator architectures for deep learning inference acceleration. Finally, we examine the implications for physical design tools and CAD.
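To make the precision/vectorization trade-off mentioned above concrete, the sketch below sweeps a few candidate design points for a single convolution layer and reports which ones fit a hypothetical device and at what estimated throughput. It is a minimal illustration only: the layer shape, clock frequency, DSP and on-chip buffer budgets, and the multipliers-per-DSP cost model are assumptions made for this example, not figures from the paper.

# Hypothetical design-space sweep: estimate per-layer throughput of a CNN
# accelerator as a function of numeric precision and vector (SIMD) width.
# All device budgets and layer shapes below are illustrative assumptions.

from dataclasses import dataclass
from itertools import product

@dataclass
class ConvLayer:
    name: str
    macs: int               # multiply-accumulate operations per input image
    weight_bytes_fp32: int  # weight footprint at 32-bit precision

# Toy layer roughly shaped like an early CNN convolution (assumption).
LAYER = ConvLayer("conv1", macs=118_013_952, weight_bytes_fp32=37_632)

CLOCK_HZ = 300e6           # assumed achievable FPGA clock
DSP_BUDGET = 1_500         # assumed DSP-block budget
ONCHIP_BYTES = 6_000_000   # assumed on-chip buffer capacity

# Assumed cost model: lower precision packs more multipliers per DSP block.
MULTS_PER_DSP = {32: 0.5, 16: 1, 8: 2}

def evaluate(precision_bits: int, vector_width: int, layer: ConvLayer):
    """Return (dsp_used, fits, images_per_second) for one design point."""
    dsp_used = vector_width / MULTS_PER_DSP[precision_bits]
    weight_bytes = layer.weight_bytes_fp32 * precision_bits // 32
    fits = weight_bytes <= ONCHIP_BYTES and dsp_used <= DSP_BUDGET
    cycles = layer.macs / vector_width  # ideal, fully pipelined dot products
    return dsp_used, fits, (CLOCK_HZ / cycles if fits else 0.0)

for bits, vec in product((32, 16, 8), (64, 256, 1024)):
    dsp, fits, ips = evaluate(bits, vec, LAYER)
    status = f"{ips:8.1f} img/s" if fits else "   infeasible"
    print(f"precision={bits:2d}b  vector={vec:4d}  DSPs={dsp:6.0f}  {status}")

In this toy model only some (precision, vector width) pairs fit the assumed DSP budget, which is the same pressure that motivates tuning the accelerator architecture to a specific network rather than fixing one configuration for all of them.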


Published In

ISPD '18: Proceedings of the 2018 International Symposium on Physical Design
March 2018
178 pages
ISBN:9781450356268
DOI:10.1145/3177540
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGAs
  2. deep learning
  3. high-level design
  4. physical design

Qualifiers

  • Research-article

Conference

ISPD '18: International Symposium on Physical Design
March 25 - 28, 2018
Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 62 of 172 submissions, 36%

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 3
Reflects downloads up to 09 Jan 2025

Cited By

  • (2023) Review of Energy-Efficient Embedded System Acceleration of Convolution Neural Networks for Organic Weeding Robots. Agriculture 13(11): 2103. DOI: 10.3390/agriculture13112103. Online publication date: 6-Nov-2023.
  • (2023) Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1002-1015. DOI: 10.1145/3613424.3614249. Online publication date: 28-Oct-2023.
  • (2023) P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs. IEEE Transactions on Parallel and Distributed Systems 34(8): 2311-2324. DOI: 10.1109/TPDS.2023.3279255. Online publication date: Aug-2023.
  • (2021) Low Precision Networks for Efficient Inference on FPGAs. 2021 International Conference on Field-Programmable Technology (ICFPT), 1-5. DOI: 10.1109/ICFPT52863.2021.9609837. Online publication date: 6-Dec-2021.
  • (2020) Sparse Persistent GEMM Accelerator using OpenCL for Intel FPGAs. 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 1-6. DOI: 10.1109/ISCAS45731.2020.9181281. Online publication date: Oct-2020.
  • (2020) High Density 8-Bit Multiplier Systolic Arrays for FPGA. 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 84-92. DOI: 10.1109/FCCM48280.2020.00021. Online publication date: May-2020.
  • (2019) Accelerating Generalized Linear Models with MLWeaving. Proceedings of the VLDB Endowment 12(7): 807-821. DOI: 10.14778/3317315.3317322. Online publication date: 1-Mar-2019.
  • (2018) Harnessing Numerical Flexibility for Deep Learning on FPGAs. Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 1-3. DOI: 10.1145/3241793.3241794. Online publication date: 20-Jun-2018.
