
xDNN: Inference for Deep Convolutional Neural Networks

Published: 11 January 2022

Abstract

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors, synthesized on Field-Programmable Gate Arrays (FPGAs), for Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency without batching. The design is scalable and a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can be scaled down to produce a processor for embedded devices, replicated to provide more cores for larger devices, or resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, close to the maximum frequency of the Digital Signal Processing (DSP) blocks, with above 80% efficiency of on-chip compute resources.
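The parametric sizing described above can be illustrated with a back-of-the-envelope sketch (a hypothetical model: the array shape, clock frequency, and efficiency figure below are illustrative inputs, not the paper's actual configuration):

```python
# Hypothetical sizing sketch for a parametric MAC-array processor.
# All concrete numbers below are illustrative assumptions.

def peak_tops(num_macs: int, freq_ghz: float) -> float:
    """Peak tera-ops/s: each MAC performs 2 ops (multiply + add) per cycle."""
    return 2 * num_macs * freq_ghz / 1e3

def sustained_tops(num_macs: int, freq_ghz: float, efficiency: float) -> float:
    """Sustained throughput, derated by the achieved compute efficiency."""
    return peak_tops(num_macs, freq_ghz) * efficiency

# e.g., a hypothetical 96x16 MAC array at 800 MHz with 80% efficiency
print(round(sustained_tops(96 * 16, 0.8, 0.80), 3))  # → 1.966
```

Such a model lets the same design be resized (fewer MACs for embedded targets, replicated cores for larger devices) while predicting the throughput trade-off.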
On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change if or when the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project.
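The subgraph-partitioning step can be sketched as follows (a minimal illustration, not the actual compiler: the operator names and the supported-operator set are assumptions for the example):

```python
# Hypothetical partitioning sketch: split a linear CNN operator sequence
# into maximal runs of FPGA-supported layers, with CPU fallback for the
# rest. The supported set below is an illustrative assumption.

FPGA_SUPPORTED = {"conv", "relu", "pool", "eltwise"}

def partition(layers):
    """Group consecutive operators into ('fpga' | 'cpu', [ops]) subgraphs."""
    subgraphs = []
    for op in layers:
        target = "fpga" if op in FPGA_SUPPORTED else "cpu"
        if subgraphs and subgraphs[-1][0] == target:
            subgraphs[-1][1].append(op)  # extend the current subgraph
        else:
            subgraphs.append((target, [op]))  # open a new subgraph
    return subgraphs

print(partition(["conv", "relu", "pool", "softmax"]))
# → [('fpga', ['conv', 'relu', 'pool']), ('cpu', ['softmax'])]
```

Grouping maximal runs minimizes the number of CPU↔FPGA transfers, which is the usual motivation for this style of heterogeneous division of work.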
We show experimental results for accuracy, latency, and power for several networks. In summary, we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solution and comparable to any other off-the-shelf solution.




Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 2
June 2022
310 pages
ISSN:1936-7406
EISSN:1936-7414
DOI:10.1145/3501287
Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2022
Accepted: 01 June 2021
Revised: 01 May 2021
Received: 01 January 2021
Published in TRETS Volume 15, Issue 2


Author Tags

  1. AI inference
  2. low latency
  3. high efficiency
  4. custom architectures
  5. optimizations

Qualifiers

  • Research-article
  • Refereed
