DOI: 10.1145/3603165.3607390
Poster

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Published: 25 September 2023

Abstract

With the recent advancement of multilayer convolutional neural networks (CNNs), deep learning has achieved remarkable success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of computation-demanding CNNs, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library that efficiently accelerates the entire CNN on FPGAs. Built on portable high-level synthesis, Caffeine provides a design automation flow that optimizes and generates both the FPGA-based AI hardware and the runtime software code. We integrate Caffeine into the industry-standard deep learning framework Caffe. Caffeine achieves a peak performance of 365 GOPS on a Xilinx KU060 FPGA and 636 GOPS on a Virtex-7 690T FPGA, showing up to 7.3x and 43.5x performance and energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency than a GPU.


Cited By

  • (2025) Stream-HLS: Towards Automatic Dataflow Acceleration. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 103-114. DOI: 10.1145/3706628.3708878. Online publication date: 27-Feb-2025.
  • (2024) Optimizing CNN Hardware Acceleration with Configurable Vector Units and Feature Layout Strategies. Electronics 13(6), 1050. DOI: 10.3390/electronics13061050. Online publication date: 12-Mar-2024.
  • (2024) Hardware-Friendly 3-D CNN Acceleration With Balanced Kernel Group Sparsity. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(10), 3027-3040. DOI: 10.1109/TCAD.2024.3390040. Online publication date: Oct-2024.
  • (2024) A Convolutional Neural Network Accelerator with High-level Synthesis. In 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI), 38-43. DOI: 10.1109/ICETCI61221.2024.10594247. Online publication date: 24-May-2024.
  • (2024) Configurable DRAM Access for Neural Network Accelerators: A SystemC Virtual Platform Approach. In 2024 IEEE East-West Design & Test Symposium (EWDTS), 1-6. DOI: 10.1109/EWDTS63723.2024.10873634. Online publication date: 13-Nov-2024.

Recommendations

Comments

Information & Contributors

Information

Published In

ACM TURC '23: Proceedings of the ACM Turing Award Celebration Conference - China 2023
July 2023
173 pages
ISBN:9798400702334
DOI:10.1145/3603165
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

ACM TURC '23

Article Metrics

  • Downloads (Last 12 months)191
  • Downloads (Last 6 weeks)3
Reflects downloads up to 03 Mar 2025

