DOI: 10.1145/3603165.3607390
Poster

Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

Published: 25 September 2023

Abstract

With the recent advancement of multilayer convolutional neural networks (CNNs), deep learning has achieved remarkable success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of computation-demanding CNNs, FPGA-based acceleration has emerged as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library that efficiently accelerates the entire CNN on FPGAs. Built on portable high-level synthesis, Caffeine provides a design automation flow that optimizes and generates both the FPGA-based AI hardware and the runtime software code. We integrate Caffeine into the industry-standard deep learning framework Caffe. Caffeine achieves a peak performance of 365 GOPS on a Xilinx KU060 FPGA and 636 GOPS on a Virtex-7 690T FPGA, showing up to 7.3x and 43.5x performance and energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy efficiency than a GPU.


Cited By

  • (2025) Stream-HLS: Towards Automatic Dataflow Acceleration. In Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 103-114. DOI: 10.1145/3706628.3708878. Online publication date: 27-Feb-2025.
  • (2024) Optimizing CNN Hardware Acceleration with Configurable Vector Units and Feature Layout Strategies. Electronics 13(6), 1050. DOI: 10.3390/electronics13061050. Online publication date: 12-Mar-2024.
  • (2024) Hardware-Friendly 3-D CNN Acceleration With Balanced Kernel Group Sparsity. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(10), 3027-3040. DOI: 10.1109/TCAD.2024.3390040. Online publication date: Oct-2024.
  • (2024) A Convolutional Neural Network Accelerator with High-level Synthesis. In 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI), 38-43. DOI: 10.1109/ICETCI61221.2024.10594247. Online publication date: 24-May-2024.
  • (2024) Configurable DRAM Access for Neural Network Accelerators: A SystemC Virtual Platform Approach. In 2024 IEEE East-West Design & Test Symposium (EWDTS), 1-6. DOI: 10.1109/EWDTS63723.2024.10873634. Online publication date: 13-Nov-2024.

Recommendations

Comments

Information & Contributors

Information

Published In

ACM TURC '23: Proceedings of the ACM Turing Award Celebration Conference - China 2023
July 2023
173 pages
ISBN:9798400702334
DOI:10.1145/3603165
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

ACM TURC '23

Article Metrics

  • Downloads (Last 12 months)191
  • Downloads (Last 6 weeks)3
Reflects downloads up to 03 Mar 2025

