Abstract
In recent years, AI-based applications have been deployed in many different areas, and a growing number of convolutional neural network (CNN) models have been proposed to improve accuracy over methods such as pattern matching or traditional image processing. However, the computing power required during the inference phase exceeds the processing capability of most edge computing systems. In this work, we present a hardware/software co-design framework to accelerate CNN-based edge computing applications. The framework targets FPGA technology, which offers the flexibility to update or reconfigure a computing system for different purposes or working conditions, and it allows designers to explore the design space quickly and achieve better results with little effort. We implement a prototype on an FPGA-based MPSoC platform using the MobileNet CNN model. Experimental results show that our system consistently outperforms a quad-core ARM Cortex-A53 processor, achieving speedups of up to 69.4×. Compared to an Intel Core i7 CPU, the proposed system achieves speedups of up to 4.67×, although in some cases it falls behind the Intel CPU because of communication overhead. Synthesis results report that the system runs at 159 MHz and consumes only 3.179 W, making it suitable for edge computing applications.
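The prototype uses MobileNet, whose efficiency on edge devices stems from factoring each standard convolution into a depthwise and a pointwise stage. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of that operation (stride 1, no padding, function and variable names are ours) for readers unfamiliar with it.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution (valid padding, stride 1).

    x          : (H, W, C_in) input feature map
    dw_kernels : (k, k, C_in) one spatial filter per input channel
    pw_kernels : (C_in, C_out) 1x1 pointwise filters
    """
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise stage: each input channel is filtered independently,
    # so spatial filtering costs k*k multiplies per channel, not k*k*C_in.
    dw = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])

    # Pointwise stage: a 1x1 convolution mixes channels, which is just a
    # matrix product along the channel axis.
    return dw @ pw_kernels  # (Ho, Wo, C_out)
```

Compared with a standard k×k convolution, this factorization reduces the multiply count by roughly a factor of C_out, which is what makes MobileNet attractive for resource-constrained FPGA accelerators.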
Data Availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available because they are part of the results of the B2021-20-02 project funded by VNU-HCM. All results generated by the project are managed by and belong to the funder.
Funding
This research is funded by Vietnam National University - Ho Chi Minh City (VNU-HCM) under grant number B2021-20-02. We acknowledge the support of time and facilities from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for this study.
Author information
Contributions
Cuong Pham-Quoc designed the system and wrote most of the paper. Xuan-Quang Nguyen implemented and tested the proposed systems. Tran Ngoc Thinh contributed to the system design and proofread the paper.
Ethics declarations
Conflict of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Pham-Quoc, C., Nguyen, XQ. & Thinh, T.N. Towards An FPGA-targeted Hardware/Software Co-design Framework for CNN-based Edge Computing. Mobile Netw Appl 27, 2024–2035 (2022). https://doi.org/10.1007/s11036-022-01985-9