Abstract
In recent years, AI-based applications have been deployed in many different areas, and a growing number of convolutional neural network (CNN) models have been proposed to improve accuracy over methods such as pattern matching or traditional image processing. However, the computing power required during the inference phase exceeds the processing capability of most edge computing systems. In this work, we present a hardware/software co-design framework to accelerate CNN-based edge computing applications. The framework targets FPGA technology, which offers the flexibility to update or reconfigure a computing system for different purposes or working conditions, and it allows designers to explore the design space quickly and achieve better results with little effort. We implement a prototype on an FPGA-based MPSoC platform using the MobileNet CNN model. Experimental results show that our system consistently outperforms a quad-core ARM Cortex-A53 processor, achieving speedups of up to 69.4×. Compared to an Intel Core i7 CPU, the proposed system achieves speedups of up to 4.67×, although in some cases it falls behind the Intel CPU because of communication overhead. Synthesis results report that the system runs at 159 MHz and consumes only 3.179 W, making it suitable for edge computing applications.
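The prototype uses MobileNet, whose efficiency on edge devices stems from factoring each standard convolution into a depthwise and a pointwise stage. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of that operation (stride 1, no padding, function and variable names are ours) for readers unfamiliar with it.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise separable convolution (valid padding, stride 1).

    x          : (H, W, C_in) input feature map
    dw_kernels : (k, k, C_in) one spatial filter per input channel
    pw_kernels : (C_in, C_out) 1x1 pointwise filters
    """
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise stage: each input channel is filtered independently,
    # so spatial filtering costs k*k multiplies per channel, not k*k*C_in.
    dw = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])

    # Pointwise stage: a 1x1 convolution mixes channels, which is just a
    # matrix product along the channel axis.
    return dw @ pw_kernels  # (Ho, Wo, C_out)
```

Compared with a standard k×k convolution, this factorization reduces the multiply count by roughly a factor of C_out, which is what makes MobileNet attractive for resource-constrained FPGA accelerators.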
Data Availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available because they are part of the results of the B2021-20-02 project funded by VNU-HCM. All results generated by the project are managed by and belong to the funder.
Funding
This research is funded by Vietnam National University - Ho Chi Minh City (VNU-HCM) under grant number B2021-20-02. We acknowledge the support of time and facilities from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for this study.
Author information
Contributions
Cuong Pham-Quoc designed the system and wrote most of the paper. Xuan-Quang Nguyen implemented and tested the proposed systems. Tran Ngoc Thinh contributed to the system design and proofread the paper.
Ethics declarations
Conflict of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Pham-Quoc, C., Nguyen, XQ. & Thinh, T.N. Towards An FPGA-targeted Hardware/Software Co-design Framework for CNN-based Edge Computing. Mobile Netw Appl 27, 2024–2035 (2022). https://doi.org/10.1007/s11036-022-01985-9