Scalable and custom-precision floating-point hardware convolution core for using in AI edge processors

Shafiei, Mahdi; Daryanavard, Hassan; Hatam, Ahmad

doi:10.1007/s11554-023-01352-1

Research
Published: 10 August 2023

Volume 20, article number 94, (2023)

Journal of Real-Time Image Processing

201 Accesses
1 Citation

Abstract

The deployment of AI algorithms such as CNNs in edge devices has necessitated the design of lightweight, low-power, and fast hardware for edge processors. In this paper, a floating-point convolution core is proposed for implementing CNNs on edge processors. First, conventional CNNs are analyzed to determine how often each filter size occurs. Considering performance and execution time, the research focuses on the 3 × 3 size. Next, by using the proposed 10-input adder instead of nine 2-input adders and by modifying the multipliers, an optimized 32-bit 3 × 3 convolution core is designed. After studying different bit widths for the mantissa and exponent of floating-point numbers in CNNs, 13 bits is taken as the minimum bit width that causes no loss of accuracy, and the 3 × 3 core is implemented with this new bit width. Since 1 × 1 and 5 × 5 filters also appear in conventional networks, a new scalable architecture is designed to support all three sizes. Finally, the YOLOv4-tiny object detector and GoogLeNet are used as two benchmarks to evaluate the final scalable 3 × 3 core. The results show that, despite using floating-point calculations, the core achieves 45.9 FPS, equal to previous fixed-point works, while its accuracy of 84% is similar to that of 32-bit floating point.
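
As a rough software illustration of the two ideas in the abstract (a reduced-precision floating-point format and a single 10-operand accumulation in place of a chain of nine 2-input additions), the following Python sketch may help. It is not the authors' hardware design. The 1-sign, 5-exponent, 7-mantissa split of the 13 bits, the use of the bias term as the tenth adder input, and the names `quantize` and `conv3x3` are all assumptions made for illustration; the abstract states none of them. Overflow to infinity is not modeled.

```python
import math

def quantize(x, exp_bits=5, man_bits=7):
    """Round x to a toy float: 1 sign bit + exp_bits + man_bits.

    The 1/5/7 split (13 bits total) is an assumption, not taken from the paper.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    _, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    e -= 1                            # exponent of the 1.f normalized form
    e = max(min(e, bias), 1 - bias)   # clamp to the representable exponent range
    step = 2.0 ** (e - man_bits)      # spacing between representable values
    return round(x / step) * step     # round to nearest, ties to even

def conv3x3(window, kernel, bias=0.0):
    """One 3x3 output pixel: nine quantized products plus a bias term."""
    products = [
        quantize(quantize(window[r][c]) * quantize(kernel[r][c]))
        for r in range(3)
        for c in range(3)
    ]
    # A single 10-operand sum with one final rounding, instead of nine
    # 2-input additions that would each round their intermediate result.
    return quantize(math.fsum(products + [quantize(bias)]))

if __name__ == "__main__":
    window = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
    kernel = [[0.25] * 3 for _ in range(3)]
    print(conv3x3(window, kernel))    # close to the exact value 1.125
```

Rounding once after the 10-operand sum, rather than after every pairwise addition, is one plausible reason a multi-input adder can help accuracy as well as area; the abstract itself only claims the hardware benefit.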

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Mahdi Shafiei, Hassan Daryanavard & Ahmad Hatam

Contributions

Mr. Daryanavard presented the main ideas of the paper, and Mr. Shafiei performed the simulations. Mr. Daryanavard wrote the paper, and Mr. Hatam edited it. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Conflict of interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shafiei, M., Daryanavard, H. & Hatam, A. Scalable and custom-precision floating-point hardware convolution core for using in AI edge processors. J Real-Time Image Proc 20, 94 (2023). https://doi.org/10.1007/s11554-023-01352-1

Received: 30 May 2023
Accepted: 29 July 2023
Published: 10 August 2023
DOI: https://doi.org/10.1007/s11554-023-01352-1
