
Scalable and custom-precision floating-point hardware convolution core for using in AI edge processors

  • Research
Journal of Real-Time Image Processing

Abstract

The use of AI algorithms such as CNNs on edge devices has necessitated the design of lightweight, low-power, and fast hardware for edge processors. In this paper, a floating-point convolution core is proposed for implementing CNNs on edge processors. First, conventional CNNs were analyzed to determine how frequently each filter size occurs. Considering performance and execution time, the research focuses on the 3 × 3 filter size. Next, by using the proposed 10-input adder instead of nine 2-input adders and by modifying the multipliers, an optimized 32-bit 3 × 3 convolution core was designed. After studying different bit widths for the mantissa and the exponent of floating-point numbers in CNNs, 13 bits was identified as the minimum bit width that causes no loss of accuracy, and the 3 × 3 core was implemented with this new bit width. Since 1 × 1 and 5 × 5 filters also appear in conventional networks, a new scalable architecture was designed to support all three sizes. Finally, YOLOv4-tiny object detection and GoogLeNet were used as two benchmarks to evaluate the final scalable 3 × 3 core. The results show that, despite using floating-point calculations, the core reaches 45.9 FPS, matching previous fixed-point works, while its accuracy of 84% is similar to that of 32-bit floating point.
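As a rough software illustration of the idea summarized above (not the authors' hardware design), the sketch below computes one 3 × 3 convolution output as a single 10-term sum at reduced floating-point precision. Several points are assumptions made only for this example: the tenth adder input is taken to be a bias term, precision is reduced by simply truncating the float32 mantissa, the exponent is left at float32's 8 bits, and the names `truncate_mantissa` and `conv3x3_window` are hypothetical. The abstract does not state how the 13 bits are split between exponent and mantissa, so the mantissa width is left as a parameter.

```python
import struct


def truncate_mantissa(x: float, mantissa_bits: int) -> float:
    """Convert x to float32, then zero all but the top `mantissa_bits`
    of its 23-bit mantissa (truncation; exponent range is unchanged)."""
    if not 0 <= mantissa_bits <= 23:
        raise ValueError("mantissa_bits must be in 0..23")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - mantissa_bits
    # Clear the low `drop` mantissa bits; sign and exponent are untouched.
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def conv3x3_window(window, kernel, bias, mantissa_bits):
    """One output pixel of a 3x3 convolution at reduced precision.

    The nine products and the bias (assumed tenth input) are summed in
    one step, loosely mirroring a single 10-input adder rather than a
    chain of nine 2-input additions with intermediate rounding.
    """
    def q(v):
        return truncate_mantissa(v, mantissa_bits)

    products = [
        q(q(w) * q(k))
        for w_row, k_row in zip(window, kernel)
        for w, k in zip(w_row, k_row)
    ]
    terms = products + [q(bias)]
    if len(terms) != 10:
        raise ValueError("expected a 3x3 window and a 3x3 kernel")
    # Single reduction, rounded once at the end.
    return q(sum(terms))


ones = [[1.0, 1.0, 1.0] for _ in range(3)]
# mantissa_bits=7 is a placeholder value, not the paper's format.
result = conv3x3_window(ones, ones, bias=0.5, mantissa_bits=7)
```

Summing all ten terms before the final rounding is the design point this sketch tries to convey: a chain of 2-input floating-point adders would normalize and round after every addition, whereas a multi-input adder can do so once.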

Author information

Contributions

Mr. Daryanavard presented the main ideas of the paper, and Mr. Shafiei performed the simulations. Mr. Daryanavard wrote the paper, and Mr. Hatam edited it. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Conflict of interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shafiei, M., Daryanavard, H. & Hatam, A. Scalable and custom-precision floating-point hardware convolution core for using in AI edge processors. J Real-Time Image Proc 20, 94 (2023). https://doi.org/10.1007/s11554-023-01352-1

  • DOI: https://doi.org/10.1007/s11554-023-01352-1
