Abstract
The growing use of AI algorithms such as CNNs in edge devices has necessitated the design of lightweight, low-power, and fast hardware for edge processors. In this paper, a floating-point convolution core is proposed for implementing CNNs on edge processors. First, conventional CNNs were analyzed in terms of how often each filter size occurs. Considering performance and execution time, the research focuses on the 3 × 3 filter size. Next, an optimized 32-bit 3 × 3 convolution core was designed by replacing nine 2-input adders with a proposed 10-input adder and by modifying the multipliers. After studying different bit widths for the mantissa and exponent of floating-point numbers in CNNs, 13 bits was identified as the minimum bit width that causes no loss of accuracy, and the 3 × 3 core was implemented with this bit width. Since 1 × 1 and 5 × 5 filters also occur in conventional networks, a scalable architecture was designed to support all three sizes. Finally, YOLOv4-tiny object detection and GoogLeNet were used as two benchmarks to evaluate the final scalable 3 × 3 core. The results show that, despite using floating-point arithmetic, the core reaches 45.9 FPS, matching previous fixed-point implementations, while its accuracy of 84% is similar to that of 32-bit floating point.
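The numerical idea behind a single 10-input adder can be illustrated with a small software model. The sketch below is not the authors' hardware design; it only emulates reduced-precision arithmetic in Python. The abstract does not say how the 13 bits are split, so the 1 sign / 5 exponent / 7 mantissa layout used here is a hypothetical choice. The tenth adder input is likewise assumed to be a bias term, and the names `quantize`, `conv3x3_fused`, and `conv3x3_chained` are illustrative only. The model compares one rounding after summing all ten operands against a chain of nine 2-input additions that rounds after every step.

```python
import math

def quantize(x: float, exp_bits: int = 5, man_bits: int = 7) -> float:
    """Round x to a hypothetical IEEE-like format with 1 sign bit,
    exp_bits exponent bits and man_bits mantissa bits (13 bits by default).
    Uses round-half-to-even, supports subnormals, and saturates finite
    overflow to the largest representable magnitude."""
    if x == 0.0 or not math.isfinite(x):
        return x
    bias = (1 << (exp_bits - 1)) - 1
    min_exp = 1 - bias                      # exponent of the smallest normal number
    max_exp = bias                          # top exponent code reserved for inf/NaN
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    _, e = math.frexp(a)                    # a = f * 2**e with 0.5 <= f < 1
    e = max(e - 1, min_exp)                 # leading-one exponent, clamped for subnormals
    step = 2.0 ** (e - man_bits)            # spacing of representable values here
    q = round(a / step) * step              # Python's round() is half-to-even
    return sign * min(q, max_val)

def conv3x3_fused(window, kernel, bias, fmt=(5, 7)):
    """One output pixel: nine products plus a bias summed together
    (math.fsum, correctly rounded in double) and rounded once to the
    reduced format, modelling a single 10-input adder."""
    q = lambda v: quantize(v, *fmt)
    products = [q(q(w) * q(k)) for w, k in zip(window, kernel)]
    return q(math.fsum(products + [q(bias)]))

def conv3x3_chained(window, kernel, bias, fmt=(5, 7)):
    """Same pixel computed with nine 2-input adders, rounding the
    running sum to the reduced format after every addition."""
    q = lambda v: quantize(v, *fmt)
    acc = q(bias)
    for w, k in zip(window, kernel):
        acc = q(acc + q(q(w) * q(k)))
    return acc

if __name__ == "__main__":
    window, kernel = [1.0] * 9, [0.5] * 9   # 3x3 window and kernel, flattened
    print(conv3x3_fused(window, kernel, 256.0))    # 260.0
    print(conv3x3_chained(window, kernel, 256.0))  # 256.0
```

With a large bias, each 0.5 contribution is rounded away in the chained version, because the spacing of 13-bit values near 256 is 2. The fused sum keeps the total (260.5) and rounds only once, to 260.0. This shows, under the stated assumptions, why a multi-operand adder can help accuracy at a narrow bit width, in addition to any area or latency savings.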
Author information
Contributions
Mr. Daryanavard presented the main ideas of the paper, and Mr. Shafiei carried out the simulations. Mr. Daryanavard wrote the paper, and Mr. Hatam edited it. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our research work.
About this article
Cite this article
Shafiei, M., Daryanavard, H. & Hatam, A. Scalable and custom-precision floating-point hardware convolution core for using in AI edge processors. J Real-Time Image Proc 20, 94 (2023). https://doi.org/10.1007/s11554-023-01352-1
DOI: https://doi.org/10.1007/s11554-023-01352-1