
Efficient memory reuse methodology for CNN-based real-time image processing in mobile-embedded systems

Research · Journal of Real-Time Image Processing

Abstract

Real-time image processing applications such as intelligent security and traffic management require pattern recognition tasks, such as face recognition and license plate detection, to execute on mobile-embedded systems. These mobile-embedded applications employ deep neural networks (DNNs), especially convolutional neural networks (CNNs), to perform image classification. However, deploying CNN models on embedded platforms is challenging, as memory-costly CNNs conflict with the platforms' highly limited memory budgets. To address this challenge, a variety of CNN memory reduction methodologies have been proposed. Among them, CNN memory reuse has no impact on the accuracy or throughput of the CNN and is easy to implement, making it the most suitable approach for embedded applications. However, the existing memory reuse algorithms cannot reliably reach an optimal solution. To solve this problem, we first improve an existing memory reuse algorithm. Compared with its original version, the improved algorithm consumes 7–25% less memory for intermediate results. We further propose a novel CNN memory reuse algorithm that exploits the CNN structure to reuse memory and obtains an optimal solution in most cases. Compared with two existing memory reuse algorithms, the new algorithm reduces the memory footprint by an average of 20.3% and 9.4%, respectively.
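For intuition, the core mechanism behind CNN memory reuse is that intermediate tensors whose lifetimes do not overlap can occupy the same buffer. The following is a minimal, hypothetical Python sketch of greedy lifetime-based buffer sharing; the Tensor class, the sizes, and the layer indices are illustrative assumptions and do not reproduce the paper's algorithm.

```python
# Minimal sketch of lifetime-based tensor memory reuse for CNN inference.
# NOT the paper's algorithm: it only illustrates the general idea of
# assigning intermediate tensors to shared buffers when their lifetimes
# (first-use to last-use layer indices) do not overlap.

from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first_use: int   # index of the layer that produces it
    last_use: int    # index of the last layer that consumes it

def assign_shared_buffers(tensors):
    """Greedily map each tensor to a shared buffer whose current
    occupants' lifetimes do not overlap with it. The total footprint
    is the sum of the resulting buffer sizes."""
    buffers = []  # each buffer: {"size": int, "occupants": [Tensor]}
    # Placing larger tensors first tends to reduce the total footprint.
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        for buf in buffers:
            if all(t.last_use < o.first_use or o.last_use < t.first_use
                   for o in buf["occupants"]):
                buf["occupants"].append(t)
                buf["size"] = max(buf["size"], t.size)
                break
        else:
            buffers.append({"size": t.size, "occupants": [t]})
    return buffers

# Example: in a small linear CNN, conv1's output dies after conv2 reads
# it, so its buffer can be reused for conv3's output.
tensors = [
    Tensor("conv1_out", 600_000, first_use=0, last_use=1),
    Tensor("conv2_out", 300_000, first_use=1, last_use=2),
    Tensor("conv3_out", 150_000, first_use=2, last_use=3),
]
buffers = assign_shared_buffers(tensors)
print("footprint:", sum(b["size"] for b in buffers), "bytes")
# 900,000 bytes of shared buffers vs. 1,050,000 bytes without reuse.
```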


Data availability

The data that support the findings of this study are available from the first author and the corresponding author upon reasonable request.


Acknowledgements

This work is supported by the key special projects of the National Key R&D Plan under Grant No. 2019YFB2204600.

Author information


Contributions

KZ completed the design and proof of the proposed algorithm, the collection of experimental data, and the writing of the paper. YC and WW supported the experimental equipment. HL, ZL, and SH provided guidance. DG is the corresponding author.

Corresponding author

Correspondence to Donghui Guo.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, K., Chang, Y., Wu, W. et al. Efficient memory reuse methodology for CNN-based real-time image processing in mobile-embedded systems. J Real-Time Image Proc 20, 118 (2023). https://doi.org/10.1007/s11554-023-01375-8

