
Efficient memory reuse methodology for CNN-based real-time image processing in mobile-embedded systems

Research · Journal of Real-Time Image Processing

Abstract

Real-time image processing applications such as intelligent security and traffic management require pattern recognition tasks, such as face recognition and license plate detection, to execute on mobile-embedded systems. These mobile-embedded applications employ deep neural networks (DNNs), especially convolutional neural networks (CNNs), to perform image classification. However, deploying CNN models on embedded platforms is challenging, as memory-costly CNNs conflict with the platforms' highly limited memory budgets. To address this challenge, a variety of CNN memory reduction methodologies have been proposed. Among them, CNN memory reuse has no impact on the accuracy or throughput of the CNN and is easy to implement, making it the most suitable approach for embedded applications. However, the existing memory reuse algorithms cannot reliably reach an optimal solution. To solve this problem, we first improve an existing memory reuse algorithm. Compared with its original version, the improved algorithm consumes 7–25% less memory for intermediate results. We further propose a novel CNN memory reuse algorithm that exploits the CNN structure to reuse memory and obtains an optimal solution in most cases. Compared with two existing memory reuse algorithms, the new algorithm reduces the memory footprint by an average of 20.3% and 9.4%, respectively.
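For intuition, the core mechanism behind CNN memory reuse is that intermediate tensors whose lifetimes do not overlap can occupy the same buffer. The following is a minimal, hypothetical Python sketch of greedy lifetime-based buffer sharing; the Tensor class, the sizes, and the layer indices are illustrative assumptions and do not reproduce the paper's algorithm.

```python
# Minimal sketch of lifetime-based tensor memory reuse for CNN inference.
# NOT the paper's algorithm: it only illustrates the general idea of
# assigning intermediate tensors to shared buffers when their lifetimes
# (first-use to last-use layer indices) do not overlap.

from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first_use: int   # index of the layer that produces it
    last_use: int    # index of the last layer that consumes it

def assign_shared_buffers(tensors):
    """Greedily map each tensor to a shared buffer whose current
    occupants' lifetimes do not overlap with it. The total footprint
    is the sum of the resulting buffer sizes."""
    buffers = []  # each buffer: {"size": int, "occupants": [Tensor]}
    # Placing larger tensors first tends to reduce the total footprint.
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        for buf in buffers:
            if all(t.last_use < o.first_use or o.last_use < t.first_use
                   for o in buf["occupants"]):
                buf["occupants"].append(t)
                buf["size"] = max(buf["size"], t.size)
                break
        else:
            buffers.append({"size": t.size, "occupants": [t]})
    return buffers

# Example: in a small linear CNN, conv1's output dies after conv2 reads
# it, so its buffer can be reused for conv3's output.
tensors = [
    Tensor("conv1_out", 600_000, first_use=0, last_use=1),
    Tensor("conv2_out", 300_000, first_use=1, last_use=2),
    Tensor("conv3_out", 150_000, first_use=2, last_use=3),
]
buffers = assign_shared_buffers(tensors)
print("footprint:", sum(b["size"] for b in buffers), "bytes")
# 900,000 bytes of shared buffers vs. 1,050,000 bytes without reuse.
```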


Data availability

The data that support the findings of this study are available from the first author and the corresponding author upon reasonable request.


Acknowledgements

This work is supported by the key special projects of the National Key R&D Plan under Grant No. 2019YFB2204600.

Author information


Contributions

KZ completed the design and proof of the proposed algorithm, the collection of experimental data, and the writing of the paper. YC and WW supported the experimental equipment. HL, ZL, and SH provided guidance. DG is the corresponding author.

Corresponding author

Correspondence to Donghui Guo.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, K., Chang, Y., Wu, W. et al. Efficient memory reuse methodology for CNN-based real-time image processing in mobile-embedded systems. J Real-Time Image Proc 20, 118 (2023). https://doi.org/10.1007/s11554-023-01375-8

