
An Efficient Memory Management Method for Embedded Vector Processors

  • Conference paper
  • In: Communications and Networking (ChinaCom 2022)

Abstract

For processors with vector computing units, such as DSPs, it is important that vector load/store operations access aligned memory blocks and that memory allocation minimizes wasted space. In this paper, we design and implement a memory management method suitable for embedded vector processors: the vector memory pool. By partitioning a contiguous block of memory into many aligned vector objects and making efficient use of the vector processing units, memory-manipulation library functions such as memset and memcpy are accelerated. We implemented and comparatively verified the vector memory pool on RT-Thread Nano running on the SWIFT DSP, and its running efficiency improved by tens of times compared with the original method.
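The pool described in the abstract can be pictured as a fixed-size, alignment-aware allocator: a contiguous region is carved into equally sized blocks whose starting addresses are all aligned to the vector width. The sketch below is a minimal illustration in C, assuming a hypothetical 64-byte vector width and illustrative names (`vpool_*`); it is not the paper's actual implementation.

```c
/* Minimal sketch of a vector-aligned fixed-block memory pool.
 * VEC_ALIGN, BLOCK_SIZE, NUM_BLOCKS, and the vpool_* names are
 * illustrative assumptions, not the paper's real API. */
#include <stddef.h>
#include <stdint.h>

#define VEC_ALIGN  64   /* assumed vector register width in bytes */
#define BLOCK_SIZE 256  /* each object is a multiple of VEC_ALIGN */
#define NUM_BLOCKS 16

/* Backing storage aligned to the vector width; because BLOCK_SIZE is a
 * multiple of VEC_ALIGN, every block start is also vector-aligned. */
static _Alignas(VEC_ALIGN) uint8_t pool_mem[NUM_BLOCKS * BLOCK_SIZE];
static void *free_list[NUM_BLOCKS];  /* stack of free block pointers */
static int free_top;

void vpool_init(void) {
    free_top = 0;
    for (int i = 0; i < NUM_BLOCKS; i++)
        free_list[free_top++] = &pool_mem[i * BLOCK_SIZE];
}

/* O(1) allocation: pop a pre-aligned block, or NULL when exhausted. */
void *vpool_alloc(void) {
    return free_top > 0 ? free_list[--free_top] : NULL;
}

/* O(1) deallocation: push the block back onto the free stack. */
void vpool_free(void *p) {
    free_list[free_top++] = p;
}
```

Because every block is guaranteed to start on a vector-width boundary, routines such as memset/memcpy over a block can use full-width aligned vector load/store instructions without scalar head/tail handling, which is the source of the speedup the paper reports.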



Acknowledgement

The authors thank the editors and the anonymous reviewers for their invaluable comments to help to improve the quality of this paper. This work was supported by National Key R&D Program of China under Grant 2020YFA0711400, National Natural Science Foundation of China under Grants 61831018 and U21A20452, the Outstanding youth project of Natural Science Foundation of Jiangxi Province 20212ACB212001, and the Jiangxi Double Thousand Plan under Grant jxsq2019201125.

Author information

Correspondence to Jun Wu.


Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Li, S., Ren, H., Zhang, Z., Tan, B., Wu, J. (2023). An Efficient Memory Management Method for Embedded Vector Processors. In: Gao, F., Wu, J., Li, Y., Gao, H. (eds) Communications and Networking. ChinaCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 500. Springer, Cham. https://doi.org/10.1007/978-3-031-34790-0_19

  • DOI: https://doi.org/10.1007/978-3-031-34790-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34789-4

  • Online ISBN: 978-3-031-34790-0

  • eBook Packages: Computer Science (R0)
