
A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors

  • Conference paper
Parallel Architectures, Algorithms and Programming (PAAP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1163)


Abstract

Convolutional neural networks (CNNs) play an important role in many fields, and many applications can nowadays run CNN inference with pre-trained models on mobile devices. The improving performance of embedded processors such as ARM-based CPUs makes it possible to meet real-time processing requirements. In this paper, a pipelining strategy is proposed to accelerate convolutional networks on ARM processors. We implement a \(3\times 3\) convolution with Neon instructions, the single-instruction, multiple-data (SIMD) instructions supported by ARM processors. To reduce pipeline stalls, the issue order of instructions is rearranged according to the out-of-order execution and dual-issue mechanisms of ARM processors. A tiling method is exploited to increase data reuse: the input feature map is divided into multiple \(6\times 6\) tiles, and the computation within each tile is highly optimized using the proposed pipelining strategy. The proposed method achieves a speedup of 2.88 over gcc-compiled code on the RK3288. The effect of our optimization is measured with a performance profiling tool, which shows that cycles and cache misses are reduced significantly. A multi-threaded version implemented with OpenMP achieves a speedup of 6.8 over the single-threaded gcc-compiled version.



Acknowledgment

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1003405) and the National Natural Science Foundation of China (Grant No. 61802419).

Author information

Corresponding author

Correspondence to Xin Zhou.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhou, X., Li, R., Zhang, P., Liu, Y., Dou, Y. (2020). A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_45

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-2767-8_45


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2766-1

  • Online ISBN: 978-981-15-2767-8

  • eBook Packages: Computer Science, Computer Science (R0)
