
A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors

  • Conference paper
Parallel Architectures, Algorithms and Programming (PAAP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1163)


Abstract

Convolutional neural networks (CNNs) play an important role in many fields, and many applications can nowadays run CNN inference with pre-trained models on mobile devices. The improving performance of embedded processors such as ARM-based CPUs makes it possible to meet real-time processing requirements. In this paper, a pipelining strategy is proposed to accelerate convolutional networks on ARM processors. We implement a \(3\times 3\) convolution with Neon instructions, the single-instruction, multiple-data (SIMD) instructions supported by ARM processors. To reduce pipeline stalls, the issue order of instructions is rearranged according to the out-of-order execution and dual-issue mechanisms of ARM processors. A tiling method is exploited to increase data reuse: the input feature map is divided into multiple \(6\times 6\) tiles, and the computation within each tile is highly optimized using the proposed pipelining strategy. The proposed method achieves a speedup of 2.88 over gcc-compiled code on the RK3288. The effect of our optimization is measured with a performance profiling tool, which shows that cycles and cache misses are reduced significantly. A multi-threaded version implemented with OpenMP achieves a speedup of 6.8 over the single-threaded gcc-compiled version.



Acknowledgment

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1003405) and the National Natural Science Foundation of China (Grant No. 61802419).

Author information

Corresponding author

Correspondence to Xin Zhou.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhou, X., Li, R., Zhang, P., Liu, Y., Dou, Y. (2020). A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_45

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-2767-8_45


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2766-1

  • Online ISBN: 978-981-15-2767-8

  • eBook Packages: Computer Science, Computer Science (R0)
