Abstract
Convolutional neural networks (CNNs) play an important role in many fields. Many applications can now run CNN inference with pre-trained models on mobile devices. The improving performance of embedded processors such as ARM-based CPUs makes it possible to meet real-time processing requirements. In this paper, a pipelining strategy is proposed to accelerate convolutional networks on ARM processors. We implement a \(3\times 3\) convolution with Neon instructions, the single instruction, multiple data (SIMD) instructions supported by ARM processors. To reduce pipeline stalls, the issue order of instructions is rearranged according to the out-of-order execution and dual-issue mechanisms of ARM processors. A tiling method is exploited to increase data reuse: the input feature map is divided into multiple \(6\times 6\) tiles, and the computation within each tile is highly optimized using the proposed pipelining strategy. The proposed method achieves a speedup of 2.88 over gcc-compiled code on the RK3288. The effect of the optimization is measured with a performance profiling tool; cycles and cache misses decrease significantly. The multi-threaded version, implemented with OpenMP, achieves a speedup of 6.8 over the single-threaded gcc-compiled version.
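The tiling idea in the abstract can be illustrated with a minimal scalar sketch in portable C. It is not the authors' Neon implementation and contains no instruction scheduling. It assumes a stride-1 convolution without padding, so that each \(6\times 6\) input tile yields a \(4\times 4\) output tile and neighbouring tiles share two input rows or columns. The function names `conv3x3_tile` and `conv3x3_tiled` and the row-major layout are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_IN  6  /* input tile edge, as stated in the abstract */
#define TILE_OUT 4  /* assumed: 6 - 3 + 1 outputs per edge (stride 1, no padding) */

/* Apply a 3x3 kernel to one 6x6 input tile and write a 4x4 output tile.
 * Uses the cross-correlation form common in CNN frameworks (no kernel flip).
 * in_stride and out_stride are the row pitches of the full feature maps. */
void conv3x3_tile(const float *in, size_t in_stride, const float k[9],
                  float *out, size_t out_stride)
{
    for (size_t r = 0; r < TILE_OUT; ++r) {
        for (size_t c = 0; c < TILE_OUT; ++c) {
            float acc = 0.0f;
            for (size_t kr = 0; kr < 3; ++kr)
                for (size_t kc = 0; kc < 3; ++kc)
                    acc += in[(r + kr) * in_stride + (c + kc)] * k[kr * 3 + kc];
            out[r * out_stride + c] = acc;
        }
    }
}

/* Tiled 3x3 convolution over a row-major h x w input.
 * The output is (h - 2) x (w - 2). For simplicity this sketch only accepts
 * shapes whose output dimensions are multiples of 4.
 * Returns 0 on success, -1 for an unsupported shape. */
int conv3x3_tiled(const float *in, size_t h, size_t w,
                  const float k[9], float *out)
{
    assert(in != NULL && k != NULL && out != NULL);
    if (h < TILE_IN || w < TILE_IN)
        return -1;
    size_t oh = h - 2, ow = w - 2;
    if (oh % TILE_OUT != 0 || ow % TILE_OUT != 0)
        return -1;
    /* Step by 4 outputs, so consecutive 6x6 input tiles overlap by 2. */
    for (size_t r = 0; r < oh; r += TILE_OUT)
        for (size_t c = 0; c < ow; c += TILE_OUT)
            conv3x3_tile(in + r * w + c, w, k, out + r * ow + c, ow);
    return 0;
}
```

Calling `conv3x3_tiled` on a \(10\times 10\) input produces an \(8\times 8\) output computed as four tiles. In a SIMD version, the per-tile loop nest is the part that would be replaced by hand-scheduled vector code.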
Acknowledgment
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1003405) and the National Natural Science Foundation of China (Grant No. 61802419).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, X., Li, R., Zhang, P., Liu, Y., Dou, Y. (2020). A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_45
DOI: https://doi.org/10.1007/978-981-15-2767-8_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2766-1
Online ISBN: 978-981-15-2767-8
eBook Packages: Computer Science (R0)