Abstract
Artificial neural networks are now among the most widely used computational models in intelligent applications. To cope with the ever-growing scale of neural networks and tight constraints on system energy consumption, a variety of neural network (NN) accelerators have emerged. Owing to their dedicated architectures, however, programming NN accelerators differs from programming general-purpose processors, and achieving good performance requires the compiler to exploit the global structure of the NN model. In this paper, we introduce a series of layer-based compilation optimizations for NN accelerators. From top to bottom, we first define a computational graph that carries the necessary information, such as the relationships between layer nodes and data nodes. Then, following the computation pattern of an NN layer, we apply intra-layer loop unrolling and pipelining at two levels, fine-grained and coarse-grained. On top of the computational graph and the abstract pipelining stages, we further apply a layer fusion optimization. Finally, after expanding the pipelining stages of fused layers, we remove redundant IO operations, an optimization we call layer elimination. Experimental results show that with the proposed optimizations the inference process achieves up to a 1.34x speedup over compilation without fusion.
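To make the layer-based approach concrete, below is a minimal, hypothetical sketch (in Python) of a computational graph built from layer nodes and data nodes, together with a greedy fusion pass that merges a convolution layer with its single element-wise consumer so the intermediate tensor never has to leave on-chip memory. The class names, the fusable-op set, and the fusion heuristic are illustrative assumptions, not the authors' implementation or any accelerator's actual programming interface.

```python
# Illustrative sketch only: a toy layer-based computational graph and a
# simple fusion pass. Names and heuristics are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataNode:
    name: str
    shape: tuple

@dataclass
class LayerNode:
    name: str
    op: str                                   # e.g. "conv", "relu", "pool"
    inputs: List[DataNode] = field(default_factory=list)
    outputs: List[DataNode] = field(default_factory=list)

# Element-wise ops assumed fusable into a preceding convolution.
FUSABLE_AFTER_CONV = {"relu", "bn", "scale"}

def consumers(graph: List[LayerNode], data: DataNode) -> List[LayerNode]:
    """Return the layers that read a given data node."""
    return [l for l in graph if data in l.inputs]

def fuse_layers(graph: List[LayerNode]) -> List[LayerNode]:
    """Greedy fusion: merge conv -> element-wise pairs whose intermediate
    data node has exactly one consumer, removing its store/load."""
    fused, skip = [], set()
    for layer in graph:
        if layer.name in skip:
            continue
        if layer.op == "conv" and len(layer.outputs) == 1:
            succ = consumers(graph, layer.outputs[0])
            if len(succ) == 1 and succ[0].op in FUSABLE_AFTER_CONV:
                nxt = succ[0]
                fused.append(LayerNode(name=f"{layer.name}+{nxt.name}",
                                       op=f"{layer.op}_{nxt.op}",
                                       inputs=layer.inputs,
                                       outputs=nxt.outputs))
                skip.add(nxt.name)
                continue
        fused.append(layer)
    return fused

# Usage: conv1 -> relu1 collapses into one node, so the intermediate
# tensor t no longer needs a round trip to off-chip memory.
x = DataNode("x", (1, 3, 224, 224))
t = DataNode("t", (1, 64, 112, 112))
y = DataNode("y", (1, 64, 112, 112))
g = [LayerNode("conv1", "conv", [x], [t]),
     LayerNode("relu1", "relu", [t], [y])]
print([l.name for l in fuse_layers(g)])       # ['conv1+relu1']
```

In the same spirit, a layer-elimination pass would walk the expanded pipelining stages of fused layers and drop a store that is immediately followed by a load of the same data node, which is the kind of redundant IO the abstract refers to.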
Acknowledgement
This work is partially supported by the National Key Research and Development Program of China (under Grant 2017YFB1003104), the NSF of China (under Grants 61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007 and 61732020), Beijing Natural Science Foundation (JQ18013), the 973 Program of China (under Grant 2015CB358800), National Science and Technology Major Project (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), Strategic Priority Research Program of Chinese Academy of Science (XDB32050200, XDC01020000) and Standardization Research Project of Chinese Academy of Sciences (BZ201800001).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Song, J., Zhuang, Y., Chen, X., Zhi, T., Liu, S. (2019). Compiling Optimization for Neural Network Accelerators. In: Yew, P.-C., Stenström, P., Wu, J., Gong, X., Li, T. (eds.) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science, vol. 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_2
DOI: https://doi.org/10.1007/978-3-030-29611-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29610-0
Online ISBN: 978-3-030-29611-7
eBook Packages: Computer Science (R0)