Abstract
Artificial neural networks are now among the most widely used computational models in intelligent applications. To cope with the ever-growing scale of neural networks and tight constraints on system energy consumption, a variety of neural network (NN) accelerators have emerged. Owing to their dedicated architectures, however, programming NN accelerators differs from programming general-purpose processors, and achieving good performance requires the compiler to exploit the global structure of the NN model. In this paper, we introduce a series of layer-based compilation optimizations for NN accelerators. From top to bottom, we first define a computational graph that carries the necessary information, such as the relationships between layer nodes and data nodes. Then, following the computation pattern of an NN layer, we apply intra-layer loop unrolling and pipelining at two levels, fine-grained and coarse-grained. On top of the computational graph and the abstract pipelining stages, we further apply a layer fusion optimization. Finally, after expanding the pipelining stages of fused layers, we remove redundant IO operations, an optimization we call layer elimination. Experimental results show that with the proposed optimizations the inference process achieves up to a 1.34x speedup over compilation without fusion.
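To make the layer-based approach concrete, below is a minimal, hypothetical sketch (in Python) of a computational graph built from layer nodes and data nodes, together with a greedy fusion pass that merges a convolution layer with its single element-wise consumer so the intermediate tensor never has to leave on-chip memory. The class names, the fusable-op set, and the fusion heuristic are illustrative assumptions, not the authors' implementation or any accelerator's actual programming interface.

```python
# Illustrative sketch only: a toy layer-based computational graph and a
# simple fusion pass. Names and heuristics are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataNode:
    name: str
    shape: tuple

@dataclass
class LayerNode:
    name: str
    op: str                                   # e.g. "conv", "relu", "pool"
    inputs: List[DataNode] = field(default_factory=list)
    outputs: List[DataNode] = field(default_factory=list)

# Element-wise ops assumed fusable into a preceding convolution.
FUSABLE_AFTER_CONV = {"relu", "bn", "scale"}

def consumers(graph: List[LayerNode], data: DataNode) -> List[LayerNode]:
    """Return the layers that read a given data node."""
    return [l for l in graph if data in l.inputs]

def fuse_layers(graph: List[LayerNode]) -> List[LayerNode]:
    """Greedy fusion: merge conv -> element-wise pairs whose intermediate
    data node has exactly one consumer, removing its store/load."""
    fused, skip = [], set()
    for layer in graph:
        if layer.name in skip:
            continue
        if layer.op == "conv" and len(layer.outputs) == 1:
            succ = consumers(graph, layer.outputs[0])
            if len(succ) == 1 and succ[0].op in FUSABLE_AFTER_CONV:
                nxt = succ[0]
                fused.append(LayerNode(name=f"{layer.name}+{nxt.name}",
                                       op=f"{layer.op}_{nxt.op}",
                                       inputs=layer.inputs,
                                       outputs=nxt.outputs))
                skip.add(nxt.name)
                continue
        fused.append(layer)
    return fused

# Usage: conv1 -> relu1 collapses into one node, so the intermediate
# tensor t no longer needs a round trip to off-chip memory.
x = DataNode("x", (1, 3, 224, 224))
t = DataNode("t", (1, 64, 112, 112))
y = DataNode("y", (1, 64, 112, 112))
g = [LayerNode("conv1", "conv", [x], [t]),
     LayerNode("relu1", "relu", [t], [y])]
print([l.name for l in fuse_layers(g)])       # ['conv1+relu1']
```

In the same spirit, a layer-elimination pass would walk the expanded pipelining stages of fused layers and drop a store that is immediately followed by a load of the same data node, which is the kind of redundant IO the abstract refers to.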
Acknowledgement
This work is partially supported by the National Key Research and Development Program of China (under Grant 2017YFB1003104), the NSF of China (under Grants 61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007 and 61732020), Beijing Natural Science Foundation (JQ18013), the 973 Program of China (under Grant 2015CB358800), National Science and Technology Major Project (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), Strategic Priority Research Program of Chinese Academy of Science (XDB32050200, XDC01020000) and Standardization Research Project of Chinese Academy of Sciences (BZ201800001).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Song, J., Zhuang, Y., Chen, X., Zhi, T., Liu, S. (2019). Compiling Optimization for Neural Network Accelerators. In: Yew, P.-C., Stenström, P., Wu, J., Gong, X., Li, T. (eds.) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science, vol. 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_2
DOI: https://doi.org/10.1007/978-3-030-29611-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29610-0
Online ISBN: 978-3-030-29611-7
eBook Packages: Computer Science (R0)