
Compiling Optimization for Neural Network Accelerators

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11719)

Abstract

Nowadays, artificial neural networks are among the most widely used computational models in intelligent applications. To cope with ever-growing network scales and tight constraints on system energy consumption, a variety of neural network (NN) accelerators have emerged. However, owing to their dedicated architectures, programming NN accelerators differs from programming general-purpose processors. To improve performance, the compiler must exploit the global structure of the NN model. In this paper, we introduce a series of layer-based compilation optimizations for NN accelerators. From top to bottom, we first define a computational graph that carries the necessary information, such as the relationships between layer nodes and data nodes. Then, according to the computation pattern of an NN layer, we apply intra-layer loop unrolling and pipelining at both fine-grained and coarse-grained levels. On top of this computational graph and the abstract pipeline stages, we further apply a layer-fusion optimization. After expanding the pipeline stages of adjacent layers, we can remove redundant I/O operations, which we call layer-elimination optimization. Experimental results show that, with the proposed optimizations, inference achieves up to a 1.34x speedup over the version without fusion optimization.
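To make the layer-fusion and layer-elimination ideas concrete, the following minimal Python sketch (not the paper's implementation; all names such as LayerNode, expand_stages, and fuse_layers are hypothetical) builds a tiny computational graph of layer and data nodes, expands each layer into coarse-grained LOAD/COMPUTE/STORE pipeline stages, and then drops the redundant STORE/LOAD pair between adjacent layers so the intermediate tensor stays on-chip.

# Minimal sketch of layer fusion / layer elimination on a toy computational graph.
# All class and function names here are illustrative assumptions, not the authors' API.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataNode:
    name: str                     # tensor produced or consumed by a layer


@dataclass
class LayerNode:
    name: str                     # e.g. "conv1", "relu1"
    inputs: List[DataNode]
    output: DataNode


@dataclass
class Stage:
    kind: str                     # "LOAD", "COMPUTE", or "STORE"
    layer: str
    tensor: str


def expand_stages(layers: List[LayerNode]) -> List[Stage]:
    """Expand each layer into its coarse-grained pipeline stages."""
    stages: List[Stage] = []
    for layer in layers:
        for t in layer.inputs:
            stages.append(Stage("LOAD", layer.name, t.name))     # DRAM -> on-chip buffer
        stages.append(Stage("COMPUTE", layer.name, layer.output.name))
        stages.append(Stage("STORE", layer.name, layer.output.name))  # on-chip -> DRAM
    return stages


def fuse_layers(stages: List[Stage]) -> List[Stage]:
    """Layer elimination: remove a STORE immediately followed by a LOAD of the
    same tensor, so the intermediate result never round-trips through DRAM."""
    fused: List[Stage] = []
    i = 0
    while i < len(stages):
        cur = stages[i]
        nxt: Optional[Stage] = stages[i + 1] if i + 1 < len(stages) else None
        if (cur.kind == "STORE" and nxt is not None
                and nxt.kind == "LOAD" and nxt.tensor == cur.tensor):
            i += 2                # skip the redundant STORE/LOAD pair
            continue
        fused.append(cur)
        i += 1
    return fused


if __name__ == "__main__":
    x, y, z = DataNode("x"), DataNode("y"), DataNode("z")
    graph = [LayerNode("conv1", [x], y), LayerNode("relu1", [y], z)]
    before = expand_stages(graph)
    after = fuse_layers(before)
    print(f"{len(before)} stages before fusion, {len(after)} after")

On this two-layer example the expansion yields six stages and fusion reduces them to four, which mirrors the redundant-I/O reduction described in the abstract; a real compiler would additionally check on-chip buffer capacity before fusing.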



Acknowledgement

This work is partially supported by the National Key Research and Development Program of China (under Grant 2017YFB1003104), the NSF of China (under Grants 61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007 and 61732020), Beijing Natural Science Foundation (JQ18013), the 973 Program of China (under Grant 2015CB358800), National Science and Technology Major Project (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), Strategic Priority Research Program of Chinese Academy of Science (XDB32050200, XDC01020000) and Standardization Research Project of Chinese Academy of Sciences (BZ201800001).

Author information


Correspondence to Jin Song or Tian Zhi.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Song, J., Zhuang, Y., Chen, X., Zhi, T., Liu, S. (2019). Compiling Optimization for Neural Network Accelerators. In: Yew, PC., Stenström, P., Wu, J., Gong, X., Li, T. (eds) Advanced Parallel Processing Technologies. APPT 2019. Lecture Notes in Computer Science(), vol 11719. Springer, Cham. https://doi.org/10.1007/978-3-030-29611-7_2


  • DOI: https://doi.org/10.1007/978-3-030-29611-7_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29610-0

  • Online ISBN: 978-3-030-29611-7

  • eBook Packages: Computer Science (R0)
