DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining

  • Regular Paper
  • Published:
Journal of Computer Science and Technology

Abstract

Dynamic neural network (NN) techniques are increasingly important because they facilitate deep learning techniques with more complex network architectures. However, existing studies of deep neural network (DNN) accelerators usually focus on static neural networks, optimizing fixed computational graphs with static scheduling methods. We analyze the execution of dynamic neural networks and observe that their dynamic features introduce challenges for efficient scheduling and pipelining in existing DNN accelerators. We propose DyPipe, a holistic approach to optimizing dynamic neural network inference in enhanced DNN accelerators. DyPipe achieves significant performance improvements for dynamic neural networks while introducing negligible overhead for static neural networks. Our evaluation demonstrates that DyPipe achieves 1.7x speedup on dynamic neural networks and maintains more than 96% of the performance on static neural networks.
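
A minimal, hypothetical sketch (in Python, not taken from the paper): the abstract argues that pipelining in DNN accelerators becomes difficult when layer shapes are only known at run time. The toy timing model below contrasts serial load-then-compute execution with a double-buffered pipeline whose tile count per layer depends on a runtime input length. Every name and constant here (layer_tiles, the load/compute cycle costs, the tile size of 16) is an illustrative assumption, not DyPipe's actual design.

    import random

    def layer_tiles(seq_len):
        # Hypothetical: how many tiles a layer needs for a runtime-dependent input length.
        return max(1, seq_len // 16)

    def serial_cycles(tiles, load=100, compute=150):
        # Each tile is loaded and then computed, with no overlap.
        return tiles * (load + compute)

    def pipelined_cycles(tiles, load=100, compute=150):
        # The load of tile i+1 overlaps with the compute of tile i (double buffering),
        # so the steady-state cost per tile is max(load, compute).
        return load + tiles * max(load, compute)

    if __name__ == "__main__":
        random.seed(0)
        # Input lengths are decided at run time, e.g. variable-length sequences.
        seq_lens = [random.randint(8, 256) for _ in range(8)]
        serial = sum(serial_cycles(layer_tiles(n)) for n in seq_lens)
        piped = sum(pipelined_cycles(layer_tiles(n)) for n in seq_lens)
        print(f"serial:    {serial} cycles")
        print(f"pipelined: {piped} cycles (speedup {serial / piped:.2f}x)")

In this toy model the pipelined variant approaches a bound set by the slower of the two stages; the question the abstract raises is how to keep such overlap when tile counts cannot be fixed at compile time.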

Author information

Corresponding author

Correspondence to Tian Zhi.

Supplementary Information

ESM 1 (PDF 137 kb)

About this article

Cite this article

Zhuang, YM., Hu, X., Chen, XB. et al. DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining. J. Comput. Sci. Technol. 38, 899–910 (2023). https://doi.org/10.1007/s11390-021-1161-y

