DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining

  • Regular Paper
  • Published:
Journal of Computer Science and Technology

Abstract

Dynamic neural network (NN) techniques are increasingly important because they facilitate deep learning techniques with more complex network architectures. However, existing studies of deep neural network (DNN) accelerators usually focus on static neural networks, optimizing fixed computational graphs with static scheduling methods. We analyze the execution of dynamic neural networks and observe that their dynamic features introduce challenges for efficient scheduling and pipelining in existing DNN accelerators. We propose DyPipe, a holistic approach to optimizing dynamic neural network inference in enhanced DNN accelerators. DyPipe achieves significant performance improvements for dynamic neural networks while introducing negligible overhead for static neural networks. Our evaluation demonstrates that DyPipe achieves 1.7x speedup on dynamic neural networks and maintains more than 96% of the performance on static neural networks.
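
A minimal, hypothetical sketch (in Python, not taken from the paper): the abstract argues that pipelining in DNN accelerators becomes difficult when layer shapes are only known at run time. The toy timing model below contrasts serial load-then-compute execution with a double-buffered pipeline whose tile count per layer depends on a runtime input length. Every name and constant here (layer_tiles, the load/compute cycle costs, the tile size of 16) is an illustrative assumption, not DyPipe's actual design.

    import random

    def layer_tiles(seq_len):
        # Hypothetical: how many tiles a layer needs for a runtime-dependent input length.
        return max(1, seq_len // 16)

    def serial_cycles(tiles, load=100, compute=150):
        # Each tile is loaded and then computed, with no overlap.
        return tiles * (load + compute)

    def pipelined_cycles(tiles, load=100, compute=150):
        # The load of tile i+1 overlaps with the compute of tile i (double buffering),
        # so the steady-state cost per tile is max(load, compute).
        return load + tiles * max(load, compute)

    if __name__ == "__main__":
        random.seed(0)
        # Input lengths are decided at run time, e.g. variable-length sequences.
        seq_lens = [random.randint(8, 256) for _ in range(8)]
        serial = sum(serial_cycles(layer_tiles(n)) for n in seq_lens)
        piped = sum(pipelined_cycles(layer_tiles(n)) for n in seq_lens)
        print(f"serial:    {serial} cycles")
        print(f"pipelined: {piped} cycles (speedup {serial / piped:.2f}x)")

In this toy model the pipelined variant approaches a bound set by the slower of the two stages; the question the abstract raises is how to keep such overlap when tile counts cannot be fixed at compile time.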

Author information

Corresponding author

Correspondence to Tian Zhi.

Supplementary Information

ESM 1 (PDF 137 kb)

About this article

Cite this article

Zhuang, YM., Hu, X., Chen, XB. et al. DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining. J. Comput. Sci. Technol. 38, 899–910 (2023). https://doi.org/10.1007/s11390-021-1161-y

