
Accelerating Sparse Convolutional Neural Networks Based on Dataflow Architecture

  • Conference paper
  • In: Algorithms and Architectures for Parallel Processing (ICA3PP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12453)

Abstract

Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in a wide range of applications, including image recognition, speech recognition, and natural language processing. Large-scale CNNs are constrained by limited computing and storage resources, and sparse CNNs have emerged as an effective way to reduce the amount of computation and memory required. Although existing neural network accelerators can process sparse networks efficiently, the strong coupling between algorithm and structure makes them inflexible. Dataflow architecture can implement different neural network applications through flexible instruction scheduling, but it must be initialized at execution time to load instructions into the computing array. A dense convolutional layer needs to be initialized only once because its computation is regular. A sparse convolutional layer, however, requires multiple initializations; repeatedly fetching instructions from memory takes a long time, leaving the computing array idle and degrading performance. In this paper, we propose an instruction sharing strategy based on instruction field content, which reduces initialization time and improves performance. Moreover, we use an extended instruction sharing strategy that exploits the static nature of filters to remove filter-related instructions, further reducing initialization time. Experiments show that our strategies achieve 1.69x (AlexNet) and 1.45x (VGG-16) speedup and 37.2% (AlexNet) and 34.26% (VGG-16) energy reduction compared with the dense networks. They also achieve, on average, 2.34x (AlexNet) and 2.12x (VGG-16) speedup over a Titan Xp GPU, and 1.75x (AlexNet) and 1.49x (VGG-16) speedup over Cambricon-X, on our benchmarks.
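The instruction-sharing idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the instruction format, the names `Inst` and `init_array`, and the toy tiles are all illustrative assumptions. The sketch only shows the core mechanism: keying instruction blocks by their field content so that a block already resident in the computing array is not fetched from memory again when a sparse layer is re-initialized.

```python
# Hypothetical sketch of field-content-based instruction sharing.
# All names and the instruction format are illustrative, not from the paper.
from collections import namedtuple

# A toy "instruction"; the sharing key is built purely from its field content.
Inst = namedtuple("Inst", ["opcode", "src", "dst"])

def init_array(tiles, share=True):
    """Return how many instruction blocks must be fetched from memory
    to initialize the computing array for the given sparse tiles."""
    loaded = set()          # field-content keys of blocks already resident
    fetches = 0
    for block in tiles:     # one instruction block per sparse tile
        key = tuple(block)  # key derived from the instructions' field content
        if share and key in loaded:
            continue        # identical block already loaded: share it
        loaded.add(key)
        fetches += 1        # otherwise pay the fetch / initialization cost
    return fetches

# Two sparse tiles whose surviving (non-zero-weight) instructions coincide
# share a single load; the third tile still needs its own fetch.
t0 = [Inst("MAC", "w0", "acc0"), Inst("MAC", "w2", "acc0")]
t1 = [Inst("MAC", "w0", "acc0"), Inst("MAC", "w2", "acc0")]  # same content as t0
t2 = [Inst("MAC", "w1", "acc1")]

print(init_array([t0, t1, t2], share=False))  # 3 fetches without sharing
print(init_array([t0, t1, t2], share=True))   # 2 fetches with sharing
```

Under this toy model, the extended strategy in the abstract would go one step further: because the filters are static across an inference, filter-related instructions need not be reloaded at all between initializations, shrinking each block before the sharing key is even computed.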


References

  1. Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., Moshovos, A.: Cnvlutin: ineffectual-neuron-free deep neural network computing. ACM SIGARCH Comput. Arch. News 44(3), 1–13 (2016)

  2. Carter, N.P., et al.: Runnemede: an architecture for ubiquitous high-performance computing. In: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp. 198–209. IEEE (2013)

  3. Chen, Y.H., Krishna, T., Emer, J.S., Sze, V.: Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 52(1), 127–138 (2016)

  4. Chen, Y., Chen, T., Xu, Z., Sun, N., Temam, O.: Diannao family: energy-efficient hardware accelerators for machine learning. Commun. ACM 59(11), 105–112 (2016)

  5. Chen, Y., et al.: DaDianNao: a machine-learning supercomputer. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609–622. IEEE Computer Society (2014)

  6. Dean, J., et al.: Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, pp. 1223–1231 (2012)

  7. Denil, M., Shakibi, B., Dinh, L., Ranzato, M., De Freitas, N.: Predicting parameters in deep learning. In: Advances in Neural Information Processing Systems, pp. 2148–2156 (2013)

  8. Fan, D., et al.: SmarCo: an efficient many-core processor for high-throughput applications in datacenters. In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 596–607. IEEE (2018)

  9. Gao, G.R., Suetterlein, J., Zuckerman, S.: Toward an execution model for extreme-scale systems: Runnemede and beyond. CAPSL Technical Memo 104, Department of Electrical and Computer Engineering, University of Delaware (2011)

  10. Giorgi, R.: TERAFLUX: harnessing dataflow in next generation teradevices. Microprocess. Microsyst. 38(8), 976–990 (2014)

  11. Han, S., et al.: ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 75–84. ACM (2017)

  12. Han, S., et al.: EIE: efficient inference engine on compressed deep neural network. In: 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 243–254. IEEE (2016)

  13. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)

  14. Holi, J.L., Hwang, J.N.: Finite precision error analysis of neural network hardware implementations. IEEE Trans. Comput. 42(3), 281–290 (1993)

  15. Jaderberg, M., Vedaldi, A., Zisserman, A.: Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 (2014)

  16. Jouppi, N.P., et al.: In-datacenter performance analysis of a tensor processing unit. In: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1–12. IEEE (2017)

  17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  18. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, p. 75. IEEE Computer Society (2004)

  19. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  20. Naumov, M., Chien, L., Vandermersch, P., Kapasi, U.: cuSPARSE library. In: GPU Technology Conference (2010)

  21. Oriato, D., Tilbury, S., Marrocu, M., Pusceddu, G.: Acceleration of a meteorological limited area model with dataflow engines. In: 2012 Symposium on Application Accelerators in High Performance Computing, pp. 129–132. IEEE (2012)

  22. Parashar, A., et al.: SCNN: an accelerator for compressed-sparse convolutional neural networks. In: 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 27–40. IEEE (2017)

  23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  24. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)

  25. Xiang, T., et al.: Accelerating CNN algorithm with fine-grained dataflow architectures. In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 243–251. IEEE (2018)

  26. Ye, X., Fan, D., Sun, N., Tang, S., Zhang, M., Zhang, H.: SimICT: a fast and flexible framework for performance and power evaluation of large-scale architecture. In: Proceedings of the 2013 International Symposium on Low Power Electronics and Design, pp. 273–278. IEEE Press (2013)

  27. Ye, X., et al.: An efficient dataflow accelerator for scientific applications. Future Gener. Comput. Syst. 112, 580–588 (2020)

  28. Ye, X.: Applying CNN on a scientific application accelerator based on dataflow architecture. CCF Trans. High Perform. Comput. 1(3–4), 177–195 (2019)

  29. Zhang, S., et al.: Cambricon-X: an accelerator for sparse neural networks. In: The 49th Annual IEEE/ACM International Symposium on Microarchitecture, p. 20. IEEE Press (2016)

  30. Zhou, X., et al.: Cambricon-S: addressing irregularity in sparse neural networks through a cooperative software/hardware approach. In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 15–28. IEEE (2018)


Acknowledgement

This work was supported by the National Key Research and Development Program (2018YFB1003501), the National Natural Science Foundation of China (61732018, 61872335, 61802367, 61672499), Austrian-Chinese Cooperative R&D Project (FFG and CAS) Grant No. 171111KYSB20170032, the Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDC05000000, and the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2019A07).

Author information

Correspondence to Xinxin Wu.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, X., et al. (2020). Accelerating Sparse Convolutional Neural Networks Based on Dataflow Architecture. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_2
