Abstract
Deep neural networks are widely used in many fields, but their heavy computational requirements often make inference difficult to support. Model pruning, a technique that removes redundant model weights to accelerate inference, offers one way to address this problem, but the improvement is usually limited because hardware and software are optimized separately. In this paper, we propose a complete hardware-software co-design framework that supports irregularly sparse models. Specifically, we prune redundant weights through iterative pruning with an increasing penalty factor, and we improve hardware efficiency through hardware thread control. The framework reduces inference latency by 64.2% for vector-multiplication and 86.5% for convolution applications. These experimental results demonstrate a significant performance improvement and the effectiveness of the proposed method.
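The abstract's software side, iterative pruning with an increasing penalty factor, can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the function name, the specific L2 penalty term, the geometric penalty growth schedule, and the magnitude threshold are all assumptions chosen for clarity.

```python
import numpy as np

def iterative_penalty_pruning(w, grad_fn, steps=5, lr=0.1,
                              penalty0=0.01, growth=2.0, threshold=1e-2):
    """Hypothetical sketch of penalty-driven pruning.

    Each round takes a gradient step on the task loss plus an L2 penalty,
    then the penalty factor is increased so weights are pushed harder
    toward zero; finally, small-magnitude weights are hard-pruned.
    """
    penalty = penalty0
    for _ in range(steps):
        # gradient step on task loss plus the current penalty term
        w = w - lr * (grad_fn(w) + penalty * w)
        penalty *= growth  # raise the penalty factor each iteration
    # irregular (unstructured) sparsity: zero out small weights elementwise
    mask = np.abs(w) >= threshold
    return w * mask, mask
```

For example, pruning a toy weight vector whose task loss simply pulls the weights back toward their initial values drives the smallest entries below the threshold while larger entries survive:

```python
w0 = np.array([1.0, -0.5, 0.02, -0.01, 0.8])
pruned, mask = iterative_penalty_pruning(w0, grad_fn=lambda w: w - w0)
# every surviving weight is at or above the magnitude threshold
```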
Acknowledgements
This research is supported by the National Key R&D Program of China under Grant No. 2020YFB1805505 and the Natural Science Foundation of Shandong Province under Grant No. ZR2022LZH017.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Y., Li, B., Lu, L., Wang, J., Li, R., Kan, H. (2023). Hardware-Software Co-design for Deep Neural Network Acceleration. In: Wang, Z., Wang, S., Xu, H. (eds) Service Science. ICSS 2023. Communications in Computer and Information Science, vol 1844. Springer, Singapore. https://doi.org/10.1007/978-981-99-4402-6_16
DOI: https://doi.org/10.1007/978-981-99-4402-6_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4401-9
Online ISBN: 978-981-99-4402-6