Abstract
Deep neural networks are widely used in many fields, but their heavy computational requirements often make inference difficult to support. Model pruning, a technique that removes redundant model weights to accelerate inference, offers one way to address this problem, but the improvement is usually limited because hardware and software are optimized separately. In this paper, we propose a complete hardware-software co-design framework that supports irregularly sparse models. Specifically, we prune redundant weights through iterative pruning with an increasing penalty factor, and we improve hardware efficiency through hardware thread control. The framework reduces inference latency by 64.2% for vector-multiplication and 86.5% for convolution applications. These experimental results demonstrate a significant performance improvement and the effectiveness of the proposed method.
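The abstract's software side, iterative pruning with an increasing penalty factor, can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the function name, the specific L2 penalty term, the geometric penalty growth schedule, and the magnitude threshold are all assumptions chosen for clarity.

```python
import numpy as np

def iterative_penalty_pruning(w, grad_fn, steps=5, lr=0.1,
                              penalty0=0.01, growth=2.0, threshold=1e-2):
    """Hypothetical sketch of penalty-driven pruning.

    Each round takes a gradient step on the task loss plus an L2 penalty,
    then the penalty factor is increased so weights are pushed harder
    toward zero; finally, small-magnitude weights are hard-pruned.
    """
    penalty = penalty0
    for _ in range(steps):
        # gradient step on task loss plus the current penalty term
        w = w - lr * (grad_fn(w) + penalty * w)
        penalty *= growth  # raise the penalty factor each iteration
    # irregular (unstructured) sparsity: zero out small weights elementwise
    mask = np.abs(w) >= threshold
    return w * mask, mask
```

For example, pruning a toy weight vector whose task loss simply pulls the weights back toward their initial values drives the smallest entries below the threshold while larger entries survive:

```python
w0 = np.array([1.0, -0.5, 0.02, -0.01, 0.8])
pruned, mask = iterative_penalty_pruning(w0, grad_fn=lambda w: w - w0)
# every surviving weight is at or above the magnitude threshold
```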
Acknowledgements
This research is supported by the National Key R&D Program of China under Grant No. 2020YFB1805505 and the Natural Science Foundation of Shandong Province under Grant No. ZR2022LZH017.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Y., Li, B., Lu, L., Wang, J., Li, R., Kan, H. (2023). Hardware-Software Co-design for Deep Neural Network Acceleration. In: Wang, Z., Wang, S., Xu, H. (eds) Service Science. ICSS 2023. Communications in Computer and Information Science, vol 1844. Springer, Singapore. https://doi.org/10.1007/978-981-99-4402-6_16
DOI: https://doi.org/10.1007/978-981-99-4402-6_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4401-9
Online ISBN: 978-981-99-4402-6