Abstract
Convolutional neural networks (CNNs) are widely used in deep learning, but their hardware cost is substantial. Traditional Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are inefficient and expensive for neural networks, so an efficient hardware design is required. The proposed design, based on a Digital Signal Processor (DSP), offers high operating speed and strong computational ability for both training and inference of CNNs. In this paper, the hardware architecture of the convolution and softmax functions is specifically optimized. The Winograd algorithm reduces the number of multiplications required for convolution, and thus the hardware complexity, since a multiplier is far more complex to implement in hardware than an adder. The softmax function is likewise simplified by replacing its divider with a subtractor and a logarithmic function, which cost fewer resources. The proposed hardware architecture dramatically reduces complexity and hardware resource consumption.
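The abstract's claim about Winograd convolution can be illustrated with the standard F(2,3) minimal filtering case: two outputs of a 3-tap convolution computed with 4 multiplications instead of the 6 a direct computation needs. This is a generic sketch of the algorithm in Python, not the paper's actual hardware mapping; the function names are hypothetical.

```python
# Winograd minimal filtering F(2,3): produces 2 outputs of a 3-tap
# convolution with 4 multiplications instead of 6. In hardware, fewer
# multipliers directly translates to lower area and complexity.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 output samples."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputable once per filter, so its cost
    # is amortized across all input tiles).
    G0 = g0
    G1 = (g0 + g1 + g2) / 2.0
    G2 = (g0 - g1 + g2) / 2.0
    G3 = g2
    # The only 4 multiplications.
    m1 = (d0 - d2) * G0
    m2 = (d1 + d2) * G1
    m3 = (d2 - d1) * G2
    m4 = (d1 - d3) * G3
    # Output transform: additions/subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Direct sliding-window computation for comparison: 6 multiplications."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]
```

Both routines produce identical results, e.g. `winograd_f23([1, 2, 3, 4], [1, 1, 1])` matches `direct_conv([1, 2, 3, 4], [1, 1, 1])`; the saving grows for larger tiles such as F(2x2, 3x3) in 2D.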
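The softmax simplification mentioned in the abstract can be understood as moving the normalization into the log domain, where the division becomes a subtraction: softmax(x_i) = exp(x_i - ln(sum_j exp(x_j))). A minimal numerical sketch of this identity, assuming nothing about the paper's fixed-point implementation details:

```python
import math

def softmax_div(x):
    """Conventional softmax: exponentials followed by a divider."""
    e = [math.exp(v) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax_log_sub(x):
    """Division-free variant: exp(x_i - ln(sum_j exp(x_j))).
    The hardware divider is replaced by a logarithm unit and a
    subtractor, which cost fewer resources than a full divider."""
    log_sum = math.log(sum(math.exp(v) for v in x))
    return [math.exp(v - log_sum) for v in x]
```

The two functions are mathematically equivalent; in hardware, the exp and log units can additionally be approximated with lookup tables or shift-add schemes, which is where the resource saving over a divider comes from.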
Acknowledgement
The authors would like to thank the editors and the reviewers for providing comments and suggestions for this paper. This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61831018, 61901199, and 61631017, and Guangdong Province Key Research and Development Program Major Science and Technology Projects under Grant 2018B010115002.
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Jiang, Z., Zhang, Z., Ren, H., Wu, J. (2021). Efficient Architecture for Convolution and Softmax Function in Deep Learning Accelerator. In: Gao, H., Fan, P., Wun, J., Xiaoping, X., Yu, J., Wang, Y. (eds) Communications and Networking. ChinaCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 352. Springer, Cham. https://doi.org/10.1007/978-3-030-67720-6_43
Print ISBN: 978-3-030-67719-0
Online ISBN: 978-3-030-67720-6