Abstract
Sequence data processing, such as signal classification, is an important part of pattern recognition. Long short-term memory recurrent neural networks (LSTM-RNNs) are widely used for sequence data processing because they can learn long-term dynamics while avoiding the vanishing and exploding gradient problems. However, their high computational cost is the main obstacle to deploying LSTM-RNN models on resource-constrained devices. In this paper, we propose knowledge distillation-based performance transferring for LSTM-RNN model acceleration to overcome this obstacle. First, we propose a paradigm for transferring the performance of LSTM-RNN models to lightweight convolutional neural network (CNN) models. Then, based on this paradigm, we define a novel loss that uses the predictions of an LSTM-RNN model to train a lightweight CNN model. Experimental results on two sequence data processing tasks, automatic modulation classification and text classification, show that the proposed paradigm is effective and that the proposed loss enables CNN models with low time consumption and few parameters to achieve higher accuracies and to generate category distributions similar to those of LSTM-RNN models. Consequently, CNN models trained with the proposed method can replace LSTM-RNN models for acceleration.
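The abstract describes training a lightweight CNN student from the predictions of an LSTM-RNN teacher. The paper's exact loss is not given here, but a minimal sketch of a standard Hinton-style distillation loss, which blends a softened teacher-matching term with the ordinary hard-label cross-entropy, illustrates the general mechanism; the temperature `T` and weight `alpha` below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target cross-entropy (teacher -> student) blended with
    hard-label cross-entropy, in the style of Hinton et al. (2015)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Soft term: cross-entropy against the teacher's softened distribution,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1)) * T * T
    # Hard term: standard cross-entropy with the ground-truth labels.
    p = softmax(student_logits)
    hard = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard
```

In this setup the student is pushed toward the teacher's full category distribution rather than only the argmax label, which is consistent with the abstract's observation that the trained CNNs generate category distributions similar to those of the LSTM-RNN teacher.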
Notes
The code for this structure is based on https://github.com/richliao/textclassifier.
This work was supported by the State Key Laboratory Foundation of Complex Electromagnetic Environment Effects on Electronics and Information System (Nos. CEMEE2019K0203B and CEMEE2019Z0101), the Innovation Capability Support Program of Shaanxi (Program No. 2020TD-017), and the National Natural Science Foundation of China (Grant No. 62171357).
Cite this article
Ma, H., Yang, S., Wu, R. et al. Knowledge distillation-based performance transferring for LSTM-RNN model acceleration. SIViP 16, 1541–1548 (2022). https://doi.org/10.1007/s11760-021-02108-9