
Knowledge distillation-based performance transferring for LSTM-RNN model acceleration

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Sequence data processing, such as signal classification, is an important part of pattern recognition. Long short-term memory recurrent neural networks (LSTM-RNNs) are widely applied to sequence data processing because of their ability to learn long-term dynamics while avoiding the vanishing and exploding gradient problems. However, the high computational cost of LSTM-RNN models is the main obstacle to deploying them on devices with limited resources. To overcome this obstacle, we propose knowledge distillation-based performance transferring for LSTM-RNN model acceleration. First, we propose a paradigm for transferring the performance of LSTM-RNN models to lightweight convolutional neural network (CNN) models. Then, based on this paradigm, we define a novel loss that uses the predictions of an LSTM-RNN model to train a lightweight CNN model. Experimental results on two sequence data processing tasks, automatic modulation classification and text classification, show that the proposed paradigm is effective and that the proposed loss enables CNN models with low inference time and few parameters to achieve higher accuracies and to produce category distributions similar to those of LSTM-RNN models. Consequently, CNN models trained with the proposed method can replace LSTM-RNN models, thereby accelerating inference.
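To make the distillation setup concrete, the sketch below shows one way the teacher–student training described above could look in PyTorch: a frozen LSTM-RNN teacher provides soft targets, and a lightweight 1-D CNN student is trained with a weighted combination of a temperature-softened Kullback–Leibler term and the usual cross-entropy against the ground-truth labels. This is a minimal illustration under common knowledge-distillation assumptions (Hinton-style temperature softening); the layer sizes, temperature `T`, weight `alpha`, and the `LSTMTeacher`/`CNNStudent` classes are hypothetical and are not the architecture or loss reported in the paper.

```python
# Minimal sketch of distilling an LSTM-RNN teacher into a lightweight 1-D CNN
# student. The specific loss form, temperature, and model sizes are assumptions
# for illustration, not the paper's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMTeacher(nn.Module):
    """LSTM-RNN teacher: sequence in, class logits out (hypothetical sizes)."""
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                  # x: (batch, time, in_dim)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])              # logits: (batch, num_classes)


class CNNStudent(nn.Module):
    """Lightweight 1-D CNN student over the same sequences."""
    def __init__(self, in_dim, channels, num_classes):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                  # x: (batch, time, in_dim)
        z = self.conv(x.transpose(1, 2)).squeeze(-1)
        return self.fc(z)


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of (a) KL divergence between temperature-softened teacher
    and student distributions and (b) cross-entropy against the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Usage: the frozen teacher supplies soft targets for each mini-batch.
if __name__ == "__main__":
    teacher = LSTMTeacher(in_dim=2, hidden=128, num_classes=11).eval()
    student = CNNStudent(in_dim=2, channels=32, num_classes=11)
    x = torch.randn(8, 128, 2)             # e.g. 128-sample I/Q sequences
    y = torch.randint(0, 11, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    loss.backward()
```

In this sketch the teacher is kept frozen during student training, so only the CNN's parameters receive gradients from the combined loss.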


Notes

  1. The code for this structure is based on https://github.com/richliao/textclassifier.


Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the State Key Laboratory Foundation of Complex Electromagnetic Environment Effects on Electronics and Information System (Nos. CEMEE2019K0203B and CEMEE2019Z0101), the Innovation Capability Support Program of Shaanxi (Program No. 2020TD-017), and the National Natural Science Foundation of China (Grant No. 62171357).


Cite this article

Ma, H., Yang, S., Wu, R. et al. Knowledge distillation-based performance transferring for LSTM-RNN model acceleration. SIViP 16, 1541–1548 (2022). https://doi.org/10.1007/s11760-021-02108-9

