
Knowledge distillation-based performance transferring for LSTM-RNN model acceleration

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Sequence data processing, such as signal classification, is an important part of pattern recognition. Long short-term memory recurrent neural networks (LSTM-RNNs) are widely applied to sequence data processing because of their ability to learn long-term dynamics while avoiding the vanishing and exploding gradient problems. However, the high computational cost of LSTM-RNN models is the main obstacle to deploying them on devices with limited resources. To overcome this obstacle, we propose knowledge distillation-based performance transferring for LSTM-RNN model acceleration. First, we propose a paradigm for transferring the performance of LSTM-RNN models to lightweight convolutional neural network (CNN) models. Then, based on this paradigm, we define a novel loss that uses the predictions of an LSTM-RNN model to train a lightweight CNN model. Experimental results on two sequence data processing tasks, automatic modulation classification and text classification, show that the proposed paradigm is effective and that the proposed loss enables CNN models with low inference time and few parameters to achieve higher accuracies and to produce category distributions similar to those of LSTM-RNN models. Consequently, CNN models trained with the proposed method can replace LSTM-RNN models, thereby accelerating inference.
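To make the distillation setup concrete, the sketch below shows one way the teacher–student training described above could look in PyTorch: a frozen LSTM-RNN teacher provides soft targets, and a lightweight 1-D CNN student is trained with a weighted combination of a temperature-softened Kullback–Leibler term and the usual cross-entropy against the ground-truth labels. This is a minimal illustration under common knowledge-distillation assumptions (Hinton-style temperature softening); the layer sizes, temperature `T`, weight `alpha`, and the `LSTMTeacher`/`CNNStudent` classes are hypothetical and are not the architecture or loss reported in the paper.

```python
# Minimal sketch of distilling an LSTM-RNN teacher into a lightweight 1-D CNN
# student. The specific loss form, temperature, and model sizes are assumptions
# for illustration, not the paper's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMTeacher(nn.Module):
    """LSTM-RNN teacher: sequence in, class logits out (hypothetical sizes)."""
    def __init__(self, in_dim, hidden, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                  # x: (batch, time, in_dim)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])              # logits: (batch, num_classes)


class CNNStudent(nn.Module):
    """Lightweight 1-D CNN student over the same sequences."""
    def __init__(self, in_dim, channels, num_classes):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                  # x: (batch, time, in_dim)
        z = self.conv(x.transpose(1, 2)).squeeze(-1)
        return self.fc(z)


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of (a) KL divergence between temperature-softened teacher
    and student distributions and (b) cross-entropy against the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


# Usage: the frozen teacher supplies soft targets for each mini-batch.
if __name__ == "__main__":
    teacher = LSTMTeacher(in_dim=2, hidden=128, num_classes=11).eval()
    student = CNNStudent(in_dim=2, channels=32, num_classes=11)
    x = torch.randn(8, 128, 2)             # e.g. 128-sample I/Q sequences
    y = torch.randint(0, 11, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, y)
    loss.backward()
```

In this sketch the teacher is kept frozen during student training, so only the CNN's parameters receive gradients from the combined loss.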


Notes

  1. The code for this structure is based on https://github.com/richliao/textclassifier.


Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the State Key Laboratory Foundation of Complex Electromagnetic Environment Effects on Electronics and Information System (Nos. CEMEE2019K0203B and CEMEE2019Z0101), the Innovation Capability Support Program of Shaanxi (Program No. 2020TD-017), and the National Natural Science Foundation of China (Grant No. 62171357).


Cite this article

Ma, H., Yang, S., Wu, R. et al. Knowledge distillation-based performance transferring for LSTM-RNN model acceleration. SIViP 16, 1541–1548 (2022). https://doi.org/10.1007/s11760-021-02108-9

