Abstract
Recurrent neural networks and their many variants have been widely used in language modeling, text generation, machine translation, speech recognition, and related tasks, owing to their excellent ability to process sequential data. However, these networks are built by stacking multiple layers, which causes memory of information from the distant past to decay continuously. To address this, this paper proposes a parallelizable simple recurrent unit with hierarchical memory (PSRU-HM) that preserves more long-term historical information for inference. This is achieved through a nested SRU structure, in which connections between the inner and outer layers allow information to flow between the inner and outer memory cells. The depth of the network can be adjusted dynamically according to task complexity. In addition, a skip connection that combines high-level and low-level features is added to the outermost layer, maximizing the utilization of the input information. To accelerate training and inference, the weights of PSRU-HM are reorganized to enable parallel execution under the CUDA framework. Extensive experiments on several public datasets, covering text classification, language modeling, and question answering, verify the proposed method. Experimental results show that PSRU-HM outperforms traditional methods and achieves a 2\(\times \) speed-up compared to the cuDNN-optimized LSTM.
Supported by the Natural Science Foundation of China (62272322, 62002246, 62272323), the Project of Beijing Municipal Education Commission (KM202010028010) and the Applied Basic Research Project of Liaoning Province (2022JH2/101300279).
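The following is a minimal sketch (in PyTorch) of a nested SRU-style cell with an outer skip connection, based only on the description in the abstract. The exact gating equations, inner/outer coupling, and weight reorganization of PSRU-HM are not given here, so the specific formulation below (an SRU-like inner cell feeding an SRU-like outer cell, plus an additive skip from the raw input) is an illustrative assumption, not the authors' published method.

```python
# Illustrative sketch of a hierarchical (nested) SRU-style cell.
# Assumptions: simplified SRU gating that depends only on x_t, and an
# additive skip connection at the outer layer. Not the PSRU-HM equations.
import torch
import torch.nn as nn


class SRUCell(nn.Module):
    """Simplified SRU-style cell:
    c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t)
    h_t = r_t * c_t + (1 - r_t) * x_t   (highway-style output)."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, 3 * dim)  # candidate, forget gate, reset gate

    def forward(self, x, c_prev):
        z, f, r = self.w(x).chunk(3, dim=-1)
        f, r = torch.sigmoid(f), torch.sigmoid(r)
        c = f * c_prev + (1 - f) * z
        h = r * c + (1 - r) * x
        return h, c


class NestedSRUCell(nn.Module):
    """Hypothetical hierarchical-memory cell: an inner cell keeps a fast
    (short-term) memory whose output drives an outer cell with a slower
    (long-term) memory; a skip connection mixes the raw input back into
    the outer output so low-level features are not lost."""
    def __init__(self, dim):
        super().__init__()
        self.inner = SRUCell(dim)
        self.outer = SRUCell(dim)

    def forward(self, x, c_inner, c_outer):
        h_inner, c_inner = self.inner(x, c_inner)
        h_outer, c_outer = self.outer(h_inner, c_outer)
        h = h_outer + x  # skip connection: combine high- and low-level features
        return h, c_inner, c_outer


# Usage over a short toy sequence.
dim, steps = 8, 5
cell = NestedSRUCell(dim)
c_in = torch.zeros(1, dim)
c_out = torch.zeros(1, dim)
for t in range(steps):
    x_t = torch.randn(1, dim)
    h_t, c_in, c_out = cell(x_t, c_in, c_out)
print(h_t.shape)  # torch.Size([1, 8])
```

Because the gates in this sketch depend only on the current input, the matrix multiplications across all time steps can be batched, which is the property that makes SRU-style recurrences amenable to the CUDA-level parallelization described in the abstract.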
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Qiao, Y. et al. (2024). Parallelizable Simple Recurrent Units with Hierarchical Memory. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1969. Springer, Singapore. https://doi.org/10.1007/978-981-99-8184-7_29
DOI: https://doi.org/10.1007/978-981-99-8184-7_29
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8183-0
Online ISBN: 978-981-99-8184-7