Multi-GPU Based Recurrent Neural Network Language Model Training

Zhang, Xiaoci; Gu, Naijie; Ye, Hong

doi:10.1007/978-981-10-2053-7_43


Part of the book series: Communications in Computer and Information Science (CCIS, volume 623)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators


Abstract

Recurrent neural network language models (RNNLMs) have been applied in a wide range of research fields, including natural language processing and speech recognition. One challenge in training RNNLMs is the heavy computational cost of the crucial back-propagation (BP) algorithm. This paper presents an effective approach to training recurrent neural networks on multiple GPUs, where parallelized stochastic gradient descent (SGD) is applied. Results on text-based experiments show that the proposed approach achieves a \(3.4\times\) speedup on 4 GPUs over a single GPU, without any degradation in language model perplexity.
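
The abstract does not say which parallelization scheme is used, so the following is only an illustrative sketch and not the authors' implementation. It assumes a synchronous data-parallel variant of SGD in which each worker computes a gradient on its own shard of a mini-batch and the averaged gradient drives one shared update. Workers are simulated in plain Python on a toy one-parameter regression rather than on GPUs or an RNNLM, and all function names (gradient, split, parallel_sgd_step, parallel_efficiency, perplexity) are hypothetical. The two helper metrics correspond to the quantities the abstract reports: speedup relative to device count, and perplexity.

```python
"""Toy illustration of synchronous data-parallel SGD (simulated workers, no GPUs)."""
import math


def gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y)^2 over a batch of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def split(batch, n_workers):
    """Split a batch into n_workers equal, contiguous shards."""
    if len(batch) % n_workers != 0:
        raise ValueError("batch size must be divisible by n_workers")
    size = len(batch) // n_workers
    return [batch[i * size:(i + 1) * size] for i in range(n_workers)]


def parallel_sgd_step(w, batch, n_workers, lr):
    """One synchronous step: each simulated worker computes a shard gradient,
    the shard gradients are averaged, and a single shared update is applied."""
    shard_grads = [gradient(w, shard) for shard in split(batch, n_workers)]
    return w - lr * sum(shard_grads) / n_workers


def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / n_devices


def perplexity(log_probs):
    """Perplexity from the natural-log probabilities of each predicted word."""
    return math.exp(-sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # targets follow y = 3x
    w_single = w_multi = 0.0
    for _ in range(200):
        w_single = w_single - 0.01 * gradient(w_single, data)
        w_multi = parallel_sgd_step(w_multi, data, n_workers=4, lr=0.01)
    print("single worker:", w_single, "four workers:", w_multi)
    print("efficiency at 3.4x on 4 devices:", parallel_efficiency(3.4, 4))
```

With equal-sized shards, the average of the shard gradients equals the full-batch gradient, so under this assumed scheme the parameter trajectory matches single-worker training up to floating-point rounding. That is one way a parallel run can leave perplexity unchanged. By the same arithmetic, the reported 3.4x speedup on 4 GPUs corresponds to a parallel efficiency of 3.4 / 4 = 0.85.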


Notes

1. http://www.cis.upenn.edu/treebank/
2. Available at http://corpora2.informatik.uni-leipzig.de/download.html


Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Xiaoci Zhang, Naijie Gu & Hong Ye


Corresponding author

Correspondence to Xiaoci Zhang.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Harbin, China
Wanxiang Che
Harbin Engineering University, Harbin, China
Qilong Han
Harbin Institute of Technology, Harbin, China
Hongzhi Wang
Northeast Forestry University, Harbin, China
Weipeng Jing
National University of Defense Technology, Changsha, China
Shaoliang Peng
Harbin Engineering University, Harbin, China
Junyu Lin
Harbin Univ. of Science and Technology, Harbin, China
Guanglu Sun
Harbin Univ. of Science and Technology, Harbin, China
Xianhua Song
Harbin Engineering University, Harbin, China
Hongtao Song
Harbin Sea of Clouds & Computer Tech., Harbin, China
Zeguang Lu

About this paper

Cite this paper

Zhang, X., Gu, N., Ye, H. (2016). Multi-GPU Based Recurrent Neural Network Language Model Training. In: Che, W., et al. Social Computing. ICYCSEE 2016. Communications in Computer and Information Science, vol 623. Springer, Singapore. https://doi.org/10.1007/978-981-10-2053-7_43

DOI: https://doi.org/10.1007/978-981-10-2053-7_43
Published: 31 July 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2052-0
Online ISBN: 978-981-10-2053-7
eBook Packages: Computer Science, Computer Science (R0)
