Abstract
Recurrent neural network language models (RNNLMs) have been applied in a wide range of research fields, including natural language processing and speech recognition. One challenge in training RNNLMs is the heavy computational cost of the crucial back-propagation (BP) algorithm. This paper presents an effective approach to training recurrent neural networks on multiple GPUs using parallelized stochastic gradient descent (SGD). Results on text-based experiments show that the proposed approach achieves a \(3.4\times\) speedup on 4 GPUs over a single GPU, with no loss in language model perplexity.
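The abstract does not spell out the parallelization details, but the scheme it names, parallelized SGD across GPUs, is commonly realized as synchronous data-parallel SGD: each GPU computes gradients on its own shard of a minibatch, the gradients are averaged, and one shared update is applied. The sketch below illustrates that general pattern only; the quadratic loss, parameter shapes, and the names `local_gradient` and `parallel_sgd_step` are illustrative assumptions, not the paper's RNNLM or its actual implementation.

```python
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Gradient of mean squared error on one worker's data shard
    (a placeholder for one GPU's local BP pass)."""
    preds = x_shard @ w
    return x_shard.T @ (preds - y_shard) / len(x_shard)

def parallel_sgd_step(w, x_batch, y_batch, num_gpus=4, lr=0.1):
    """One synchronous update: shard the minibatch across workers,
    average the per-worker gradients, apply a single SGD step."""
    x_shards = np.array_split(x_batch, num_gpus)
    y_shards = np.array_split(y_batch, num_gpus)
    grads = [local_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)

# Toy usage: recover w_true with a 4-way "multi-GPU" SGD loop.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true
w = np.zeros(8)
for _ in range(200):
    w = parallel_sgd_step(w, X, y, num_gpus=4)
print("max abs error:", np.abs(w - w_true).max())
```

Because the averaged gradient equals the gradient of the full minibatch, this synchronous scheme leaves the optimization trajectory unchanged, which is consistent with the abstract's claim of a speedup without any perplexity loss.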