
Speed Up the Training of Neural Machine Translation


Abstract

Neural machine translation (NMT) has made notable progress in recent years. Although existing models provide reasonable translation performance, they require a great deal of training time; in particular, when the corpus is enormous, their computational cost becomes extremely high. In this paper, we propose a novel NMT model based on the conventional bidirectional recurrent neural network (bi-RNN). In this model, we apply a tanh activation function, which captures future and history context information more fully, to speed up the training process. Experimental results on German–English and English–French translation tasks demonstrate that the proposed model saves considerable training time compared with state-of-the-art models while providing better translation performance.
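
As a concrete illustration of the kind of encoder the abstract describes, the sketch below builds a bidirectional RNN with a tanh activation in PyTorch. It is only an assumption-based example: the class name BiRNNEncoder, the embedding and hidden dimensions, and the usage snippet are illustrative and are not taken from the paper.

# Minimal sketch (not the authors' code): a bidirectional RNN encoder with a
# tanh nonlinearity, in the spirit of the bi-RNN model described in the abstract.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # A plain (non-gated) bidirectional RNN with tanh activation:
        # the forward direction reads history context, the backward direction
        # reads future context, and the two are concatenated per time step.
        self.rnn = nn.RNN(emb_dim, hidden_dim, nonlinearity="tanh",
                          bidirectional=True, batch_first=True)

    def forward(self, src_tokens):
        embedded = self.embedding(src_tokens)   # (batch, length, emb_dim)
        outputs, hidden = self.rnn(embedded)    # outputs: (batch, length, 2 * hidden_dim)
        return outputs, hidden

# Usage example with random token ids (purely illustrative).
if __name__ == "__main__":
    encoder = BiRNNEncoder(vocab_size=10000)
    dummy_src = torch.randint(0, 10000, (8, 20))  # batch of 8 sentences, length 20
    ctx, _ = encoder(dummy_src)
    print(ctx.shape)                              # torch.Size([8, 20, 1024])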



Author information

Corresponding author

Correspondence to Wenxin Liang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (Nos. 61632019, 61876028).


Cite this article

Liu, X., Wang, W., Liang, W. et al. Speed Up the Training of Neural Machine Translation. Neural Process Lett 51, 231–249 (2020). https://doi.org/10.1007/s11063-019-10084-y
