Abstract
Neural machine translation (NMT) has made notable progress in recent years. Although existing models deliver reasonable translation quality, their training is time-consuming; in particular, when the corpus is large, the computational cost becomes prohibitive. In this paper, we propose a novel NMT model based on the conventional bidirectional recurrent neural network (bi-RNN). In this model, we apply a tanh activation function, which captures future and history context information more effectively, to speed up the training process. Experimental results on German–English and English–French translation tasks demonstrate that the proposed model substantially reduces training time compared with state-of-the-art models while providing better translation performance.
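To make the bi-RNN encoder concrete, the following is a minimal NumPy sketch of a bidirectional tanh RNN: one pass reads the source sentence left to right, another reads it right to left, and the per-step hidden states are concatenated so each position carries both history and future context. This is an illustrative toy under assumed dimensions and random weights, not the authors' actual model (their architecture, training procedure, and hyperparameters are described in the full paper).

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence; return the hidden state at each step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)  # tanh keeps activations bounded in (-1, 1)
        states.append(h)
    return states

def bidirectional_encode(xs, params_fwd, params_bwd):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = rnn_pass(xs, *params_fwd)
    bwd = rnn_pass(xs[::-1], *params_bwd)[::-1]  # re-reverse to align with time steps
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Toy run with hypothetical sizes: input dim 4, hidden dim 3, sequence length 5.
rng = np.random.default_rng(0)
d_in, d_hid, T = 4, 3, 5
make_params = lambda: (0.1 * rng.normal(size=(d_hid, d_in)),
                       0.1 * rng.normal(size=(d_hid, d_hid)),
                       np.zeros(d_hid))
xs = [rng.normal(size=d_in) for _ in range(T)]
encoded = bidirectional_encode(xs, make_params(), make_params())
print(len(encoded), encoded[0].shape)  # 5 steps, each a vector of size 2 * d_hid
```

Each encoded vector then serves as the annotation a decoder (e.g. with attention) would consume; the concatenation is the standard bi-RNN construction of Schuster and Paliwal.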
This work was supported by National Science Foundation of China (Nos. 61632019, 61876028).
Cite this article
Liu, X., Wang, W., Liang, W. et al. Speed Up the Training of Neural Machine Translation. Neural Process Lett 51, 231–249 (2020). https://doi.org/10.1007/s11063-019-10084-y