
LenM: Improving Low-Resource Neural Machine Translation Using Target Length Modeling


Abstract

Neural machine translation (NMT) is an active field in artificial intelligence that aims at translating text from a source language into a different target language. Although NMT systems perform quite well in high-resource setups, their performance on low-resource data is poor. One aspect of data scarcity is the lack of diversity in the sentence lengths of the training data. Moreover, since a maximum sentence length is usually set during training, translation quality degrades for sentences longer than this maximum. In this paper, we propose LenM, a method that models the length of the target (translated) sentence given the source sentence using a deep recurrent neural structure, and apply it to the decoder side of neural machine translation systems to generate translations with appropriate lengths and better quality. Our proposed method helps to fix some drawbacks of NMT, such as output degradation on unseen sentence lengths and the limitation on using larger beam sizes in the decoding phase of translation. The method can be applied to any NMT model regardless of its architecture and does not slow down translation. Moreover, it can be used efficiently in non-autoregressive machine translation systems, which need to know the target length before decoding. The final outcome of this paper is an improvement in the output quality of neural machine translation systems trained on low-resource corpora. Our experiments show the superior performance of the proposed method compared to state-of-the-art neural machine translation systems when facing a target length mismatch between training and inference, with improvements of up to 9.82 BLEU points for German-to-English translation and up to 6.28 BLEU points for Arabic-to-English translation.
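To make the idea concrete, the sketch below shows one plausible way to model target length from the source sentence with a recurrent network, as the abstract describes. It is only a minimal illustration: the class name, layer sizes, the choice of a BiLSTM with mean pooling, and framing length prediction as classification over possible lengths are all assumptions for exposition, not the authors' exact LenM architecture.

```python
# Illustrative sketch (not the paper's exact model): a BiLSTM reads the
# source token ids and predicts the target sentence length, which a
# decoder could then use to bias or constrain its output length.
import torch
import torch.nn as nn


class TargetLengthPredictor(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, max_len=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        # one logit per candidate target length 1..max_len
        self.proj = nn.Linear(2 * hidden_dim, max_len)

    def forward(self, src_ids):
        # src_ids: (batch, src_len), padded with id 0
        emb = self.embed(src_ids)
        out, _ = self.bilstm(emb)                        # (batch, src_len, 2*hidden)
        mask = (src_ids != 0).unsqueeze(-1).float()
        pooled = (out * mask).sum(1) / mask.sum(1).clamp(min=1)  # mean over real tokens
        return self.proj(pooled)                         # (batch, max_len) length logits


# Example usage: predict a length for a batch of hypothetical source sentences.
model = TargetLengthPredictor(vocab_size=32000)
logits = model(torch.randint(1, 32000, (4, 30)))
pred_len = logits.argmax(dim=-1) + 1                     # predicted lengths in 1..max_len
```

In this sketch the predicted length would be consumed at decoding time, for example to penalize hypotheses whose length deviates from the prediction; for non-autoregressive decoders it could directly fix the number of output positions.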



Author information


Corresponding author

Correspondence to Mohammad Mehdi Homayounpour.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mahsuli, M.M., Khadivi, S. & Homayounpour, M.M. LenM: Improving Low-Resource Neural Machine Translation Using Target Length Modeling. Neural Process Lett 55, 9435–9466 (2023). https://doi.org/10.1007/s11063-023-11208-1

