
LenM: Improving Low-Resource Neural Machine Translation Using Target Length Modeling


Abstract

Neural machine translation (NMT) is an active field in artificial intelligence that aims at translating text from a source language into a different target language. Although NMT systems perform quite well in high-resource setups, their performance on low-resource data is poor. One aspect of data scarcity is the lack of diversity in the sentence lengths of the training data. Moreover, since a maximum sentence length is usually set during training, translation quality degrades for sentences longer than this maximum. In this paper, we propose LenM, a method that models the length of the target (translated) sentence given the source sentence using a deep recurrent neural structure, and apply it to the decoder side of neural machine translation systems to generate translations with appropriate lengths and better quality. Our proposed method helps to fix some drawbacks of NMT, such as output degradation on unseen sentence lengths and the limitation on using larger beam sizes in the decoding phase of translation. The method can be applied to any NMT model regardless of its architecture and does not slow down translation. Moreover, it can be used efficiently in non-autoregressive machine translation systems, which need to know the target length before decoding. The final outcome of this paper is an improvement in the output quality of neural machine translation systems trained on low-resource corpora. Our experiments show the superior performance of the proposed method compared to state-of-the-art neural machine translation systems when facing a target length mismatch between training and inference, with improvements of up to 9.82 BLEU points for German-to-English translation and up to 6.28 BLEU points for Arabic-to-English translation.
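To make the idea concrete, the sketch below shows one plausible way to model target length from the source sentence with a recurrent network, as the abstract describes. It is only a minimal illustration: the class name, layer sizes, the choice of a BiLSTM with mean pooling, and framing length prediction as classification over possible lengths are all assumptions for exposition, not the authors' exact LenM architecture.

```python
# Illustrative sketch (not the paper's exact model): a BiLSTM reads the
# source token ids and predicts the target sentence length, which a
# decoder could then use to bias or constrain its output length.
import torch
import torch.nn as nn


class TargetLengthPredictor(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512, max_len=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        # one logit per candidate target length 1..max_len
        self.proj = nn.Linear(2 * hidden_dim, max_len)

    def forward(self, src_ids):
        # src_ids: (batch, src_len), padded with id 0
        emb = self.embed(src_ids)
        out, _ = self.bilstm(emb)                        # (batch, src_len, 2*hidden)
        mask = (src_ids != 0).unsqueeze(-1).float()
        pooled = (out * mask).sum(1) / mask.sum(1).clamp(min=1)  # mean over real tokens
        return self.proj(pooled)                         # (batch, max_len) length logits


# Example usage: predict a length for a batch of hypothetical source sentences.
model = TargetLengthPredictor(vocab_size=32000)
logits = model(torch.randint(1, 32000, (4, 30)))
pred_len = logits.argmax(dim=-1) + 1                     # predicted lengths in 1..max_len
```

In this sketch the predicted length would be consumed at decoding time, for example to penalize hypotheses whose length deviates from the prediction; for non-autoregressive decoders it could directly fix the number of output positions.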



Author information


Corresponding author

Correspondence to Mohammad Mehdi Homayounpour.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mahsuli, M.M., Khadivi, S. & Homayounpour, M.M. LenM: Improving Low-Resource Neural Machine Translation Using Target Length Modeling. Neural Process Lett 55, 9435–9466 (2023). https://doi.org/10.1007/s11063-023-11208-1

