Recursive alignment block classification technique for word reordering in statistical machine translation

Costa-jussà, Marta R.; Fonollosa, José A. R.; Monte, Enric

doi:10.1007/s10579-010-9133-9

Original Paper
Published: 26 November 2010

Language Resources and Evaluation, Volume 45, pages 165–179 (2011)

Abstract

Statistical machine translation (SMT) is based on alignment models that learn word correspondences between the source and target languages from bilingual corpora. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages remains one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. We then classify these pairs into groups, recursively following a block co-occurrence criterion, in order to infer reorderings. Within each group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Next, we identify the pairs of blocks in the source corpora (both training and test) that belong to the same group. We swap them and use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality, using two state-of-the-art SMT systems: a phrase-based system and an N-gram-based system. Experiments are reported on the EuroParl task, showing improvements of roughly 1 point in the standard MT evaluation metrics (mWER and BLEU).
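The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the block-length limit `max_len`, the exact definition of "swapped" (the second block's target span lies entirely before the first block's), and the grouping criterion (pairs sharing a left or right block end up in the same group, transitively) are all assumptions made here for illustration, and the Spanish tokens are a toy example rather than data from the paper.

```python
from typing import List, Optional, Set, Tuple

Link = Tuple[int, int]   # (source index, target index) alignment link
Span = Tuple[int, int]   # inclusive (start, end) source positions
BlockPair = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (left block, right block)


def target_span(links: Set[Link], start: int, end: int) -> Optional[Span]:
    """Min and max target positions linked to source[start..end], or None."""
    tgt = [t for s, t in links if start <= s <= end]
    return (min(tgt), max(tgt)) if tgt else None


def swapped_pairs(links: Set[Link], src_len: int,
                  max_len: int = 3) -> List[Tuple[Span, Span]]:
    """Consecutive source blocks whose translations appear in swapped order.

    Assumed criterion: every target word of the second block precedes
    every target word of the first block.
    """
    found = []
    for i in range(src_len):
        for j in range(i, min(i + max_len, src_len)):
            for k in range(j + 1, min(j + 1 + max_len, src_len)):
                a = target_span(links, i, j)
                b = target_span(links, j + 1, k)
                if a is not None and b is not None and b[1] < a[0]:
                    found.append(((i, j), (j + 1, k)))
    return found


def apply_swap(tokens: List[str], first: Span, second: Span) -> List[str]:
    """Return a copy of tokens with the two adjacent blocks exchanged."""
    (i, j), (_, k) = first, second
    return tokens[:i] + tokens[j + 1:k + 1] + tokens[i:j + 1] + tokens[k + 1:]


def group_pairs(pairs: List[BlockPair]) -> List[Set[BlockPair]]:
    """Group block pairs that share a left or right block (assumed criterion).

    Adding a pair merges every existing group it touches, so the grouping
    is transitive.
    """
    groups: List[Set[BlockPair]] = []
    for pair in pairs:
        left, right = pair
        merged, rest = {pair}, []
        for g in groups:
            if any(l == left or r == right for l, r in g):
                merged |= g
            else:
                rest.append(g)
        groups = rest + [merged]
    return groups


def generalize(group: Set[BlockPair]) -> Set[BlockPair]:
    """All left/right combinations inside a group, including unseen pairs."""
    lefts = {l for l, _ in group}
    rights = {r for _, r in group}
    return {(l, r) for l in lefts for r in rights}


# Toy example: "un programa ambicioso" aligned to "an ambitious program".
source = ["un", "programa", "ambicioso"]
links = {(0, 0), (1, 2), (2, 1)}
for first, second in swapped_pairs(links, len(source)):
    print(apply_swap(source, first, second))  # ['un', 'ambicioso', 'programa']
```

Under these assumptions, a group built from the pairs (programa, ambicioso), (programa, europeo) and (sistema, europeo) would also license the unseen pair (sistema, ambicioso) once `generalize` is applied. In the pipeline the abstract describes, the swapped source corpus would then be realigned and used to train the final system; that step depends on external alignment and training tools and is not sketched here.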

Notes

http://www.europarl.eu.int/.
http://www.tc-star.org/.
TC-STAR (Technology and Corpora for Speech to Speech Translation) is a European Community project funded by the Sixth Framework Programme.

Acknowledgments

This work has been partially funded by the Spanish Department of Science and Innovation through the Juan de la Cierva fellowship program and the BUCEADOR project (TEC2009-14094-C04-01). The authors also want to thank the anonymous reviewers of this paper for their valuable comments. Finally, the authors want to thank Barcelona Media Innovation Center, Universitat Politècnica de Catalunya and TALP Research Center for their support and permission to publish this research.

Author information

Authors and Affiliations

Barcelona Media Innovation Center, Av. Diagonal 177, 08018, Barcelona, Spain
Marta R. Costa-jussà
Universitat Politècnica de Catalunya, TALP Research Center, Jordi Girona 1-3, 08034, Barcelona, Spain
José A. R. Fonollosa & Enric Monte

Corresponding author

Correspondence to Marta R. Costa-jussà.

About this article

Cite this article

Costa-jussà, M.R., Fonollosa, J.A.R. & Monte, E. Recursive alignment block classification technique for word reordering in statistical machine translation. Lang Resources & Evaluation 45, 165–179 (2011). https://doi.org/10.1007/s10579-010-9133-9

Issue Date: May 2011
DOI: https://doi.org/10.1007/s10579-010-9133-9
