Chunk-lattices for verb reordering in Arabic–English statistical machine translation

Bisazza, Arianna; Pighin, Daniele; Federico, Marcello

doi:10.1007/s10590-011-9104-y

Special issues on machine translation for Arabic

Published: 25 September 2011

Volume 26, pages 85–103, (2012)

Machine Translation

Arianna Bisazza¹,
Daniele Pighin¹ &
Marcello Federico¹

Abstract

Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb–Subject–Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data, and to collect statistics about verb movements. From this analysis we build specific verb reordering lattices on the test sentences before decoding, and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. The application of our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic–English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) measured with the Kendall Reordering Score.
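
The verb displacement idea described in the abstract can be illustrated with a minimal sketch. It assumes the sentence has already been split into labelled chunks and enumerates the alternative chunk orders obtained by moving a clause-initial verb chunk to the right; these are the kind of alternatives a reordering lattice would encode as paths. The chunk label "VC", the function names, the max_shift limit, and the English glosses standing in for Arabic tokens are all hypothetical choices made for this illustration; the sketch does not reproduce the authors' implementation, lattice weighting, or pruning model.

```python
from typing import List, Tuple

# A chunk is a (label, tokens) pair. The label "VC" for verb chunks is an
# assumption of this sketch, not something fixed by the abstract.
Chunk = Tuple[str, List[str]]


def verb_reordering_candidates(chunks: List[Chunk], max_shift: int = 3) -> List[List[Chunk]]:
    """Return the original chunk order plus every variant in which a
    clause-initial verb chunk is moved right by 1..max_shift chunks."""
    candidates = [list(chunks)]
    # Only sentences that start with a verb chunk get extra candidates.
    if not chunks or chunks[0][0] != "VC":
        return candidates
    verb, rest = chunks[0], chunks[1:]
    for shift in range(1, min(max_shift, len(rest)) + 1):
        # Re-insert the verb chunk after `shift` of the following chunks.
        candidates.append(rest[:shift] + [verb] + rest[shift:])
    return candidates


def flatten(chunks: List[Chunk]) -> str:
    """Join the tokens of all chunks into a single string."""
    return " ".join(tok for _, toks in chunks for tok in toks)


# English glosses are used in place of Arabic tokens for readability.
sentence = [
    ("VC", ["said"]),
    ("NC", ["the", "minister"]),
    ("PC", ["in", "the", "meeting"]),
]
candidates = verb_reordering_candidates(sentence)
for cand in candidates:
    print(flatten(cand))
```

In a full system each candidate order would be one path through a lattice handed to the decoder, with path weights or a discriminative score deciding which paths survive pruning; none of that is modelled here.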

Author information

Daniele Pighin
Present address: TALP Research Center-Universitat Politécnica de Catalunya, Barcelona, Spain

Authors and Affiliations

Fondazione Bruno Kessler-IRST, Via Sommarive 18, Povo, Trento, Italy
Arianna Bisazza, Daniele Pighin & Marcello Federico

Corresponding author

Correspondence to Arianna Bisazza.

About this article

Cite this article

Bisazza, A., Pighin, D. & Federico, M. Chunk-lattices for verb reordering in Arabic–English statistical machine translation. Machine Translation 26, 85–103 (2012). https://doi.org/10.1007/s10590-011-9104-y

Received: 24 June 2010
Accepted: 08 August 2011
Published: 25 September 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10590-011-9104-y
