Chunk-lattices for verb reordering in Arabic–English statistical machine translation

Special issues on machine translation for Arabic

Machine Translation

Abstract

Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb–Subject–Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs on the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data and to collect statistics about verb movements. From this analysis, we build specific verb reordering lattices on the test sentences before decoding, and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. The application of our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic–English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) measured with the Kendall Reordering Score.
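
As a rough illustration of the two ideas named in the abstract, the sketch below enumerates candidate orders in which a clause-initial verb chunk is moved rightward, and scores a permutation with a Kendall's-tau-style measure. It is a toy under stated assumptions, not the authors' implementation: the chunk labels, the function names `verb_displacements` and `kendall_score`, and the `max_shift` limit are all hypothetical, and the square-root normalization is one common form of Kendall's tau distance that may differ from the exact Kendall Reordering Score used in the article.

```python
from itertools import combinations
from math import sqrt


def verb_displacements(chunks, verb_index=0, max_shift=3):
    """Return candidate chunk orders with the verb chunk moved rightward.

    Each candidate is a permutation of the original chunk positions.
    `max_shift` is an illustrative cap, not a value taken from the article.
    """
    n = len(chunks)
    candidates = [list(range(n))]  # the original (monotone) order comes first
    for shift in range(1, max_shift + 1):
        target = verb_index + shift
        if target >= n:
            break  # the verb cannot move past the last chunk
        order = list(range(n))
        # Remove the verb position, then re-insert it `shift` chunks later.
        order.insert(target, order.pop(verb_index))
        candidates.append(order)
    return candidates


def kendall_score(permutation):
    """Return 1 - sqrt(share of discordant pairs); 1.0 means monotone order."""
    n = len(permutation)
    if n < 2:
        return 1.0
    total_pairs = n * (n - 1) / 2
    discordant = sum(
        1 for i, j in combinations(range(n), 2)
        if permutation[i] > permutation[j]
    )
    return 1.0 - sqrt(discordant / total_pairs)


if __name__ == "__main__":
    # Hypothetical chunk labels for a verb-initial clause.
    chunks = ["VC", "NP", "PP", "NP"]
    for order in verb_displacements(chunks, verb_index=0, max_shift=2):
        reordered = [chunks[i] for i in order]
        print(order, reordered, round(kendall_score(order), 3))
```

In a lattice setting, each candidate order would correspond to one path, and a score or model probability attached to that path could be used for weighting or pruning. Here the score only measures how far each candidate departs from the original order.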

Author information

Corresponding author

Correspondence to Arianna Bisazza.

About this article

Cite this article

Bisazza, A., Pighin, D. & Federico, M. Chunk-lattices for verb reordering in Arabic–English statistical machine translation. Machine Translation 26, 85–103 (2012). https://doi.org/10.1007/s10590-011-9104-y

  • DOI: https://doi.org/10.1007/s10590-011-9104-y
