Abstract
Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often due to incorrect verb reordering in Verb–Subject–Object sentences. As a solution, we propose a chunk-based reordering technique to automatically displace clause-initial verbs in the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data and to collect statistics about verb movements. From this analysis, we build specific verb reordering lattices on the test sentences before decoding, and we test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. Applying our reordering methods to the training and test data results in consistent improvements on the NIST-MT 2009 Arabic–English benchmark, both in terms of BLEU (+1.06%) and of reordering quality (+0.85%) as measured by the Kendall Reordering Score.
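The central idea of chunk-based verb reordering can be illustrated with a small sketch. It assumes the Arabic sentence has already been split into chunks, and it uses a hypothetical chunk tag ("VP" for a verb chunk) and a hypothetical `max_shift` limit. The function enumerates the monotone order plus every order obtained by moving a clause-initial verb chunk to the right past 1..`max_shift` following chunks; each such order could serve as one path in a verb reordering lattice. This is not the authors' implementation: their lattice construction, weighting schemes and discriminative pruning are not reproduced here, and the uniform path weights below are only a placeholder.

```python
from typing import List, Tuple

# A chunk is (tag, tokens). The tag set is a hypothetical assumption, e.g. "VP", "NP", "PP".
Chunk = Tuple[str, List[str]]


def flatten(chunks: List[Chunk]) -> List[str]:
    """Concatenate the tokens of a chunk sequence."""
    return [tok for _, toks in chunks for tok in toks]


def verb_reorderings(chunks: List[Chunk], max_shift: int = 3) -> List[List[str]]:
    """Return the monotone order plus the orders obtained by moving a
    clause-initial verb chunk 1..max_shift chunks to the right."""
    paths = [flatten(chunks)]  # the original (monotone) order is always kept
    # Only a verb chunk at the very start of the clause is moved (VSO case).
    if not chunks or chunks[0][0] != "VP":
        return paths
    verb, rest = chunks[0], chunks[1:]
    for shift in range(1, min(max_shift, len(rest)) + 1):
        reordered = rest[:shift] + [verb] + rest[shift:]
        paths.append(flatten(reordered))
    return paths


def uniform_weights(paths: List[List[str]]) -> List[float]:
    """Placeholder weighting: every lattice path gets the same probability."""
    return [1.0 / len(paths)] * len(paths)
```

For instance, with English glosses standing in for Arabic tokens, `verb_reorderings([("VP", ["said"]), ("NP", ["the", "minister"]), ("PP", ["in", "a", "statement"])], max_shift=2)` yields the monotone order, "the minister said in a statement", and "the minister in a statement said". A pruning model such as the one described in the abstract would then keep only the most likely of these paths.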
Cite this article
Bisazza, A., Pighin, D. & Federico, M. Chunk-lattices for verb reordering in Arabic–English statistical machine translation. Machine Translation 26, 85–103 (2012). https://doi.org/10.1007/s10590-011-9104-y
DOI: https://doi.org/10.1007/s10590-011-9104-y