Abstract
Extant Statistical Machine Translation systems are very complex pieces of software that embed multiple layers of heuristics and involve very large numbers of numerical parameters. As a result, it is difficult to analyze output translations, and there is a real need for tools that help developers better understand the various causes of errors. In this study, we take a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. To identify those translation errors that stem from deficiencies in the phrase table, we propose to compute the oracle BLEU-4 score, that is, the best score that a system based on this phrase table can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming problem, we show that it is possible to efficiently compute accurate lower bounds of this score, and we report measurements performed on several standard benchmarks. Various other applications of these oracle decoding techniques are also reported and discussed.
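To make the notion of an oracle score concrete, the following is a minimal sketch, not the Integer Linear Programming formulation used in the article. It assumes a tiny hypothetical phrase table and simply enumerates every segmentation and every phrase choice to find the hypothesis with the best sentence-level BLEU-1. Because BLEU-1 only counts unigrams, phrase reordering does not change the score, so the sketch ignores it. All names (`bleu1`, `segmentations`, `oracle_bleu1`) and the toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import product
import math


def bleu1(hyp, ref):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    if not hyp:
        return 0.0
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    # Each hypothesis word is credited at most as often as it occurs in the reference.
    matches = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    precision = matches / len(hyp)
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * precision


def segmentations(words):
    """Yield every split of `words` into contiguous source phrases (tuples)."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = tuple(words[:i])
        for rest in segmentations(words[i:]):
            yield [head] + rest


def oracle_bleu1(source, reference, phrase_table):
    """Exhaustive search for the best BLEU-1 reachable with `phrase_table`.

    Exponential in sentence length: usable only on toy inputs.
    """
    best_score, best_hyp = 0.0, None
    for seg in segmentations(source):
        # Skip segmentations that use a source phrase missing from the table.
        if any(p not in phrase_table for p in seg):
            continue
        for choice in product(*(phrase_table[p] for p in seg)):
            hyp = [w for phrase in choice for w in phrase]
            score = bleu1(hyp, reference)
            if score > best_score:
                best_score, best_hyp = score, hyp
    return best_score, best_hyp


# Hypothetical toy data, invented for illustration only.
source = ["le", "chat", "noir"]
reference = ["the", "black", "cat"]
phrase_table = {
    ("le",): [("the",), ("it",)],
    ("chat",): [("cat",), ("chat",)],
    ("noir",): [("black",), ("dark",)],
    ("chat", "noir"): [("dark", "cat")],
}

score, hyp = oracle_bleu1(source, reference, phrase_table)
print(score, hyp)
```

In this toy setting, removing the entry `("black",)` from the table lowers the oracle score no matter how the system's parameters are tuned, which is the kind of phrase-table deficiency the oracle is meant to expose. Any hypothesis found this way can be produced from the phrase table, so its BLEU-4 score cannot exceed the oracle BLEU-4 score.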
Cite this article
Wisniewski, G., Yvon, F. Oracle decoding as a new way to analyze phrase-based machine translation. Machine Translation 27, 115–138 (2013). https://doi.org/10.1007/s10590-012-9134-0
DOI: https://doi.org/10.1007/s10590-012-9134-0