
Evaluating machine translation with LFG dependencies

Machine Translation

Abstract

In this paper we show how labelled dependencies produced by a Lexical-Functional Grammar parser can be used in Machine Translation evaluation. In contrast to most popular evaluation metrics, which are based on surface string comparison, our dependency-based method does not unfairly penalize perfectly valid syntactic variations in the translation and shows less bias towards statistical models; the addition of WordNet provides a way to accommodate lexical differences. In comparison with other metrics on a Chinese–English newswire text, our method obtains high correlation with human scores at both the segment and the system level.
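The idea of scoring a translation by its labelled dependencies rather than its surface string can be illustrated with a minimal sketch. It is not the authors' implementation: the `(label, head, dependent)` triple format, the use of an f-score over matched triples, the function name `dependency_fscore`, and the hand-written synonym table standing in for WordNet are all assumptions made for illustration, and no actual LFG parser is invoked.

```python
from collections import Counter

def dependency_fscore(candidate, reference, synonyms=None):
    """F-score over multisets of (label, head, dependent) triples.

    `synonyms` is a hypothetical stand-in for a WordNet lookup: it maps
    a lemma to a canonical lemma so that synonymous words can match.
    """
    synonyms = synonyms or {}

    def normalise(triple):
        label, head, dep = triple
        return (label, synonyms.get(head, head), synonyms.get(dep, dep))

    cand = Counter(normalise(t) for t in candidate)
    ref = Counter(normalise(t) for t in reference)

    # Multiset intersection: each reference triple can be matched once.
    matched = sum((cand & ref).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(cand.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hand-written triples (assumed parser output) for
# reference "John resigned yesterday" and candidate "Yesterday John quit".
reference = [("subj", "resign", "john"), ("adjunct", "resign", "yesterday")]
candidate = [("adjunct", "quit", "yesterday"), ("subj", "quit", "john")]

plain = dependency_fscore(candidate, reference)
with_synonyms = dependency_fscore(candidate, reference, {"quit": "resign"})
print(plain, with_synonyms)
```

Because the triples are compared as an unordered multiset, the different word order of the candidate costs nothing; only the lexical mismatch between "quit" and "resign" lowers the score, and the synonym table removes that penalty.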

Author information

Corresponding author

Correspondence to Karolina Owczarzak.

About this article

Cite this article

Owczarzak, K., van Genabith, J. & Way, A. Evaluating machine translation with LFG dependencies. Machine Translation 21, 95–119 (2007). https://doi.org/10.1007/s10590-008-9038-1

  • DOI: https://doi.org/10.1007/s10590-008-9038-1
