EBMT by tree-phrasing

  • Original Paper
Machine Translation

Abstract

This article presents an attempt to build a repository storing associations between simple syntactic dependency treelets in a source language and their corresponding phrases in a target language. We assess the usefulness of this resource in two different settings. First, we show that it improves upon a standard subsentential translation memory. Second, we observe improvements in translation quality when a standard statistical phrase-based translation engine is augmented with the ability to exploit such a repository.
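As a rough illustration of the kind of resource described above, the sketch below stores associations between source-language treelets and target-language phrases and ranks the phrases by relative frequency. Everything in it is an assumption made for illustration: the `Treelet` and `TreeletRepository` names, the depth-one treelet shape, the count-based scoring, and the toy English and French entries are hypothetical and are not taken from the article.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple, Tuple


class Treelet(NamedTuple):
    """Hypothetical depth-one dependency treelet: a head and its ordered dependents."""
    head: str
    dependents: Tuple[str, ...]


class TreeletRepository:
    """Toy store mapping source treelets to counts of associated target phrases."""

    def __init__(self) -> None:
        # treelet -> Counter of target phrases seen with that treelet
        self._table = defaultdict(Counter)

    def add(self, treelet: Treelet, target_phrase: str, count: int = 1) -> None:
        """Record that `treelet` was observed with `target_phrase` `count` times."""
        self._table[treelet][target_phrase] += count

    def lookup(self, treelet: Treelet, n: int = 3) -> List[Tuple[str, float]]:
        """Return up to `n` target phrases with relative frequencies, best first."""
        counts = self._table.get(treelet)
        if not counts:
            return []
        total = sum(counts.values())
        return [(phrase, c / total) for phrase, c in counts.most_common(n)]


if __name__ == "__main__":
    repo = TreeletRepository()
    # Invented example entries, not data from the article.
    treelet = Treelet(head="house", dependents=("the", "red"))
    repo.add(treelet, "la maison rouge", 3)
    repo.add(treelet, "la maison", 1)
    print(repo.lookup(treelet))
```

A real system would extract treelets from parsed, word-aligned parallel text and would likely use richer scores than raw relative frequency; this sketch only shows the lookup structure.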



Author information

Corresponding author

Correspondence to Philippe Langlais.

About this article

Cite this article

Langlais, P., Gotti, F. EBMT by tree-phrasing. Machine Translation 20, 1–23 (2006). https://doi.org/10.1007/s10590-006-9017-3


  • DOI: https://doi.org/10.1007/s10590-006-9017-3
