Abstract
This article presents an attempt to build a repository storing associations between simple syntactic dependency treelets in a source language and their corresponding phrases in a target language. We assess the usefulness of this resource in two different settings. First, we show that it improves upon a standard subsentential translation memory. Second, we observe improvements in translation quality when a standard statistical phrase-based translation engine is augmented with the ability to exploit such a repository.
Cite this article
Langlais, P., Gotti, F. EBMT by tree-phrasing. Machine Translation 20, 1–23 (2006). https://doi.org/10.1007/s10590-006-9017-3
DOI: https://doi.org/10.1007/s10590-006-9017-3