Abstract
We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models, to unite the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.
Cite this article
Quirk, C., Menezes, A. Dependency treelet translation: the convergence of statistical and example-based machine-translation?. Machine Translation 20, 43–65 (2006). https://doi.org/10.1007/s10590-006-9008-4
DOI: https://doi.org/10.1007/s10590-006-9008-4