Example-based machine translation based on tree–string correspondence and statistical generation

  • Original Paper
Machine Translation

Abstract

This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, each translation example is represented as a TSC: a triple consisting of a parse tree in the source language, a string in the target language, and the correspondence between the leaf nodes of the source-language tree and the substrings of the target-language string. An input sentence is first parsed into a tree. The TSC forest that best matches the input tree is then searched for. Finally, the translation is generated by a statistical generation model that combines the target-language strings of the matched TSCs. The generation model consists of three features: the semantic similarity between the tree in the TSC and the input tree, the probability of translating the source word into the target word, and the language-model probability of the target-language string. Based on this method, we build an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable with that of phrase-based statistical MT systems.
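As a rough illustration of the TSC triple and the three-feature generation model described in the abstract, the following Python sketch may help. It is not the authors' implementation: the class and function names (`Node`, `TSC`, `generation_score`), the toy example, and above all the weighted log-linear combination of the features are assumptions made for illustration, since the abstract does not say how the three features are combined.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a source-language parse tree (hypothetical structure)."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the leaf nodes in left-to-right order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


@dataclass
class TSC:
    """Tree-string correspondence: (source tree, target string, alignment)."""
    tree: Node
    target: tuple    # target-language tokens
    alignment: dict  # leaf index -> (start, end) span in target, end exclusive

    def target_substring(self, leaf_index):
        """Target tokens that correspond to the given source leaf."""
        start, end = self.alignment[leaf_index]
        return self.target[start:end]


def generation_score(similarity, translation_prob, lm_prob,
                     weights=(1.0, 1.0, 1.0)):
    """Combine the three features named in the abstract.

    ASSUMPTION: a weighted sum of log features (log-linear model).
    All three feature values are assumed to be strictly positive.
    """
    features = (similarity, translation_prob, lm_prob)
    return sum(w * math.log(f) for w, f in zip(weights, features))


# Toy, made-up example: the English phrase "red car" with a Chinese target.
tree = Node("NP", [Node("JJ", [Node("red")]), Node("NN", [Node("car")])])
tsc = TSC(tree=tree, target=("红色", "汽车"), alignment={0: (0, 1), 1: (1, 2)})
```

With this sketch, `tsc.target_substring(0)` returns the target tokens aligned to the first source leaf, and `generation_score` ranks candidate translations: all else being equal, a candidate with a higher language-model probability receives a higher score.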

Author information

Correspondence to Haifeng Wang.

About this article

Cite this article

Liu, Z., Wang, H. & Wu, H. Example-based machine translation based on tree–string correspondence and statistical generation. Machine Translation 20, 25–41 (2006). https://doi.org/10.1007/s10590-006-9016-4

  • DOI: https://doi.org/10.1007/s10590-006-9016-4