Improve syntax-based translation using deep syntactic structures

Machine Translation

Abstract

This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained description of syntactic properties but also a semantic representation. Given the rich information these structures carry, we investigate whether they improve traditional syntax-based translation models built on PCFG parsers. To use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree–string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, we define a minimum covering tree that takes a predicate word and its lexical/phrasal arguments as the frontier nodes. Starting from this definition, we propose a linear-time algorithm that extracts translation rules in a single traversal of the leaf nodes of an HPSG tree. Extensive experiments on a tree-to-string translation system confirm the effectiveness of our proposal.
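
The idea of a minimum covering tree can be illustrated with a small sketch. The abstract does not give the formal definition or the linear-time procedure, so the code below is only one plausible reading: it takes the lowest node dominating a predicate word and its argument nodes as the root, and expands only the nodes on the paths down to them. The `Node` class, the function names, and the toy parse are hypothetical, and the sketch assumes that no argument dominates the predicate or another argument. It makes no attempt to reproduce the paper's single-pass algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; a real HPSG node would also carry feature structures."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node: Node) -> List[Node]:
    """Return the path from node up to the tree root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def covering_root(targets: List[Node]) -> Node:
    """Lowest node that dominates (or equals) every target."""
    common = ancestors(targets[0])
    for target in targets[1:]:
        seen = {id(n) for n in ancestors(target)}
        common = [n for n in common if id(n) in seen]
    return common[0]


def minimum_covering_tree(predicate: Node, arguments: List[Node]) -> Tuple[Node, List[Node]]:
    """Return (root, frontier) of the smallest fragment covering predicate and arguments."""
    targets = [predicate] + list(arguments)
    root = covering_root(targets)

    # Nodes strictly above a target, up to the root, are expanded in the fragment.
    expanded = set()
    for target in targets:
        node = target
        while node is not root:
            node = node.parent
            expanded.add(id(node))

    # Everything reached below an expanded node but not itself expanded is frontier.
    frontier: List[Node] = []

    def visit(node: Node) -> None:
        if id(node) in expanded:
            for child in node.children:
                visit(child)
        else:
            frontier.append(node)

    visit(root)
    return root, frontier


# Toy parse of "John kicked the ball" (invented for illustration).
kicked = Node("kicked")
subj = Node("NP").add(Node("John"))
obj = Node("NP").add(Node("DT").add(Node("the")), Node("NN").add(Node("ball")))
tree = Node("S").add(subj, Node("VP").add(Node("V").add(kicked), obj))

root, frontier = minimum_covering_tree(kicked, [subj, obj])
print(root.label, [n.label for n in frontier])  # S ['NP', 'kicked', 'NP']
```

In this toy case the fragment keeps the predicate word lexicalized and leaves both argument phrases as frontier nodes. That is the shape a tree-to-string rule needs in order to reorder arguments around the predicate.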



Author information

Corresponding author

Correspondence to Xianchao Wu.

About this article

Cite this article

Wu, X., Matsuzaki, T. & Tsujii, J. Improve syntax-based translation using deep syntactic structures. Machine Translation 24, 141–157 (2010). https://doi.org/10.1007/s10590-010-9081-6

  • DOI: https://doi.org/10.1007/s10590-010-9081-6
