Abstract
This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained description of syntactic properties but also a semantic representation. Given the abundant information in these structures, we investigate whether they improve on traditional syntax-based translation models that rely on PCFG parsers. To use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree–string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, we define a minimum covering tree that takes a predicate word and its lexical/phrasal arguments as its frontier nodes. Starting from this definition, we propose a linear-time algorithm that extracts translation rules in a single traversal of the leaf nodes of an HPSG tree. Extensive experiments on a tree-to-string translation system demonstrate the effectiveness of our proposal.
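As a rough illustration of the minimum covering tree idea, the sketch below finds the lowest common ancestor of a predicate node and its argument nodes in a toy constituency tree, then collects the nodes on the paths from that ancestor down to each frontier node. This is not the paper's extraction algorithm: the `Node` class, the function names, and the toy tree are all hypothetical, and a real HPSG tree would carry feature structures and predicate-argument links that are omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality, so nodes can be stored in sets
class Node:
    """Hypothetical tree node; real HPSG nodes carry far richer features."""
    label: str
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> List[Node]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def minimum_covering_tree(predicate: Node,
                          arguments: List[Node]) -> Tuple[Node, List[Node]]:
    """Return (root, covered): the lowest common ancestor of the predicate
    and its arguments, plus every node on a path from that ancestor to a
    frontier node (an assumed simplification of the paper's definition)."""
    frontier = [predicate] + list(arguments)
    # Ancestors of the predicate, ordered from lowest to highest.
    common = path_to_root(frontier[0])
    for node in frontier[1:]:
        ancestors = set(path_to_root(node))
        common = [a for a in common if a in ancestors]
    root = common[0]  # the lowest ancestor shared by all frontier nodes

    covered: List[Node] = []
    seen = set()
    for node in frontier:
        current = node
        while True:
            if current not in seen:
                seen.add(current)
                covered.append(current)
            if current is root:
                break
            current = current.parent
    return root, covered


# Toy tree: S -> NP VP, VP -> V NP (labels only, no words or features).
s = Node("S")
subj = s.add(Node("NP"))
vp = s.add(Node("VP"))
verb = vp.add(Node("V"))
obj = vp.add(Node("NP"))

root_full, covered_full = minimum_covering_tree(verb, [subj, obj])
root_obj, covered_obj = minimum_covering_tree(verb, [obj])
```

In this toy tree, taking the verb with both of its arguments yields a fragment rooted at `S`, while taking the verb with only its object yields a smaller fragment rooted at `VP`. The sketch recomputes ancestor paths for every predicate, so it does not reproduce the linear-time behavior described in the abstract.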
Cite this article
Wu, X., Matsuzaki, T. & Tsujii, J. Improve syntax-based translation using deep syntactic structures. Machine Translation 24, 141–157 (2010). https://doi.org/10.1007/s10590-010-9081-6
DOI: https://doi.org/10.1007/s10590-010-9081-6