Improve syntax-based translation using deep syntactic structures

Wu, Xianchao; Matsuzaki, Takuya; Tsujii, Jun’ichi

doi:10.1007/s10590-010-9081-6

Published: 15 June 2010

Volume 24, pages 141–157, (2010)

Machine Translation

Xianchao Wu¹,
Takuya Matsuzaki¹ &
Jun’ichi Tsujii¹,²,³

Abstract

This paper introduces deep syntactic structures to syntax-based Statistical Machine Translation (SMT). We use a Head-driven Phrase Structure Grammar (HPSG) parser to obtain the deep syntactic structures of a sentence, which include not only a fine-grained description of syntactic properties but also a semantic representation. Given the abundant information in these structures, we investigate whether they improve on traditional syntax-based translation models built on PCFG parsers. To use deep syntactic structures for SMT, this paper focuses on extracting tree-to-string translation rules from aligned HPSG tree–string pairs. The major challenge is to properly localize the non-local relations among nodes in an HPSG tree. To localize the semantic dependencies among words and phrases, which can be inherently non-local, a minimum covering tree is defined by taking a predicate word and its lexical/phrasal arguments as the frontier nodes. Starting from this definition, a linear-time algorithm is proposed to extract translation rules through a single traversal of the leaf nodes in an HPSG tree. Extensive experiments on a tree-to-string translation system confirmed the effectiveness of our proposal.
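The idea of a minimum covering tree can be illustrated with a small sketch. The code below is not the paper's algorithm or data structure. It is a minimal illustration, assuming a plain constituency-style tree with parent pointers, of how the smallest connected subtree spanning a predicate and its arguments might be found. The `Node` class, the function names, and the toy sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical tree node; a real HPSG node carries far richer features."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> List[Node]:
    """Nodes from `node` up to the tree root, inclusive."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def minimum_covering_tree(frontier: List[Node]) -> Tuple[Node, List[Node]]:
    """Return the lowest common ancestor of the frontier nodes and every
    node on the paths from that ancestor down to each frontier node."""
    paths = [path_to_root(n) for n in frontier]
    common = {id(n) for n in paths[0]}
    for p in paths[1:]:
        common &= {id(n) for n in p}
    # The first common node met while walking upward is the lowest one.
    root = next(n for n in paths[0] if id(n) in common)
    seen, nodes = set(), []
    for p in paths:
        for n in p:
            if id(n) not in seen:
                seen.add(id(n))
                nodes.append(n)
            if n is root:
                break
    return root, nodes


# Toy tree (assumed example): S -> NP(John) VP; VP -> V(saw) NP(Mary)
s = Node("S")
np_subj = s.add(Node("NP_subj"))
vp = s.add(Node("VP"))
verb = vp.add(Node("V"))
np_obj = vp.add(Node("NP_obj"))

# Frontier: the predicate "saw" plus its two phrasal arguments.
mct_root, mct_nodes = minimum_covering_tree([verb, np_subj, np_obj])
mct_labels = sorted(n.label for n in mct_nodes)
```

In this toy case the covering tree is rooted at `S` and contains all five nodes, because the subject argument sits outside the verb phrase. This sketch recomputes root paths for every call, so it makes no claim to the linear-time behaviour described in the abstract.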

Author information

Authors and Affiliations

Department of Computer Science, The University of Tokyo, Tokyo, Japan
Xianchao Wu, Takuya Matsuzaki & Jun’ichi Tsujii
School of Computer Science, University of Manchester, Manchester, UK
Jun’ichi Tsujii
National Centre for Text Mining, Manchester, UK
Jun’ichi Tsujii

Corresponding author

Correspondence to Xianchao Wu.

About this article

Cite this article

Wu, X., Matsuzaki, T. & Tsujii, J. Improve syntax-based translation using deep syntactic structures. Machine Translation 24, 141–157 (2010). https://doi.org/10.1007/s10590-010-9081-6

Received: 02 November 2009
Accepted: 25 May 2010
Published: 15 June 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10590-010-9081-6
