Dependency-Based Chinese-English Statistical Machine Translation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

We present a Chinese-English Statistical Machine Translation (SMT) system based on dependency tree mappings. We parse the English translation of the Penn Chinese Treebank with a state-of-the-art dependency parser to make the treebank bilingual, and then learn a tree-to-tree dependency mapping model. We also train a phrase-based translation model and collect a bilingual phrase lexicon to bootstrap a treelet translation model. For decoding, we parse the Chinese input with the same dependency parser, use a log-linear framework to integrate the learned translation model with a variety of dependency-tree-based probability models, and then find the best English dependency tree by dynamic programming. Finally, the English tree is flattened to produce the translation. We evaluate our system on the 863 and NIST 2005 Chinese-English MT test data and find that the dependency-based model significantly outperforms Caravan, our phrase-based SMT system, which participated in NIST 2006 and IWSLT 2006.
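The last two decoding steps in the abstract (log-linear model combination and flattening the English dependency tree) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the feature names `translation` and `dependency_lm`, and the weights are hypothetical, and an exhaustive `max` over a handful of candidates stands in for the dynamic-programming search the paper describes.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical dependency-tree node: a head word with ordered dependents."""
    word: str
    left: list[Node] = field(default_factory=list)   # dependents before the head
    right: list[Node] = field(default_factory=list)  # dependents after the head


def flatten(node: Node) -> list[str]:
    """Read the words off the tree: left dependents, head, right dependents."""
    words: list[str] = []
    for child in node.left:
        words.extend(flatten(child))
    words.append(node.word)
    for child in node.right:
        words.extend(flatten(child))
    return words


def loglinear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of log feature values (each feature is a model probability)."""
    return sum(weights[name] * math.log(p) for name, p in features.items())


def best_candidate(candidates, weights):
    """Pick the highest-scoring (tree, features) pair.

    Assumption: exhaustive comparison replaces the paper's dynamic programming.
    """
    return max(candidates, key=lambda c: loglinear_score(c[1], weights))


# Two hypothetical English candidate trees with made-up model probabilities.
tree_a = Node("likes", left=[Node("John")],
              right=[Node("apples", left=[Node("red")])])
tree_b = Node("likes", left=[Node("apples", left=[Node("red")])],
              right=[Node("John")])
candidates = [
    (tree_a, {"translation": 0.2, "dependency_lm": 0.1}),
    (tree_b, {"translation": 0.1, "dependency_lm": 0.05}),
]
weights = {"translation": 1.0, "dependency_lm": 1.0}

best_tree, _ = best_candidate(candidates, weights)
translation = " ".join(flatten(best_tree))
```

In a log-linear framework the weights would normally be tuned on held-out data; the uniform weights here are only for illustration.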

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shi, X., Chen, Y., Jia, J. (2007). Dependency-Based Chinese-English Statistical Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_34

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
