Dependency-Based Chinese-English Statistical Machine Translation

Shi, Xiaodong; Chen, Yidong; Jia, Jianfeng

doi:10.1007/978-3-540-70939-8_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We present a Chinese-English Statistical Machine Translation (SMT) system based on dependency tree mappings. We use a state-of-the-art dependency parser to parse the English translation of the Penn Chinese Treebank, making the treebank bilingual, and then learn a tree-to-tree dependency mapping model. We also train a phrase-based translation model and collect a bilingual phrase lexicon to bootstrap a treelet translation model. For decoding, we run the same dependency parser on the Chinese input, use a log-linear framework to integrate the learned translation model with a variety of dependency-tree-based probability models, and then find the best English dependency tree by dynamic programming. Finally, the English tree is flattened to produce the translation. We evaluate our system on the 863 and NIST 2005 Chinese-English MT test data and find that the dependency-based model significantly outperforms Caravan, our phrase-based SMT system, which participated in NIST 2006 and IWSLT 2006.
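
The last two decoding steps in the abstract, log-linear scoring of candidate English trees and flattening the chosen tree into a sentence, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, its `position` field (a stand-in for target-side word order), the feature names `tm` and `lm`, and the weights are all hypothetical, and an exhaustive `max` over a candidate list replaces the dynamic-programming search described in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """English dependency tree node (hypothetical structure for illustration)."""
    word: str
    position: int  # assumed target-side word order index
    children: List["Node"] = field(default_factory=list)


def flatten(root: Node) -> str:
    """Collect all nodes of the tree and emit their words ordered by position."""
    nodes, stack = [], [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    return " ".join(n.word for n in sorted(nodes, key=lambda n: n.position))


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: sum over models of weight * log(probability)."""
    return sum(weights[name] * math.log(prob) for name, prob in features.items())


def pick_best(candidates: List[Tuple[Node, Dict[str, float]]],
              weights: Dict[str, float]) -> Node:
    """Return the candidate tree with the highest log-linear score.

    Exhaustive search over a small list; the paper uses dynamic programming.
    """
    return max(candidates, key=lambda c: log_linear_score(c[1], weights))[0]


# Two made-up candidate trees with made-up model probabilities.
tree_a = Node("likes", 1, [Node("he", 0), Node("music", 2)])
tree_b = Node("like", 1, [Node("him", 0), Node("music", 2)])
weights = {"tm": 1.0, "lm": 2.0}
candidates = [
    (tree_a, {"tm": 0.5, "lm": 0.25}),
    (tree_b, {"tm": 0.5, "lm": 0.05}),
]

best = pick_best(candidates, weights)
print(flatten(best))
```

In practice the weights of a log-linear model are tuned on held-out data rather than set by hand, and the feature set would include the dependency-tree-based probability models the abstract mentions.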

Author information

Authors and Affiliations

Department of Computer Science, Xiamen University, Xiamen 361005, Fujian, China
Xiaodong Shi, Yidong Chen & Jianfeng Jia

Editor information

Alexander Gelbukh

About this paper

Cite this paper

Shi, X., Chen, Y., Jia, J. (2007). Dependency-Based Chinese-English Statistical Machine Translation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_34

DOI: https://doi.org/10.1007/978-3-540-70939-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)
