Abstract
We present a phrase-based statistical machine translation (SMT) approach in which the word-order problem is solved by syntactic transformation in the preprocessing phase; no reordering is performed in the decoding phase. We describe a syntactic transformation model based on a probabilistic context-free grammar. The model is trained using a bilingual corpus and a broad-coverage parser of the source language. This approach is applicable to language pairs in which the target language is poor in resources. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements over Pharaoh, a state-of-the-art phrase-based SMT system.
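To make the idea of reordering before decoding concrete, the following is a minimal sketch, not the authors' model. It assumes a toy parse tree written as nested tuples and a hypothetical rule table `RULES` that maps a source CFG rule to candidate child permutations with made-up probabilities. In the approach described above such probabilities would be learned from a bilingual corpus and source-side parses. The greedy choice of the most probable permutation at each node is also an assumption made for illustration.

```python
# A tree is either a leaf (tag, word) or an internal node (label, [children]).

# Hypothetical transformation rules (illustrative values, not from the paper):
# for a source rule (LHS, RHS tags), candidate child orders with probabilities.
RULES = {
    ("NP", ("DT", "JJ", "NN")): [((0, 1, 2), 0.2), ((0, 2, 1), 0.8)],
}

def is_leaf(tree):
    # A leaf stores its word as a string in the second slot.
    return isinstance(tree[1], str)

def transform(tree):
    """Reorder children bottom-up, picking the most probable permutation."""
    if is_leaf(tree):
        return tree
    label, children = tree
    children = [transform(child) for child in children]
    key = (label, tuple(child[0] for child in children))
    candidates = RULES.get(key)
    if candidates:
        best_perm, _ = max(candidates, key=lambda pair: pair[1])
        children = [children[i] for i in best_perm]
    return (label, children)

def words(tree):
    """Read the words off the tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for child in tree[1] for w in words(child)]

source = ("S", [
    ("NP", [("PRP", "I")]),
    ("VP", [("VBD", "bought"),
            ("NP", [("DT", "a"), ("JJ", "red"), ("NN", "book")])]),
])

reordered = words(transform(source))
print(" ".join(reordered))  # I bought a book red
```

The reordered English sentence places the adjective after the noun, as Vietnamese and French typically do, so a phrase-based decoder could then translate it monotonically without its own reordering step.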
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, T.P., Shimazu, A. (2006). A Syntactic Transformation Model for Statistical Machine Translation. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_7
DOI: https://doi.org/10.1007/11940098_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)