Abstract
In machine translation, re-ordering words from the source-language order to the target-language order is one of the major steps affecting system performance. Among the many approaches to this problem, syntactic re-ordering is an effective way to handle word order in a statistical machine translation (SMT) system. In this paper, we introduce a word re-ordering approach that uses syntactic rules extracted from parse trees for an English-Vietnamese SMT system. Our rule set covers noun phrases, verb phrases and adjective phrases. According to the experimental results, the noun phrase rules are the most significant of all. Compared with the MOSES phrase-based SMT system [1], these rules improve the BLEU score by 3.24 on our test corpus. We also conduct further experiments with different combinations of rules to study their effectiveness, and find that translation performance on each corpus can be tuned by choosing the combination of rules.
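To make the idea of a parse-tree re-ordering rule concrete, the following minimal sketch applies one hypothetical noun phrase rule to a toy English parse tree: adjectives that directly precede the head noun are moved after it, reflecting the noun-adjective order of Vietnamese. The tree encoding, the function names `reorder` and `leaves`, and the rule itself are illustrative assumptions, not the rule set extracted in the paper.

```python
# Toy parse tree: a leaf is (POS_tag, word); an inner node is (label, [children]).
# Everything here is a hypothetical illustration, not the paper's actual rules.

def reorder(tree):
    """Recursively apply one assumed NP rule: (DT) JJ+ NN -> (DT) NN JJ+."""
    label, body = tree
    if isinstance(body, str):          # leaf node: nothing to re-order
        return tree
    children = [reorder(child) for child in body]
    # Rule fires only for an NP whose last child is a noun (NN, NNS, NNP, ...).
    if label == "NP" and children and children[-1][0].startswith("NN"):
        head, rest = children[-1], children[:-1]
        i = len(rest)
        # Find the run of adjectives immediately before the head noun.
        while i > 0 and rest[i - 1][0] == "JJ":
            i -= 1
        # Move that run after the head; its internal order is kept (an assumption).
        children = rest[:i] + [head] + rest[i:]
    return (label, children)

def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]

sentence = ("S", [
    ("NP", [("PRP", "I")]),
    ("VP", [("VBP", "have"),
            ("NP", [("DT", "a"), ("JJ", "red"), ("NN", "book")])]),
])

print(" ".join(leaves(reorder(sentence))))
```

In a pre-processing setup of this kind, the re-ordered English sentence would be passed to the phrase-based SMT system in place of the original, so that source and target word orders are closer before alignment and decoding.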
References
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open source toolkit for statistical machine translation. In: Proceedings of ACL, Demonstration Session (2007)
Xia, F., McCord, M.: Improving a statistical MT system with automatically learned rewrite patterns. In: Proceedings of COLING (2004)
Chang, P.-C., Toutanova, K.: A Discriminative syntactic word order model for machine translation. In: Proceedings of ACL 45th, pp. 9–16 (2007)
Wang, C., Collins, M., Koehn, P.: Chinese syntactic re-ordering for statistical machine translation. In: Proceedings of 2007 Joint Conference on Empirical Methods in NLP and CL NLP, pp. 737–745 (2007)
Collins, M., Koehn, P., Kucerova, I.: Clause restructuring for statistical machine translation. In: Proceedings of the 43rd Annual Meeting of the Assoc. for Computational Linguistics (ACL), Ann Arbor, Michigan, pp. 531–540 (2005)
Nguyen, T.P., Shimazu, A.: A syntactic transformation model for statistical machine translation. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F., Zhang, M. (eds.) ICCPOL 2006. LNCS (LNAI) vol. 4285, pp. 63–74. Springer, Heidelberg (2006)
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: Proc. of the HLT-NAACL 2003 conference, Edmonton, Alberta, Canada, pp. 127–133 (2003)
Kumar, S., Byrne, W.: Local phrase re-ordering models for statistical machine translation. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in NLP, pp. 161–168 (2007)
Sanchis, G., Casacuberta, F.: N-best re-ordering in statistical machine translation. Jornadas en Tecnología del Habla, pp. 99–104 (2006)
Zhang, Y., Zens, R., Ney, H.: Chunk-level re-ordering of source language with automatically learned rules for statistical machine translation. In: Proceedings of SSST, NAACL-HLT, pp. 1–8 (2007)
Dien, D.: Comparison word order of attributions in English and Vietnamese. In: Journal of Social Sciences and Humanities. University of Social Sciences and Humanities. HCM City (2001)
Dinh, D.: Building an Annotated English-Vietnamese parallel Corpus. In: MKS: A Journal of Southeast Asian Linguistics and Languages, 35, 21–36 (2005)
Dien, D., Thuy, V.: A maximum entropy approach for Vietnamese word segmentation. In: Proceedings of 4th IEEE International Conference RIVF 2006, Ho Chi Minh City, Vietnam, February 12-16, 2006, pp. 247–252 (2006)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of ACL 2003 (2003)
Li, C.-H., Zhang, D., Li, M., Zhou, M., Li, M., Guan, Y.: A probabilistic approach to syntax-based re-ordering for statistical machine translation. In: Proceedings of 45th ACL, pp. 720–727 (2007)
Papineni, K.A., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: The Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen Thi, HN., Dinh, D. (2008). A Syntactic-based Word Re-ordering for English-Vietnamese Statistical Machine Translation System. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_75
DOI: https://doi.org/10.1007/978-3-540-89197-0_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)