Abstract
Automatic word alignment between Vietnamese and Chinese is difficult because the two languages differ considerably in syntax and structure. We present a novel method for Vietnamese-Chinese word alignment that merges a variety of feature constraint models. Specifically, we describe an improved model based on the Vietnamese-Chinese progressive structure and the offset features of word sequences. The model is built within a log-linear framework, with its parameters trained by the minimum error rate algorithm, and is used to obtain the Vietnamese-Chinese alignment. The experiments use IBM Model 3 as the basic model. The results suggest that this bilingual word alignment method performs well for Vietnamese and Chinese: precision and recall increase by 28.57 % and 25.02 %, respectively, and the alignment error rate (AER) is reduced by 14.25 %.
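To make the two ingredients named in the abstract concrete, the following is a minimal sketch of (a) scoring candidate alignments with a log-linear combination of feature functions and (b) computing precision, recall, and AER using the standard definitions over sure and possible gold links. It is not the authors' implementation. The two feature functions (`offset_feature`, `coverage_feature`), the hand-set weights, and the toy candidates are hypothetical stand-ins; in the paper the features come from the feature constraint models and the weights from minimum error rate training.

```python
from typing import Callable, Dict, Set, Tuple

Link = Tuple[int, int]   # (source word index, target word index)
Alignment = Set[Link]
Feature = Callable[[Alignment, int, int], float]


def offset_feature(alignment: Alignment, src_len: int, tgt_len: int) -> float:
    """Hypothetical feature: negative mean relative-position offset of the links."""
    if not alignment:
        return 0.0
    total = sum(abs(i / src_len - j / tgt_len) for i, j in alignment)
    return -total / len(alignment)


def coverage_feature(alignment: Alignment, src_len: int, tgt_len: int) -> float:
    """Hypothetical feature: fraction of source words that carry at least one link."""
    return len({i for i, _ in alignment}) / src_len


def loglinear_score(alignment: Alignment, src_len: int, tgt_len: int,
                    features: Dict[str, Feature],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature values, i.e. the log of an unnormalised log-linear model."""
    return sum(weights[name] * h(alignment, src_len, tgt_len)
               for name, h in features.items())


def evaluate(predicted: Alignment, sure: Alignment,
             possible: Alignment) -> Tuple[float, float, float]:
    """Return (precision, recall, AER); assumes predicted and sure are non-empty."""
    possible = possible | sure          # sure links also count as possible
    hit_sure = len(predicted & sure)
    hit_possible = len(predicted & possible)
    precision = hit_possible / len(predicted)
    recall = hit_sure / len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy example: a 3-word source sentence and a 3-word target sentence.
features = {"offset": offset_feature, "coverage": coverage_feature}
weights = {"offset": 1.0, "coverage": 0.5}   # set by hand here, not by MERT
candidates = {
    "monotone": {(0, 0), (1, 1), (2, 2)},
    "crossed": {(0, 2), (1, 1)},
}
scores = {name: loglinear_score(a, 3, 3, features, weights)
          for name, a in candidates.items()}
best = max(scores, key=scores.get)

gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 2)}
precision, recall, aer = evaluate(candidates[best], gold_sure, gold_possible)
```

In this sketch the highest-scoring candidate is selected by exhaustive comparison, which only works for a handful of candidates; a real aligner would need a search procedure over the alignment space.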
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61262041, 61175068, 61163022), the Open Fund of the Software Engineering Key Laboratory of Yunnan Province (No. 2011SE14), the Key Project of the Yunnan Natural Science Foundation (No. 2013FA030), and the Ministry of Education research start-up fund for returned overseas students.
Cite this article
Mo, Y., Guo, J., Yu, Z. et al. A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint. Int. J. Mach. Learn. & Cyber. 6, 537–543 (2015). https://doi.org/10.1007/s13042-014-0293-6
DOI: https://doi.org/10.1007/s13042-014-0293-6