
A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Automatic word alignment between Vietnamese and Chinese is difficult because the two languages differ considerably in syntax and structure. We therefore present a novel Vietnamese-Chinese word alignment method that combines several feature constraint models. This article describes an improved model built on Vietnamese-Chinese progressive structure features and word-sequence offset features. The model is formulated within a log-linear framework, its parameters are trained with the minimum error rate training algorithm, and it is then used to produce Vietnamese-Chinese word alignments automatically. With IBM Model 3 as the baseline, the experimental results suggest that the method performs well: precision and recall increase by 28.57 % and 25.02 %, respectively, and the alignment error rate (AER) decreases by 14.25 %.
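As a rough illustration of the log-linear framework and the evaluation measures named in the abstract, the sketch below scores candidate alignments as a weighted sum of feature functions and computes precision, recall and AER using the standard sure/possible-link definitions of Och and Ney (2003). The two feature functions (`lexical_feature`, `offset_feature`), the toy translation table and the weights are hypothetical placeholders chosen for illustration; they are not the paper's progressive-structure or offset constraints, and the MERT weight tuning is not reproduced.

```python
from math import log
from typing import Dict, List, Sequence, Set, Tuple

Link = Tuple[int, int]  # (Vietnamese word index, Chinese word index)


def lexical_feature(links: Set[Link], src: Sequence[str], tgt: Sequence[str],
                    table: Dict[Tuple[str, str], float]) -> float:
    # Hypothetical feature: sum of log translation probabilities;
    # word pairs missing from the table get a small probability floor.
    return sum(log(table.get((src[i], tgt[j]), 1e-6)) for i, j in links)


def offset_feature(links: Set[Link], src_len: int, tgt_len: int) -> float:
    # Hypothetical word-order offset feature: penalise links whose
    # relative positions in the two sentences differ.
    return -sum(abs(i / src_len - j / tgt_len) for i, j in links)


def log_linear_score(links: Set[Link], src: Sequence[str], tgt: Sequence[str],
                     table: Dict[Tuple[str, str], float],
                     weights: Sequence[float]) -> float:
    # Log-linear model: weighted sum of feature values h_k(a, f, e).
    features = (lexical_feature(links, src, tgt, table),
                offset_feature(links, len(src), len(tgt)))
    return sum(w * h for w, h in zip(weights, features))


def best_alignment(candidates: List[Set[Link]], src: Sequence[str],
                   tgt: Sequence[str], table: Dict[Tuple[str, str], float],
                   weights: Sequence[float]) -> Set[Link]:
    # Pick the highest-scoring candidate (exhaustive search over a given list).
    return max(candidates,
               key=lambda a: log_linear_score(a, src, tgt, table, weights))


def precision_recall_aer(predicted: Set[Link], sure: Set[Link],
                         possible: Set[Link]) -> Tuple[float, float, float]:
    # Och & Ney (2003): sure links S are a subset of possible links P.
    if not predicted or not sure:
        raise ValueError("predicted and sure alignments must be non-empty")
    possible = possible | sure
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    precision = a_p / len(predicted)
    recall = a_s / len(sure)
    aer = 1.0 - (a_s + a_p) / (len(predicted) + len(sure))
    return precision, recall, aer
```

Here the weights are fixed by hand, for example `best_alignment([crossed, monotone], src, tgt, table, [1.0, 1.0])`; in the paper they are tuned with minimum error rate training, which this sketch does not implement.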


Fig. 1


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61262041, No. 61175068, No. 61163022), the Open Fund of the Software Engineering Key Laboratory of Yunnan Province (No. 2011SE14), the key project of the Yunnan Natural Science Foundation (No. 2013FA030), and the Ministry of Education research start-up fund for returned overseas students.

Author information

Correspondence to Zhengtao Yu.

About this article

Cite this article

Mo, Y., Guo, J., Yu, Z. et al. A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint. Int. J. Mach. Learn. & Cyber. 6, 537–543 (2015). https://doi.org/10.1007/s13042-014-0293-6

  • DOI: https://doi.org/10.1007/s13042-014-0293-6
