A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint

Mo, Yuanyuan; Guo, Jianyi; Yu, Zhengtao; Luo, Lin; Gao, Shengxiang

doi:10.1007/s13042-014-0293-6

Original Article
Published: 26 August 2014

Volume 6, pages 537–543, (2015)
International Journal of Machine Learning and Cybernetics

Yuanyuan Mo¹, Jianyi Guo¹,², Zhengtao Yu¹,², Lin Luo¹ & Shengxiang Gao¹

Abstract

Automatic word alignment between Vietnamese and Chinese is difficult because the syntax and structure of the two languages differ considerably. We present a novel method for Vietnamese-Chinese word alignment that merges a variety of feature constraint models. The article describes an improved model based on the Vietnamese-Chinese progressive structure and the offset features of word sequences. The model is trained within a log-linear framework, its parameters are tuned with the minimum error rate algorithm, and the trained model produces the Vietnamese-Chinese alignment. The base model in the experiments is IBM Model 3. The experimental results suggest that this bilingual word alignment method performs well: precision and recall increase by 28.57 % and 25.02 %, respectively, and the alignment error rate (AER) is reduced by 14.25 %.
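The abstract does not give the feature definitions, so the following is only a minimal sketch of the general idea: a log-linear model scores each candidate alignment as a weighted sum of feature values, and the result is evaluated with precision, recall and AER in their standard form (Och and Ney, 2003). The two features, the toy translation table, the example sentence pair and the hand-set weights are all assumptions made for illustration; they are not the features or parameters used in the article, where the weights would come from minimum error rate training.

```python
import math
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple

Link = Tuple[int, int]  # (Vietnamese position, Chinese position), 0-based


def lexical_feature(links: Iterable[Link], src: List[str], tgt: List[str],
                    t_table: Dict[Tuple[str, str], float],
                    floor: float = 1e-6) -> float:
    """Hypothetical feature: sum of log lexical translation probabilities."""
    return sum(math.log(t_table.get((src[i], tgt[j]), floor)) for i, j in links)


def offset_feature(links: Iterable[Link], src_len: int, tgt_len: int) -> float:
    """Hypothetical offset feature: penalise links whose relative positions differ."""
    return -sum(abs(i / src_len - j / tgt_len) for i, j in links)


def log_linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Unnormalised log-linear score: weighted sum of feature values."""
    return sum(w * h for w, h in zip(weights, features))


def best_alignment(candidates: Iterable[FrozenSet[Link]], src: List[str],
                   tgt: List[str], t_table: Dict[Tuple[str, str], float],
                   weights: Sequence[float]) -> FrozenSet[Link]:
    """Return the highest-scoring candidate (normalisation is not needed for argmax)."""
    def score(links: FrozenSet[Link]) -> float:
        feats = [lexical_feature(links, src, tgt, t_table),
                 offset_feature(links, len(src), len(tgt))]
        return log_linear_score(feats, weights)
    return max(candidates, key=score)


def alignment_metrics(predicted: Set[Link], sure: Set[Link],
                      possible: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and AER; sure links are counted as possible too."""
    possible = possible | sure
    hit_possible = len(predicted & possible)
    hit_sure = len(predicted & sure)
    precision = hit_possible / len(predicted)
    recall = hit_sure / len(sure)
    aer = 1.0 - (hit_sure + hit_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy data (assumed, not from the article).
src = ["tôi", "yêu", "bạn"]
tgt = ["我", "爱", "你"]
t_table = {("tôi", "我"): 0.9, ("yêu", "爱"): 0.9, ("bạn", "你"): 0.9}
candidates = [frozenset({(0, 0), (1, 1), (2, 2)}),
              frozenset({(0, 2), (1, 1), (2, 0)})]
weights = (1.0, 0.5)  # set by hand here; MERT would tune these on held-out data

chosen = best_alignment(candidates, src, tgt, t_table, weights)
gold_sure = {(0, 0), (1, 1), (2, 2)}
precision, recall, aer = alignment_metrics(set(chosen), gold_sure, set())
```

Keeping each feature as a separate function means further constraints can be added by appending one more value and one more weight, which is the main practical appeal of a log-linear combination.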

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61262041, 61175068 and 61163022), the Open Fund of the Software Engineering Key Laboratory of Yunnan Province (No. 2011SE14), the key project of the Yunnan Nature Science Foundation (No. 2013FA030), and the Ministry of Education's research start-up fund for returned overseas students.

Author information

Authors and Affiliations

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
Yuanyuan Mo, Jianyi Guo, Zhengtao Yu, Lin Luo & Shengxiang Gao
Key Lab of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, 650500, China
Jianyi Guo & Zhengtao Yu

Corresponding author

Correspondence to Zhengtao Yu.

About this article

Cite this article

Mo, Y., Guo, J., Yu, Z. et al. A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint. Int. J. Mach. Learn. & Cyber. 6, 537–543 (2015). https://doi.org/10.1007/s13042-014-0293-6

Received: 02 January 2014
Accepted: 06 August 2014
Published: 26 August 2014
Issue Date: August 2015
DOI: https://doi.org/10.1007/s13042-014-0293-6
