Co-occurrence Degree Based Word Alignment: A Case Study on Uyghur-Chinese

Mi, Chenggang; Yang, Yating; Zhou, Xi; Li, Xiao; Osman, Turghun

doi:10.1007/978-3-319-12277-9_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8801)

Abstract

Most widely used word alignment models are based on word co-occurrence counts in a parallel corpus. However, because of data sparseness during training, the word co-occurrence counts of a Uyghur-Chinese parallel corpus cannot effectively indicate associations between source and target words. In this paper, we propose a Uyghur-Chinese word alignment method based on word co-occurrence degree to alleviate the data sparseness problem. Our approach combines co-occurrence counts and fuzzy co-occurrence weights into a word co-occurrence degree. The fuzzy co-occurrence weights are obtained by searching for fuzzy co-occurrence word pairs and computing the length differences between the current Uyghur word and the other Uyghur words in those pairs. Experiments show that the co-occurrence degree based word alignment model outperforms the baseline word alignment model on Uyghur-Chinese word alignment, and that the quality of Uyghur-Chinese machine translation also improves.
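The abstract only outlines how the co-occurrence degree is formed, so the Python sketch below illustrates the general idea under explicitly hypothetical choices that are not taken from the paper. It treats two Uyghur words as fuzzy partners when they share a prefix of at least `min_stem` characters, which is a crude stand-in for a shared stem. Each partner's co-occurrence count is discounted by `1 / (1 + length difference)`. The degree of a pair is then its direct count plus `lam` times these discounted fuzzy counts. The function names, the parameters `lam` and `min_stem`, the weighting formula, and the toy romanized corpus are all illustrative assumptions.

```python
from collections import defaultdict


def cooccurrence_counts(corpus):
    """Count, for each (uyghur_word, chinese_word), how many sentence pairs contain both.

    Assumption: sentence-level co-occurrence, each pair counted once per sentence pair.
    """
    counts = defaultdict(int)
    for ug_sent, zh_sent in corpus:
        for u in set(ug_sent):
            for z in set(zh_sent):
                counts[(u, z)] += 1
    return counts


def shared_prefix_len(a, b):
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cooccurrence_degree(counts, lam=1.0, min_stem=4):
    """Hypothetical co-occurrence degree: direct count plus length-discounted fuzzy counts.

    A word u2 is treated as a fuzzy partner of u (assumption) if they differ but share
    a prefix of at least `min_stem` characters.
    """
    ug_vocab = {u for u, _ in counts}
    degree = defaultdict(float)
    for pair, c in counts.items():
        degree[pair] += c  # direct co-occurrence count
    for (u2, z), c2 in counts.items():
        for u in ug_vocab:
            if u != u2 and shared_prefix_len(u, u2) >= min_stem:
                # u2 co-occurs with z, so credit (u, z), discounted by the length difference
                degree[(u, z)] += lam * c2 / (1 + abs(len(u) - len(u2)))
    return dict(degree)


# Toy romanized Uyghur-Chinese corpus (illustrative only).
corpus = [
    (["men", "kitab", "oqudum"], ["我", "读", "书"]),
    (["u", "kitabni", "aldi"], ["他", "买", "书"]),
    (["kitablar", "nurghun"], ["书", "很", "多"]),
]
counts = cooccurrence_counts(corpus)
deg = cooccurrence_degree(counts)
print(counts[("kitab", "书")], counts[("kitab", "读")])  # 1 1
print(round(deg[("kitab", "书")], 4), deg[("kitab", "读")])  # 1.5833 1.0
```

On this toy corpus, raw counts cannot separate 书 from 读 as a translation of `kitab`, because both counts are 1. The sketched degree favors 书 because the inflected forms `kitabni` and `kitablar` also co-occur with it. Pairs that never co-occur directly can still receive fuzzy credit, which is the kind of sparseness the abstract targets.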

Author information

Authors and Affiliations

Xinjiang Technical Institute of Physics & Chemistry of Chinese Academy of Sciences, Urumqi, Xinjiang, 830011, China
Chenggang Mi, Yating Yang, Xi Zhou, Xiao Li & Turghun Osman
University of Chinese Academy of Sciences, Beijing, 100049, China
Chenggang Mi

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Haidian District, 100084, Beijing, China
Maosong Sun & Yang Liu
Chinese Academy of Sciences, Institute of Automation, 100190, Beijing, China
Jun Zhao

About this paper

Cite this paper

Mi, C., Yang, Y., Zhou, X., Li, X., Osman, T. (2014). Co-occurrence Degree Based Word Alignment: A Case Study on Uyghur-Chinese. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2014. Lecture Notes in Computer Science, vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_23

DOI: https://doi.org/10.1007/978-3-319-12277-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)