Abstract
Most widely used word alignment models are based on word co-occurrence counts in a parallel corpus. However, because of data sparseness during training, the co-occurrence counts obtained from a Uyghur-Chinese parallel corpus cannot effectively indicate associations between source and target words. In this paper, we propose a Uyghur-Chinese word alignment method based on word co-occurrence degree to alleviate the data sparseness problem. Our approach combines the co-occurrence counts and the fuzzy co-occurrence weights into a word co-occurrence degree. The fuzzy co-occurrence weights are obtained by searching for fuzzy co-occurrence word pairs and computing the differences in length between the current Uyghur word and the other Uyghur words in those pairs. Experiments show that the co-occurrence degree based word alignment model outperforms the baseline word alignment model on Uyghur-Chinese word alignment, and that the quality of Uyghur-Chinese machine translation also improves.
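The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea of adding a fuzzy weight to an exact co-occurrence count. Everything specific in it is an assumption made for illustration: the rule that two Uyghur words form a fuzzy pair when they share a prefix of at least `min_prefix` characters, the weight `1 / (1 + length difference)`, the function names, and the toy romanized corpus.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sentence_pairs):
    """Count how often each (source word, target word) pair shares a sentence pair."""
    counts = Counter()
    for src_sent, tgt_sent in sentence_pairs:
        for s in set(src_sent):
            for t in set(tgt_sent):
                counts[(s, t)] += 1
    return counts


def shared_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cooccurrence_degree(counts, min_prefix=4):
    """Exact count plus a fuzzy weight borrowed from similar source words.

    ASSUMPTION: the fuzzy-pair test (shared prefix) and the weighting
    (inverse of 1 + length difference) are illustrative choices, not the
    formulas used in the paper.
    """
    by_target = defaultdict(list)
    for (s, t), c in counts.items():
        by_target[t].append((s, c))

    degree = {}
    for (s, t), c in counts.items():
        fuzzy = 0.0
        for other, other_count in by_target[t]:
            if other != s and shared_prefix_len(s, other) >= min_prefix:
                # A smaller length difference gives a larger contribution.
                fuzzy += other_count / (1 + abs(len(s) - len(other)))
        degree[(s, t)] = c + fuzzy
    return degree


# Hypothetical toy corpus: romanized Uyghur word forms paired with Chinese words.
toy_pairs = [
    (["kitab"], ["书"]),
    (["kitablar"], ["书"]),
    (["kitabni", "oqudi"], ["读", "书"]),
]
counts = cooccurrence_counts(toy_pairs)
degree = cooccurrence_degree(counts)
```

In this toy corpus every pair has an exact count of 1, so counts alone cannot separate `("kitab", "书")` from `("oqudi", "书")`. Under the assumed weighting, the inflected forms `kitablar` and `kitabni` raise the degree of `("kitab", "书")` while `("oqudi", "书")` stays at 1, which is the kind of effect the method aims for on a morphologically rich, sparse source language.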
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mi, C., Yang, Y., Zhou, X., Li, X., Osman, T. (2014). Co-occurrence Degree Based Word Alignment: A Case Study on Uyghur-Chinese. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2014. Lecture Notes in Computer Science, vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_23
DOI: https://doi.org/10.1007/978-3-319-12277-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)