Abstract
The vocabulary of a natural language is an open set, so it is impossible to collect every word of a language. Unknown words (UKWs) are therefore unavoidable in statistical machine translation (SMT). Named entities are the most common type of UKW. In this paper, we present a new approach, based on the meaning relationship between Chinese and Vietnamese, for re-translating named-entity UKWs. Experimental results show that applying this approach to Chinese-Vietnamese SMT significantly improves the system's performance.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tran, P., Dinh, D., Tran, L. (2014). Resolving Named Entity Unknown Word in Chinese-Vietnamese Machine Translation. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 245. Springer, Cham. https://doi.org/10.1007/978-3-319-02821-7_25
DOI: https://doi.org/10.1007/978-3-319-02821-7_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02820-0
Online ISBN: 978-3-319-02821-7
eBook Packages: Engineering, Engineering (R0)