ABSTRACT
In most languages, many words have multiple senses, so machine translation systems have to choose among several candidates representing different senses of an input word. Although neural machine translation has recently become the dominant paradigm and has made great progress, it still faces the challenge of word sense disambiguation. Neural machine translation models are trained to identify the correct sense of a word as part of an end-to-end translation task, and their performance on word sense disambiguation is not satisfactory. This paper presents a case study of machine translation for the Korean language. We have manually built a Korean lexical semantic network, UWordMap, as a large-scale lexical semantic knowledge base in which each sense of every polysemous word is associated with a sense code that constitutes a network node. Then, based on UWordMap, we determine the correct sense of each polysemous word in the training corpus and tag it with the appropriate sense code before training neural machine translation models. Experiments on translation from Korean to English and Vietnamese show that UWordMap can significantly improve the quality of Korean neural machine translation systems in terms of BLEU and TER scores.
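The tagging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use UWordMap or its API: the sense inventory `TOY_SENSES`, the two-digit sense codes, the context clue words, and the function `tag_senses` are all hypothetical, and the input is assumed to be already segmented into lemmas. The disambiguation rule here is a simple context-overlap heuristic standing in for whatever method the paper actually uses.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy inventory (NOT UWordMap data):
# surface form -> list of (sense_code, context clue lemmas).
TOY_SENSES: Dict[str, List[Tuple[str, Set[str]]]] = {
    "배": [
        ("01", {"먹다", "과일", "달다"}),      # "pear"
        ("02", {"타다", "바다", "항구"}),      # "boat"
    ],
    "눈": [
        ("01", {"뜨다", "감다", "보다"}),      # "eye"
        ("02", {"내리다", "겨울", "쌓이다"}),  # "snow"
    ],
}


def tag_senses(
    tokens: List[str],
    senses: Dict[str, List[Tuple[str, Set[str]]]] = TOY_SENSES,
) -> List[str]:
    """Append a sense code to each polysemous token in a sentence.

    A sense is chosen by counting how many of its clue lemmas occur
    among the other tokens; ties fall back to the first listed sense.
    Tokens that are not in the inventory are left unchanged.
    """
    tagged: List[str] = []
    for i, token in enumerate(tokens):
        candidates = senses.get(token)
        if not candidates:
            tagged.append(token)
            continue
        context = set(tokens[:i] + tokens[i + 1:])
        best_code, _ = max(candidates, key=lambda c: len(c[1] & context))
        tagged.append(f"{token}_{best_code}")
    return tagged


if __name__ == "__main__":
    # Source side of a training pair, tagged before NMT training.
    print(" ".join(tag_senses(["바다", "배", "타다"])))
```

Applied to the source side of the training corpus, a step like this turns each sense of a polysemous word into a distinct vocabulary item, so the translation model no longer has to resolve the ambiguity implicitly.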
Index Terms
- Neural Machine Translation Enhancements through Lexical Semantic Network