research-article

Neural Machine Translation Enhancements through Lexical Semantic Network

Authors:
Quang-Phuoc Nguyen
Anh-Dung Vo
Joon-Choul Shin
Cheol-Young Ock

Affiliation (all authors): Department of IT Convergence, University of Ulsan, Ulsan, Republic of Korea

ICCMS '18: Proceedings of the 10th International Conference on Computer Modeling and Simulation, January 2018, Pages 105–109, https://doi.org/10.1145/3177457.3177461

Published: 08 January 2018

ABSTRACT

In most languages, many words have multiple senses; thus, machine translation systems have to choose between several candidates representing different senses of an input word. Although neural machine translation has recently become a dominant paradigm and made great progress, it still has to confront the challenge of word sense disambiguation. Neural machine translation models learn to identify the correct sense of a word only implicitly, as part of an end-to-end translation task, and their performance on word sense disambiguation is not satisfactory. This paper presents a case study of machine translation for the Korean language. We have manually built a Korean lexical semantic network, UWordMap, as a large-scale lexical semantic knowledge base in which each sense of every polysemous word is associated with a sense-code constituting a network node. Then, based on UWordMap, we determine the correct sense of each polysemous word in the training corpus and tag it with the appropriate sense-code before training neural machine translation models. Experiments on translation from Korean to English and Vietnamese show that UWordMap can significantly improve the quality of Korean neural machine translation systems in terms of BLEU and TER scores.
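The abstract describes a preprocessing step that replaces each polysemous word in the training corpus with a sense-code. It does not describe how that step is implemented. The minimal Python sketch below illustrates the idea under loudly stated assumptions. First, tokens are already morpheme-segmented. Second, the sense inventory `SENSE_CUES`, its codes (`배_01` and so on), and its cue words are hypothetical placeholders, not real UWordMap data. Third, a simple Lesk-style context-overlap heuristic stands in for the UWordMap-based disambiguation used in the paper; it is not the authors' method.

```python
from typing import Dict, List, Sequence

# HYPOTHETICAL sense inventory standing in for UWordMap:
# surface form -> {sense-code: cue words}. Codes and cues are illustrative only.
SENSE_CUES: Dict[str, Dict[str, List[str]]] = {
    "배": {
        "배_01": ["아프", "부르"],        # belly
        "배_02": ["바다", "항구", "타"],  # ship
        "배_03": ["과일", "먹", "사과"],  # pear
    },
}


def choose_sense(word: str, context: Sequence[str]) -> str:
    """Pick the sense whose cue words overlap most with the context (Lesk-style heuristic)."""
    senses = SENSE_CUES[word]
    context_set = set(context)
    # max() returns the first maximal item, so on ties the earliest-listed sense wins.
    return max(senses, key=lambda code: len(context_set.intersection(senses[code])))


def tag_senses(tokens: Sequence[str], window: int = 3) -> List[str]:
    """Replace each polysemous token with a sense-code; leave all other tokens unchanged."""
    tagged: List[str] = []
    for i, tok in enumerate(tokens):
        if tok in SENSE_CUES:
            # Use up to `window` tokens on each side as disambiguation context.
            context = list(tokens[max(0, i - window):i]) + list(tokens[i + 1:i + 1 + window])
            tagged.append(choose_sense(tok, context))
        else:
            tagged.append(tok)
    return tagged


if __name__ == "__main__":
    print(tag_senses(["바다", "에서", "배", "를", "타", "았다"]))
```

Because every sense-code keeps the surface form as a prefix, the tagged corpus can be fed to an ordinary NMT toolkit unchanged. Distinct senses of the same word simply become distinct vocabulary items.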

Index Terms

Neural Machine Translation Enhancements through Lexical Semantic Network
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Lexical semantics
      2. Machine translation

Published in

ICCMS '18: Proceedings of the 10th International Conference on Computer Modeling and Simulation
January 2018
310 pages
ISBN:9781450363396
DOI:10.1145/3177457

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Lexical semantic network
neural machine translation
parallel corpus
word sense disambiguation
Qualifiers
- research-article
- Research
- Refereed limited
Article Metrics
- Total Citations: 2
- Total Downloads: 123
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 0