Improving Chinese-Vietnamese Neural Machine Translation with Linguistic Differences

Authors:

Zhiqiang Yu
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Yunnan Minzu University, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China

Zhengtao Yu
Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China

Yantuan Xian
Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China

Yuxin Huang
Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China

Junjun Guo
Faculty of Information Engineering and Automation, Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan, China

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2, Article No. 22, pp. 1–12. https://doi.org/10.1145/3477536

Published: 25 March 2022


Abstract

We present a simple, efficient data augmentation approach for boosting Chinese-Vietnamese neural machine translation performance by leveraging the linguistic difference between the two languages. We first define the formalized representation of modifier symmetry, which is one of the most representative linguistic differences between Chinese and Vietnamese. We then propose and test two data augmentation strategies for leveraging the linguistic difference, which can be integrated naturally with different translation models. Results indicate that both strategies can introduce linguistic rules to boost translation accuracy. Tests on Chinese-Vietnamese benchmarks show significant accuracy improvements. To facilitate studies in this domain, we also release an open-source toolkit¹ with flexible implementation for Chinese-Vietnamese linguistic difference tagging.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 21, Issue 2
March 2022
413 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3494070
Editor: Imed Zitouni, Google, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 March 2022
- Revised: 1 July 2021
- Accepted: 1 July 2021
- Received: 1 April 2021
Author Tags
Neural machine translation
Chinese-Vietnamese
linguistic difference
data augmentation
Qualifiers
- note
- Refereed
