research-article

Back Deduction Based Testing for Word Sense Disambiguation Ability of Machine Translation Systems

Authors:

Xiaofang Zhang,

Yuming Zhou

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 601-613

https://doi.org/10.1145/3597926.3598081

Published: 13 July 2023

Abstract

Machine translation systems have penetrated our daily lives, serving millions of online users with translations from a source language into a target language. Word Sense Disambiguation (WSD), which aims to determine the exact sense of a polysemous word in a given context, is one of the essential functional requirements of machine translation systems. Commercial machine translation systems (e.g., Google Translate) have been shown to fail to identify the proper sense, which in turn causes translation errors. However, to our knowledge, no prior study has focused on testing machine translation systems for such WSD bugs.

To fill this gap, we propose a novel testing method, Back Deduction based Testing for Word Sense Disambiguation (BDTD). The main idea of our method is to recover the hidden senses of source words via back deduction from the target language, i.e., to use the words of the target-language translation to deduce which sense of each source word the translator identified during translation. To evaluate BDTD, we conduct an extensive empirical study with millions of sentences on three popular translators, including Google Translate and Bing Microsoft Translator. The experimental results indicate that BDTD identifies a considerable number of WSD bugs with high accuracy (above 80%) for all three translators.
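The back-deduction idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the toy sense inventory `SENSE_INVENTORY` stands in for a real multilingual lexical resource, the target word aligned to the polyseme is passed in rather than computed by a word aligner, and the expected sense is assumed to come from an external oracle such as a human annotation. The hypothetical helpers `back_deduce_sense` and `check_wsd` deduce which sense of an English word a translator chose from the Chinese word it produced, and report a suspected WSD bug when that sense disagrees with the expected one.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy sense inventory (an assumption, not BDTD's resource):
# for each English polyseme, map Chinese target words to the source sense
# they express. A real tool would use a large multilingual lexical resource.
SENSE_INVENTORY: dict[str, dict[str, str]] = {
    "bank": {
        "银行": "bank.financial_institution",
        "河岸": "bank.river_side",
        "岸": "bank.river_side",
    },
    "bat": {"蝙蝠": "bat.animal", "球棒": "bat.sports_equipment"},
}


@dataclass
class WSDCheck:
    source_word: str
    aligned_target_word: str
    deduced_sense: Optional[str]  # None when the target word is not in the inventory
    expected_sense: str
    suspected_bug: bool


def back_deduce_sense(source_word: str, aligned_target_word: str) -> Optional[str]:
    """Deduce the sense of source_word that the translator chose, using the
    target-language word it was translated into (back deduction)."""
    return SENSE_INVENTORY.get(source_word, {}).get(aligned_target_word)


def check_wsd(source_word: str, aligned_target_word: str, expected_sense: str) -> WSDCheck:
    """Compare the back-deduced sense with an externally supplied expected sense."""
    deduced = back_deduce_sense(source_word, aligned_target_word)
    # Flag only when a sense was deduced and it differs from the expected one;
    # unknown target words yield no verdict rather than a false alarm.
    bug = deduced is not None and deduced != expected_sense
    return WSDCheck(source_word, aligned_target_word, deduced, expected_sense, bug)


if __name__ == "__main__":
    # "He sat on the bank of the river": the river-side sense is expected.
    ok = check_wsd("bank", "河岸", expected_sense="bank.river_side")
    bad = check_wsd("bank", "银行", expected_sense="bank.river_side")
    print(ok.suspected_bug, bad.suspected_bug)  # False True
```

The check is deliberately conservative: a target word missing from the inventory produces no deduced sense and is not flagged, trading some recall for fewer false alarms.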


Cited By

Ji P, Feng Y, Zhang R, Xue R, Zhang Y, Huang W, Liu J, Zhao Z (2024). NLPLego: Assembling Test Generation for Natural Language Processing Applications. ACM Transactions on Software Engineering and Methodology 34:2 (1-36). Online publication date: 5-Oct-2024.
https://dl.acm.org/doi/10.1145/3691631

Xu Y, Li Y, Wang J, Zhang X, Filkov V, Ray B, Zhou M (2024). Evaluating Terminology Translation in Machine Translation Systems via Metamorphic Testing. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 758-769. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695069

Sun Z, Chen Z, Zhang J, Hao D (2024). Fairness Testing of Machine Translation Systems. ACM Transactions on Software Engineering and Methodology 33:6 (1-27). Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3664608

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging



Information

Published In


ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2023

1554 pages

ISBN:9798400702211

DOI:10.1145/3597926

General Chair: René Just, University of Washington, USA
Program Chair: Gordon Fraser, University of Passau, Germany

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

National Natural Science Foundation of China
Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China

Conference

ISSTA '23

Sponsor:

SIGSOFT

ISSTA '23: 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

July 17 - 21, 2023

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%


Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 206
Downloads (last 12 months): 69
Downloads (last 6 weeks): 3

Reflects downloads up to 02 Mar 2025.
