DOI: 10.1145/3597926.3598081

Back Deduction Based Testing for Word Sense Disambiguation Ability of Machine Translation Systems

Published: 13 July 2023

Abstract

Machine translation systems have become part of daily life, translating between source and target languages for millions of online users every day. Word Sense Disambiguation (WSD), which aims to determine the exact sense of a polyseme in its given context, is one of the essential functional requirements of a machine translation system. Commercial machine translation systems (e.g., Google Translate) have been shown to fail to identify the proper sense and consequently produce translation errors. However, to our knowledge, no prior study has focused on testing for such WSD bugs in machine translation systems.
To tackle this challenge, we propose a novel testing method, Back Deduction based Testing for Word Sense Disambiguation (BDTD). The main idea is to obtain the hidden senses of source words via back deduction from the target language, i.e., to use the translated words in the target language to deduce which senses of the original words were identified during translation. To evaluate BDTD, we conduct an extensive empirical study with millions of sentences on three popular translators, including Google Translate and Bing Microsoft Translator. The experimental results indicate that BDTD identifies a considerable number of WSD bugs with high accuracy (more than 80%) on all three translators.
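
The back-deduction idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sense inventory, the sense labels, and the function names `deduce_sense` and `is_wsd_bug` are hypothetical, and the abstract does not say how the expected sense of a word is obtained, so the sketch simply takes it as an input.

```python
from typing import Dict, Optional, Set

# Hypothetical toy sense inventory (an assumption, not from the paper):
# source word -> sense label -> target-language words expressing that sense.
SENSE_INVENTORY: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "financial_institution": {"银行"},
        "river_side": {"河岸", "岸边"},
    },
}


def deduce_sense(source_word: str, target_word: str) -> Optional[str]:
    """Back-deduce the sense the translator chose from its output word.

    Returns the sense label whose known translations contain target_word,
    or None if the inventory cannot explain the translation.
    """
    for sense, translations in SENSE_INVENTORY.get(source_word, {}).items():
        if target_word in translations:
            return sense
    return None


def is_wsd_bug(source_word: str, target_word: str, expected_sense: str) -> bool:
    """Flag a suspected WSD bug when the deduced sense contradicts the expected one.

    expected_sense is assumed to be supplied by some other means.
    Translations the inventory cannot explain are not flagged.
    """
    deduced = deduce_sense(source_word, target_word)
    return deduced is not None and deduced != expected_sense
```

For instance, if "bank" in a sentence about a river is translated with a word listed under the financial sense, `is_wsd_bug` reports a suspected bug. A translation absent from the inventory is left unflagged rather than guessed at.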

Cited By

  • (2024) NLPLego: Assembling Test Generation for Natural Language Processing Applications. ACM Transactions on Software Engineering and Methodology 34:2 (1-36). DOI: 10.1145/3691631. Online publication date: 5-Oct-2024
  • (2024) Evaluating Terminology Translation in Machine Translation Systems via Metamorphic Testing. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (758-769). DOI: 10.1145/3691620.3695069. Online publication date: 27-Oct-2024
  • (2024) Fairness Testing of Machine Translation Systems. ACM Transactions on Software Engineering and Methodology 33:6 (1-27). DOI: 10.1145/3664608. Online publication date: 27-Jun-2024

Published In

ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2023
1554 pages
ISBN:9798400702211
DOI:10.1145/3597926

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Back Deduction
  2. Machine Translation
  3. Software Testing
  4. Word Sense Disambiguation

Qualifiers

  • Research-article

Conference

ISSTA '23

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Article Metrics

  • Downloads (Last 12 months): 69
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 02 Mar 2025
