research-article

Machine translation testing via pathological invariance

Authors:
Shashij Gupta, IIT Bombay, India
Pinjia He, ETH Zurich, Switzerland
Clara Meister, ETH Zurich, Switzerland
Zhendong Su, ETH Zurich, Switzerland

ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, November 2020, Pages 863–875
https://doi.org/10.1145/3368089.3409756

Published: 08 November 2020

ABSTRACT

Machine translation software has become heavily integrated into our daily lives due to recent improvements in the performance of deep neural networks. However, machine translation software has been shown to regularly return erroneous translations, which can lead to harmful consequences such as economic loss and political conflict. Moreover, the complexity of the underlying neural models makes testing machine translation systems challenging in new ways. To address this problem, we introduce a novel methodology called PatInv. The main intuition behind PatInv is that sentences with different meanings should not have the same translation. Under this general idea, we provide two realizations of PatInv that, given an arbitrary sentence, generate syntactically similar but semantically different sentences by (1) replacing one word in the sentence using a masked language model or (2) removing one word or phrase from the sentence based on its constituency structure. We then test whether the returned translations are the same for the original and modified sentences; because the two sentences differ in meaning, identical translations indicate a likely translation error. We have applied PatInv to test Google Translate and Bing Microsoft Translator using 200 English sentences in two language settings: English-Hindi (En-Hi) and English-Chinese (En-Zh). The results show that PatInv accurately finds 308 erroneous translations in Google Translate and 223 in Bing Microsoft Translator, most of which cannot be found by state-of-the-art approaches.
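The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' PatInv implementation: the constituency tree is hand-written as nested tuples (a real pipeline would obtain it from a parser), the set of removable constituent labels is an illustrative guess, `propose` is a hypothetical stand-in for a masked language model, and `translate` is any function mapping a source sentence to its translation. A modified sentence is flagged whenever its translation is identical to the original's.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple, Union

# A constituency tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings (the words). This representation is an
# assumption for illustration; real input would come from a parser.
Tree = Union[str, Tuple]
Path = Tuple[int, ...]


def subtrees(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, Tuple]]:
    """Yield (path, node) for every non-leaf node in pre-order."""
    if isinstance(tree, str):
        return
    yield path, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from subtrees(child, path + (i,))


def words_without(tree: Tree, skip: Optional[Path] = None, path: Path = ()) -> List[str]:
    """Return the sentence's words, omitting the subtree at `skip` (if given)."""
    if skip is not None and path == skip:
        return []
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for i, child in enumerate(tree[1:], start=1):
        words.extend(words_without(child, skip, path + (i,)))
    return words


def removal_variants(tree: Tree,
                     removable: Sequence[str] = ("ADJP", "ADVP", "PP", "RB")) -> List[str]:
    """Realization (2), sketched: delete one removable constituent at a time.
    The default label set is hypothetical, not the paper's rule set."""
    variants: List[str] = []
    for path, node in subtrees(tree):
        if path and node[0] in removable:
            words = words_without(tree, path)
            if words:
                variants.append(" ".join(words))
    return variants


def replacement_variants(words: List[str], index: int,
                         propose: Callable[[List[str], int], List[str]]) -> List[str]:
    """Realization (1), sketched: swap the word at `index` for candidates
    returned by `propose`, a hypothetical stand-in for a masked language model."""
    variants: List[str] = []
    for candidate in propose(words, index):
        if candidate != words[index]:
            variants.append(" ".join(words[:index] + [candidate] + words[index + 1:]))
    return variants


def pathological_invariance(original: str, variants: List[str],
                            translate: Callable[[str], str]) -> List[str]:
    """Report variants whose translation is identical to the original's."""
    reference = translate(original)
    return [v for v in variants if v != original and translate(v) == reference]


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for an MT system that silently drops "very"."""
    return " ".join(w for w in sentence.split() if w != "very")


if __name__ == "__main__":
    tree = ("S", ("NP", ("PRP", "He")),
            ("VP", ("VBD", "bought"),
             ("NP", ("DT", "a"),
              ("ADJP", ("RB", "very"), ("JJ", "big")),
              ("NN", "house"))))
    original = " ".join(words_without(tree))
    variants = removal_variants(tree)
    variants += replacement_variants(original.split(), 4, lambda ws, i: ["small", "big"])
    for v in pathological_invariance(original, variants, toy_translate):
        print("suspicious pair:", original, "|", v)
```

Running the script prints a single suspicious pair, "He bought a very big house" and "He bought a big house", because the toy translator returns the same output for both. Exact string equality is the simplest comparison criterion; how strictly to compare translations is a design choice this sketch leaves open.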

Supplemental Material

- fse20main-p668-p-teaser.mp4 (MP4, 22.2 MB)
- fse20main-p668-p-video.mp4 (MP4, 195.5 MB)

Index Terms

Machine translation testing via pathological invariance
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Published in
ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2020, 1703 pages
ISBN: 9781450370431
DOI: 10.1145/3368089
General Chair: Prem Devanbu, University of California at Davis, USA
Program Chairs: Myra Cohen, Iowa State University, USA; Thomas Zimmermann, Microsoft Research, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Machine translation
Pathological Invariance
Testing

Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%

Article Metrics
- 17 Total Citations
- 422 Total Downloads
- Downloads (last 12 months): 60
- Downloads (last 6 weeks): 3