DOI: 10.1145/3368089.3409756
research-article

Machine translation testing via pathological invariance

Published: 08 November 2020

ABSTRACT

Machine translation software has become heavily integrated into our daily lives due to the recent improvement in the performance of deep neural networks. However, machine translation software has been shown to regularly return erroneous translations, which can lead to harmful consequences such as economic loss and political conflicts. Additionally, due to the complexity of the underlying neural models, testing machine translation systems presents new challenges. To address this problem, we introduce a novel methodology called PatInv. The main intuition behind PatInv is that sentences with different meanings should not have the same translation. Under this general idea, we provide two realizations of PatInv that, given an arbitrary sentence, generate syntactically similar but semantically different sentences by: (1) replacing one word in the sentence using a masked language model or (2) removing one word or phrase from the sentence based on its constituency structure. We then test whether the returned translations are the same for the original and modified sentences. We have applied PatInv to test Google Translate and Bing Microsoft Translator using 200 English sentences. Two language settings are considered: English-Hindi (En-Hi) and English-Chinese (En-Zh). The results show that PatInv can accurately find 308 erroneous translations in Google Translate and 223 erroneous translations in Bing Microsoft Translator, most of which cannot be found by state-of-the-art approaches.
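The invariance check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `remove_one_word`, `find_suspicious_pairs`, and `toy_translate` are hypothetical names introduced here. The sketch drops arbitrary single words rather than using constituency structure or a masked language model, and it assumes every generated variant differs in meaning from the original, which a real tool would need to verify. The translator is a deliberately buggy stand-in, not a call to Google Translate or Bing Microsoft Translator.

```python
from typing import Callable, List, Tuple


def remove_one_word(sentence: str) -> List[str]:
    """Return each distinct variant of `sentence` with exactly one word dropped.

    Simplification: any word may be dropped. This ignores constituency
    structure and does not check that the meaning actually changes.
    """
    words = sentence.split()
    variants: List[str] = []
    for i in range(len(words)):
        variant = " ".join(words[:i] + words[i + 1:])
        if variant and variant not in variants:
            variants.append(variant)
    return variants


def find_suspicious_pairs(
    sentence: str,
    translate: Callable[[str], str],
    mutate: Callable[[str], List[str]] = remove_one_word,
) -> List[Tuple[str, str]]:
    """Report variants whose translation equals that of the original sentence.

    Each hit is a (variant, shared_translation) pair: two inputs assumed to
    differ in meaning that nevertheless received the same output.
    """
    original_translation = translate(sentence)
    return [
        (variant, original_translation)
        for variant in mutate(sentence)
        if translate(variant) == original_translation
    ]


def toy_translate(text: str) -> str:
    """Hypothetical translator with a planted bug: it silently ignores 'not'."""
    return " ".join(word.upper() for word in text.split() if word != "not")


suspicious = find_suspicious_pairs("I do not like tea", toy_translate)
for variant, translation in suspicious:
    print(f"{variant!r} shares translation {translation!r} with the original")
```

With the planted bug, the only variant reported is the one that removes "not", since the toy translator maps it and the original sentence to the same output. To probe a real system, `translate` would be replaced by a wrapper around that system's API and `mutate` by a generator that produces meaning-changing variants.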

Supplemental Material

fse20main-p668-p-teaser.mp4 (mp4, 22.2 MB)

fse20main-p668-p-video.mp4 (mp4, 195.5 MB)

      • Published in

        ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        November 2020
        1703 pages
        ISBN:9781450370431
        DOI:10.1145/3368089

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 112 of 543 submissions, 21%
