research-article
Artifacts Available / v1.1

Testing Coreference Resolution Systems without Labeled Test Sets

Published: 30 November 2023

ABSTRACT

Coreference resolution (CR) is a task to resolve different expressions (e.g., named entities, pronouns) that refer to the same real-world entity/event. It is a core natural language processing (NLP) component that underlies and empowers major downstream NLP applications such as machine translation, chatbots, and question-answering. Despite its broad impact, the problem of testing CR systems has rarely been studied. A major difficulty is the shortage of a labeled dataset for testing. While it is possible to feed arbitrary sentences as test inputs to a CR system, a test oracle that captures their expected test outputs (coreference relations) is hard to define automatically. To address the challenge, we propose Crest, an automated testing methodology for CR systems. Crest uses constituency and dependency relations to construct pairs of test inputs subject to the same coreference. These relations can be leveraged to define the metamorphic relation for metamorphic testing. We compare Crest with five state-of-the-art test generation baselines on two popular CR systems, and apply them to generate tests from 1,000 sentences randomly sampled from CoNLL-2012, a popular dataset for coreference resolution. Experimental results show that Crest outperforms baselines significantly. The issues reported by Crest are all true positives (i.e., 100% precision), compared with 63% to 75% achieved by the baselines.
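
The metamorphic idea described in the abstract can be illustrated with a minimal Python sketch. This is not Crest's implementation: the abstract does not give the algorithm, and the step that builds input pairs from constituency and dependency relations is omitted here. The sketch only shows the oracle-free check itself, under the assumption that a CR system can be wrapped as a function from text to clusters of mention strings. The names `relation_violations`, `coreferent_pairs`, and `toy_resolver`, as well as the example sentences, are hypothetical, and treating mentions as plain strings is a simplification (real systems report token spans).

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Assumed interface: a resolver maps text to clusters of mention strings.
Cluster = FrozenSet[str]
Resolver = Callable[[str], Set[Cluster]]
Pair = Tuple[str, str]


def coreferent_pairs(clusters: Iterable[Cluster]) -> Set[Pair]:
    """Expand clusters into the set of unordered coreferent mention pairs."""
    pairs: Set[Pair] = set()
    for cluster in clusters:
        mentions = sorted(cluster)  # fixed order so each pair has one form
        for i, first in enumerate(mentions):
            for second in mentions[i + 1:]:
                pairs.add((first, second))
    return pairs


def relation_violations(
    resolver: Resolver,
    source: str,
    follow_up: str,
    shared_mentions: Iterable[str],
) -> Set[Pair]:
    """Return shared-mention pairs whose coreference status differs
    between the source input and the follow-up input."""
    shared = set(shared_mentions)

    def restricted(text: str) -> Set[Pair]:
        return {
            pair
            for pair in coreferent_pairs(resolver(text))
            if pair[0] in shared and pair[1] in shared
        }

    # Symmetric difference: links present in one output but not the other.
    return restricted(source) ^ restricted(follow_up)


# Hypothetical stand-in for a real CR system, with hard-coded outputs.
TOY_OUTPUTS = {
    "Alice said she was tired.": {frozenset({"Alice", "she"})},
    "Alice said she was very tired.": set(),  # the link is lost
}


def toy_resolver(text: str) -> Set[Cluster]:
    return TOY_OUTPUTS[text]


violations = relation_violations(
    toy_resolver,
    "Alice said she was tired.",
    "Alice said she was very tired.",
    ["Alice", "she"],
)
print(violations)
```

No labeled expected output is needed: the check flags an issue only because the two outputs disagree on mentions that both inputs share. Whether a given pair of inputs really should share the same coreference is exactly what the input-construction step must guarantee, and that step is not modeled here.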


        • Published in

          ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
          November 2023
          2215 pages
          ISBN: 9798400703270
          DOI: 10.1145/3611643

          Copyright © 2023 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 112 of 543 submissions, 21%