DOI: 10.1145/3589334.3645695

OODREB: Benchmarking State-of-the-Art Methods for Out-Of-Distribution Generalization on Relation Extraction

Published: 13 May 2024

Abstract

Relation extraction (RE) methods have achieved striking performance when training and test data are independently and identically distributed (i.i.d.). However, in real-world scenarios where RE models are trained to acquire knowledge in the wild, this assumption can hardly be satisfied because the test distributions are different and unknown. In this paper, we present the first effort to study out-of-distribution (OOD) problems in RE by constructing an out-of-distribution relation extraction benchmark (OODREB) and then investigating the abilities of state-of-the-art (SOTA) RE methods on OODREB in both i.i.d. and OOD settings. Our proposed benchmark and analysis reveal new findings and insights: (1) Existing SOTA RE methods struggle to achieve satisfactory performance on OODREB in both i.i.d. and OOD settings, owing to the complexity of the training data and biased model selection; rethinking the development protocols of RE methods is therefore urgent. (2) SOTA RE methods fail to learn causality because causal information is expressed in diverse linguistic forms, and this failure limits their robustness and generalization ability. (3) Current RE methods based on language models are far from ready for deployment in real-world applications. We call on future work to take OOD generalization and causality-learning ability into consideration. We make our annotations and code publicly available at https://github.com/Hytn/OODREB.
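The abstract contrasts the i.i.d. setting, where train and test examples are drawn from the same distribution, with the OOD setting, where whole slices of the data (for instance, unseen domains) appear only at test time. The sketch below illustrates that contrast on a toy RE dataset; it is a minimal illustration under assumed names (`Example`, `iid_split`, and `ood_split` are hypothetical and not part of the OODREB codebase or the paper's actual protocol).

```python
# Minimal sketch contrasting i.i.d. and OOD evaluation splits for relation
# extraction. All names here (Example, iid_split, ood_split) are hypothetical
# illustrations of the protocol the abstract describes, not the OODREB API.
import random
from dataclasses import dataclass

@dataclass
class Example:
    text: str       # sentence containing both entity mentions
    head: str       # head entity mention
    tail: str       # tail entity mention
    relation: str   # gold relation label
    domain: str     # source domain, e.g. "news", "wiki", "biomed"

def iid_split(examples, test_frac=0.2, seed=0):
    """i.i.d. setting: train and test are random draws from one distribution."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def ood_split(examples, held_out_domains):
    """OOD setting: entire domains are held out, so the test distribution
    differs from the training distribution in ways unknown at training time."""
    train = [ex for ex in examples if ex.domain not in held_out_domains]
    test = [ex for ex in examples if ex.domain in held_out_domains]
    return train, test

if __name__ == "__main__":
    data = [
        Example("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw",
                "place_of_birth", "wiki"),
        Example("Apple acquired Shazam in 2018.", "Apple", "Shazam",
                "acquired", "news"),
        Example("Aspirin inhibits COX-1.", "Aspirin", "COX-1",
                "inhibits", "biomed"),
    ]
    train, test = ood_split(data, held_out_domains={"biomed"})
    print(len(train), "train /", len(test), "OOD test examples")
```

Note how, under the OOD split, selecting checkpoints on a development set drawn from the training domains is exactly the kind of biased model selection the abstract flags: the dev score no longer tracks performance on the held-out distribution.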

Supplemental Material

MP4 file (supplemental video)



    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024, 4826 pages
    ISBN: 9798400701719
    DOI: 10.1145/3589334

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. benchmark
    2. out-of-distribution generalization
    3. relation extraction

    Qualifiers

    • Research-article

    Conference

    WWW '24: The ACM Web Conference 2024
    May 13-17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
