DOI: 10.1145/3589334.3645695

OODREB: Benchmarking State-of-the-Art Methods for Out-Of-Distribution Generalization on Relation Extraction

Published: 13 May 2024

Abstract

Relation extraction (RE) methods have achieved striking performance when training and test data are independently and identically distributed (i.i.d.). However, in real-world scenarios where RE models are trained to acquire knowledge in the wild, this assumption can hardly be satisfied because the test distributions are different and unknown. In this paper, we present the first effort to study out-of-distribution (OOD) problems in RE by constructing an out-of-distribution relation extraction benchmark (OODREB) and then investigating the abilities of state-of-the-art (SOTA) RE methods on OODREB in both i.i.d. and OOD settings. Our proposed benchmark and analysis reveal new findings and insights: (1) Existing SOTA RE methods struggle to achieve satisfactory performance on OODREB in both i.i.d. and OOD settings, owing to the complexity of the training data and biased model selection; rethinking the development protocols of RE methods is therefore urgent. (2) SOTA RE methods fail to learn causality because causal information is expressed in diverse linguistic forms, and this failure limits their robustness and generalization ability. (3) Current RE methods based on language models are far from ready for deployment in real-world applications. We call on future work to take OOD generalization and causality-learning ability into consideration. We make our annotations and code publicly available at https://github.com/Hytn/OODREB.
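The abstract contrasts the i.i.d. setting, where train and test examples are drawn from the same distribution, with the OOD setting, where whole slices of the data (for instance, unseen domains) appear only at test time. The sketch below illustrates that contrast on a toy RE dataset; it is a minimal illustration under assumed names (`Example`, `iid_split`, and `ood_split` are hypothetical and not part of the OODREB codebase or the paper's actual protocol).

```python
# Minimal sketch contrasting i.i.d. and OOD evaluation splits for relation
# extraction. All names here (Example, iid_split, ood_split) are hypothetical
# illustrations of the protocol the abstract describes, not the OODREB API.
import random
from dataclasses import dataclass

@dataclass
class Example:
    text: str       # sentence containing both entity mentions
    head: str       # head entity mention
    tail: str       # tail entity mention
    relation: str   # gold relation label
    domain: str     # source domain, e.g. "news", "wiki", "biomed"

def iid_split(examples, test_frac=0.2, seed=0):
    """i.i.d. setting: train and test are random draws from one distribution."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def ood_split(examples, held_out_domains):
    """OOD setting: entire domains are held out, so the test distribution
    differs from the training distribution in ways unknown at training time."""
    train = [ex for ex in examples if ex.domain not in held_out_domains]
    test = [ex for ex in examples if ex.domain in held_out_domains]
    return train, test

if __name__ == "__main__":
    data = [
        Example("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw",
                "place_of_birth", "wiki"),
        Example("Apple acquired Shazam in 2018.", "Apple", "Shazam",
                "acquired", "news"),
        Example("Aspirin inhibits COX-1.", "Aspirin", "COX-1",
                "inhibits", "biomed"),
    ]
    train, test = ood_split(data, held_out_domains={"biomed"})
    print(len(train), "train /", len(test), "OOD test examples")
```

Note how, under the OOD split, selecting checkpoints on a development set drawn from the training domains is exactly the kind of biased model selection the abstract flags: the dev score no longer tracks performance on the held-out distribution.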

Supplemental Material

MP4 file (supplemental video)



    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024, 4826 pages
    ISBN: 9798400701719
    DOI: 10.1145/3589334

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. benchmark
    2. out-of-distribution generalization
    3. relation extraction

    Qualifiers

    • Research-article

    Conference

    WWW '24: The ACM Web Conference 2024
    May 13-17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
