
Journal of Web Semantics

Volume 65, December 2020, 100598

FAT-RE: A faster dependency-free model for relation extraction

https://doi.org/10.1016/j.websem.2020.100598

Abstract

Recent years have seen the dependency tree used as an effective source of information for relation extraction. Two problems remain in previous methods: (1) the dependency tree relies on external parsing tools and must be carefully integrated to balance pruning noisy words against preserving semantic integrity; (2) dependency-based methods still have to encode the sequential context as a supplement, which costs extra time. To tackle these two problems, we propose a faster dependency-free model in this paper: regarding the sentence as a fully-connected graph, we customize the vanilla transformer architecture to remove irrelevant information via a filtering mechanism and further aggregate the sentence information through an enhanced query. Our model yields comparable results on the SemEval2010 Task8 dataset and better results on the TACRED dataset, without requiring external information from the dependency tree and with improved time efficiency.

Introduction

Relation extraction is crucial to many Natural Language Processing (NLP) tasks, such as knowledge graph completion, question answering, and biomedical text mining. It is mainly concerned with what kind of relationship exists between two entities in a sentence. Although many methods based on deep neural networks are successful on this task, the traditional transformer encoder structure contains considerable redundancy, and this redundancy is closely related to the dependency tree.

Tree-structured dependency information has been reported to be critical to the relation extraction task [1]. Previous studies [2], [3], [4], [5] demonstrate the effectiveness of incorporating the shortest dependency path (SDP) between the two mentioned entities. Christopoulou et al. [6] state that the relation between a pair of interest can be extracted directly from the target entities and incorporated indirectly from other related pairs in the sentence, which indicates that pruning to obtain the SDP may hurt semantic integrity. However, without pruning, less informative words may introduce extra noise if not handled properly [7]. Besides, propagating information over trees requires a carefully designed architecture to allow parallelization. A more efficient method proposed by Zhang et al. [8] applies a Contextualized Graph Convolutional Network (C-GCN) to this task with a novel pruning strategy: it keeps the tokens that are up to distance K away from the dependency path in the Lowest Common Ancestor (LCA) subtree. Although C-GCN achieves the best performance with this pruning strategy, we still find problems with applying the dependency tree to this task. Specifically, Fig. 1 shows that pruning with distance K = 1, as C-GCN suggests, can lose crucial information and break semantic integrity.
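To make the path-centric pruning strategy concrete, the snippet below is a rough sketch of ours (not the authors' code) that keeps only the tokens within distance K of the shortest dependency path between the two entities; the head-index encoding, the helper name prune_tree, and the toy input are illustrative assumptions.

    # Sketch of path-centric pruning: keep tokens within distance K of the
    # shortest dependency path (SDP) between the two entity tokens.
    # heads[i] is the 0-based index of token i's head; the root has head -1.
    import networkx as nx

    def prune_tree(heads, subj_idx, obj_idx, k=1):
        g = nx.Graph()
        g.add_nodes_from(range(len(heads)))
        g.add_edges_from((i, h) for i, h in enumerate(heads) if h >= 0)

        # Shortest dependency path between the subject and object tokens.
        sdp = set(nx.shortest_path(g, source=subj_idx, target=obj_idx))

        # Keep every token whose distance to the path is at most K.
        keep = [n for n in g.nodes
                if min(nx.shortest_path_length(g, n, p) for p in sdp) <= k]
        return sorted(keep)

    # Toy 3-token sentence whose root is token 2; with K = 1 all tokens survive.
    print(prune_tree(heads=[2, 2, -1], subj_idx=0, obj_idx=2, k=1))  # [0, 1, 2]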

Besides, pruning the trees in C-GCN requires extra preprocessing time, and stacking a Long Short-Term Memory network (LSTM) with the graph model is indispensable for capturing context, which slows down the GCN itself. Moreover, relying on an external dependency parser makes the resulting models domain-dependent [6].

To alleviate the aforementioned problems, we introduce a faster dependency-free model for relation extraction. Specifically, we treat a sentence as a fully-connected graph and use position features (PFs) to model the sequence (a sketch of these features follows the contribution list below). Our model aims to identify the indicative words between two mentions via self-attention in order to facilitate relation extraction. We take the vanilla transformer encoder [9] as the main self-attention architecture over the fully-connected graph. Our contributions are summarized as follows:

  • (1) We propose Filtering and Aggregation mechanisms to customize the Transformer encoder for Relation Extraction (FAT-RE), which achieves comparable or better results than the dependency-based methods.

  • (2) Our model requires no external information from dependency trees, nor does it need stacked sequential layers to enhance the contextual information, which makes it faster than previous methods.

  • (3) We compare the difference between the dependency-based method and our full-connection-based method, and explain how FAT-RE works and why it is superior through a case study.
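As mentioned above, each token is represented with position features relative to the two mentions before self-attention is applied over the fully-connected graph. The following is a minimal sketch under our own assumptions (the clipping range, dimensions, and class names are illustrative; this is not the released FAT-RE code):

    # Position features (PFs): signed, clipped offsets of every token to the
    # subject and object spans, embedded and added to the word embeddings.
    import torch
    import torch.nn as nn

    MAX_DIST = 50  # clip relative offsets to [-MAX_DIST, MAX_DIST]

    def position_features(seq_len, ent_start, ent_end):
        pos = []
        for i in range(seq_len):
            if i < ent_start:
                d = i - ent_start
            elif i > ent_end:
                d = i - ent_end
            else:
                d = 0  # tokens inside the entity span get offset 0
            pos.append(max(-MAX_DIST, min(MAX_DIST, d)))
        return torch.tensor(pos)

    class TokenEncoder(nn.Module):
        def __init__(self, vocab_size, d_model=128):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, d_model)
            self.subj_pos_emb = nn.Embedding(2 * MAX_DIST + 1, d_model)
            self.obj_pos_emb = nn.Embedding(2 * MAX_DIST + 1, d_model)

        def forward(self, tokens, subj_span, obj_span):
            n = tokens.size(0)
            subj_pos = position_features(n, *subj_span) + MAX_DIST  # shift >= 0
            obj_pos = position_features(n, *obj_span) + MAX_DIST
            return (self.word_emb(tokens)
                    + self.subj_pos_emb(subj_pos)
                    + self.obj_pos_emb(obj_pos))

    enc = TokenEncoder(vocab_size=10000)
    out = enc(torch.randint(0, 10000, (12,)), subj_span=(2, 3), obj_span=(8, 8))
    print(out.shape)  # torch.Size([12, 128])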


Related work

Relation extraction is one of the fundamental tasks of natural language processing, and dependency-based methods in particular are among the mainstream approaches to it.

Liu et al. [10] were the first to apply a Convolutional Neural Network (CNN) to the relation classification task. With synonym coding, their model yielded much better performance than the previous kernel-based methods, suggesting a promising future for deep neural networks in this field. Zeng et al. [11] also exploited CNN to

FAT-RE model

In order to better understand the proposed framework in Fig. 2, we first present the definition of the task in Section 3.1, then describe the basic components of the transformer encoder in Section 3.2, and finally explain how we tailor the architecture to improve relation extraction performance in Section 3.3.
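Since the body of Section 3.3 is not reproduced in this excerpt, the sketch below only illustrates one plausible form of such tailoring, in line with the head-mask and highway mechanisms named in the conclusion: a transformer encoder layer whose residual connection around self-attention is replaced by a highway gate. All names and hyper-parameters here are our own assumptions, not the authors' implementation.

    # Illustrative encoder layer: self-attention over the fully-connected token
    # graph, with a highway gate instead of the plain residual connection so the
    # layer can suppress less relevant information.
    import torch
    import torch.nn as nn

    class HighwaySelfAttentionLayer(nn.Module):
        def __init__(self, d_model=128, n_heads=4, d_ff=512, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.gate = nn.Linear(d_model, d_model)  # highway transform gate
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x, key_padding_mask=None):
            attended, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
            # Highway connection: a learned gate decides, per dimension, how much
            # attended information to pass through versus keeping the input.
            t = torch.sigmoid(self.gate(x))
            x = self.norm1(t * attended + (1.0 - t) * x)
            return self.norm2(x + self.ffn(x))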

Datasets

TACRED is a dataset obtained from the TAC Knowledge Base Population (TAC KBP) evaluations, with 68,124 examples in the train set, 22,631 in the dev set, and 15,509 in the test set. It covers 41 relation types (e.g. per:schools_attended); an example is labeled no_relation if the two mentions do not hold any of the defined relations. The second dataset, from SemEval2010 Task8, is much smaller, with only 8,000 examples for training and 2,717 for testing. With direction being considered and

Results and discussion

Micro-F1 on TACRED

Table 2 shows the Micro-F1 of our model and other baseline models on the TACRED test set. Our model is superior to the dependency-tree-based models and performs much better in precision. Since Zhang et al. [8] report the ensemble result (C-GCN + PA-LSTM), we also list it for a fair comparison. We rerun the source code of PA-LSTM and obtain a model with precision, recall, and F1 score of 66.0, 65.6, and 65.8, respectively.
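For reference, micro-F1 on TACRED is conventionally computed with no_relation excluded from both the predicted and the gold positives, so the score reflects only the 41 defined relations. The small helper below mirrors that convention; it is a generic sketch, not the authors' evaluation script.

    # Micro-averaged precision/recall/F1 with no_relation treated as negative.
    def micro_f1(gold, pred, negative_label="no_relation"):
        correct = guessed = actual = 0
        for g, p in zip(gold, pred):
            if g != negative_label:
                actual += 1
            if p != negative_label:
                guessed += 1
                if g == p:
                    correct += 1
        precision = correct / guessed if guessed else 0.0
        recall = correct / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    print(micro_f1(["per:title", "no_relation", "org:founded"],
                   ["per:title", "per:title", "no_relation"]))
    # (0.5, 0.5, 0.5)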

Conclusion

This study addresses the problems of using dependency trees as auxiliary information for relation extraction. Taking the characteristics of the task fully into account, we present a faster dependency-free model, FAT-RE, tailored from the transformer, which achieves good performance. The head-mask and highway connection mechanisms in our experiments are effective at filtering out irrelevant information. Such modifications to the architecture reveal that there is still room for improvement in the vanilla transformer for specific tasks.

CRediT authorship contribution statement

Lifang Ding: Conceptualization, Writing - original draft, Methodology, Software. Zeyang Lei: Writing - reviewing, Validation. Guangxu Xun: Writing - review & editing. Yujiu Yang: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was partially supported by National Key Technologies Research and Development Program under Grant No. 2018YFB1601102, the Key Program of National Natural Science Foundation of China under Grant No. U1903213, the inflexion Lab in Tsinghua Shenzhen International Graduate School, the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011387), the Dedicated Fund for Promoting High-Quality Economic Development in Guangdong Province (Marine Economic Development

References (33)

  • Zhang, Y., et al. Position-aware attention and supervised data improve slot filling.
  • Bach, N., et al. A review of relation extraction. Lit. Rev. Lang. Statist. II (2007).
  • Xu, K., et al. Semantic relation classification via convolutional neural networks with simple negative sampling.
  • Liu, Y., et al. A dependency-based neural network for relation classification.
  • Cai, R., et al. Bidirectional recurrent convolutional neural network for relation classification.
  • Xu, Y., Jia, R., Mou, L., Li, G., Chen, Y., Lu, Y., Jin, Z. Improved relation classification by deep recurrent neural...
  • Christopoulou, F., et al. A walk-based model on entity graphs for relation extraction.
  • Xu, Y., et al. Classifying relations via long short term memory networks along shortest dependency paths.
  • Zhang, Y., et al. Graph convolution over pruned dependency trees improves relation extraction.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I. Attention is all you need, in:...
  • Liu, C., et al. Convolution neural network for relation extraction.
  • Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J. Relation classification via convolutional deep neural network, in: COLING...
  • Nguyen, T.H., et al. Relation extraction: Perspective from convolutional neural networks.
  • Zhang, S., et al. Bidirectional long short-term memory networks for relation classification.
  • Shen, Y., et al. Attention-based convolutional neural network for semantic relation extraction.
  • Zhou, P., et al. Attention-based bidirectional long short-term memory networks for relation classification.