FAT-RE: A faster dependency-free model for relation extraction
Introduction
Relation Extraction is crucial to many Natural Language Processing (NLP) tasks, such as knowledge graph completion, question answering, and biomedical text mining. It mainly concerns what kind of relationship holds between two entities in a sentence. Although many methods based on deep neural networks succeed at this task, the traditional transformer encoder structure contains considerable redundancy, and this redundancy is closely related to the dependency tree.
Tree-structured dependency information is reported to be critical to the relation extraction task [1]. Previous studies [2], [3], [4], [5] demonstrate the effectiveness of incorporating the shortest dependency path (SDP) between the two mentioned entities. Christopoulou et al. [6] state that the relation between a pair of interest can be extracted directly from the target entities and indirectly from other related pairs in the sentence, which indicates that pruning to obtain the SDP may hurt semantic integrity. However, without pruning, less informative words may introduce extra noise if not handled properly [7]. Besides, propagating information over trees requires a carefully designed architecture to enable parallelization. A more efficient method proposed by Zhang et al. [8] applies a Contextualized Graph Convolution Network (C-GCN) to this task with a novel pruning strategy, which keeps tokens that are up to distance K away from the dependency path in the Lowest Common Ancestor (LCA) subtree. Although C-GCN achieves the best performance with this pruning strategy, we still find problems with applying the dependency tree to this task. Specifically, Fig. 1 shows that pruning with distance K = 1, as C-GCN suggests, loses crucial information and breaks semantic integrity.
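The path-based pruning described above can be sketched as follows. This is an illustrative re-implementation of the idea (compute the dependency path between the two entities, then keep every token within distance K of it), not the authors' code; the function name and edge representation are assumptions:

```python
from collections import deque


def prune_to_path(edges, n, e1, e2, k):
    """Keep tokens at most distance k from the dependency path between
    entity tokens e1 and e2 (illustrative sketch of K-distance pruning).

    edges: list of (head, dependent) pairs; n: number of tokens.
    Returns the sorted token indices that survive pruning.
    """
    # Build an undirected adjacency list over the dependency tree.
    adj = {i: [] for i in range(n)}
    for h, d in edges:
        adj[h].append(d)
        adj[d].append(h)

    # BFS from e1 with parent pointers to recover the e1 -> e2 path.
    parent = {e1: None}
    q = deque([e1])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    path, node = [], e2
    while node is not None:
        path.append(node)
        node = parent[node]

    # Multi-source BFS: distance of every token to the nearest path token.
    dist = {p: 0 for p in path}
    q = deque(path)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)

    return sorted(t for t, d in dist.items() if d <= k)
```

With K = 0 only the dependency path itself survives; larger K keeps progressively more of the tree.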
Besides, pruning trees in C-GCN requires extra preprocessing time, and stacking a Long Short-Term Memory network (LSTM) on the graph model is indispensable for capturing context, which slows down the GCN itself. Moreover, relying on an external dependency parser makes the model domain-dependent [6].
To alleviate the aforementioned problems, we introduce a faster dependency-free model for relation extraction. Specifically, we treat a sequence as a fully connected graph and use position features (PFs) to model the sequence. Our model aims to identify the indicative words between two mentions via self-attention to promote relation extraction. We take the vanilla transformer encoder [9] as the main self-attention architecture over the fully connected graph. Our contributions are summarized as follows:
- (1) We propose Filtering and Aggregation mechanisms to customize the Transformer encoder for Relation Extraction (FAT-RE), which achieves comparable or better results than dependency-based methods.
- (2) Our model requires no external information from dependency trees, nor does it need stacked sequential layers to enhance contextual information, which makes it faster than previous methods.
- (3) We compare the dependency-based method with our full-connection-based method, and explain through a case study how FAT-RE works and why it is superior.
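The full-connection view sketched above (a sequence treated as a fully connected graph, with position features encoding each token's relative distance to the two entity mentions) can be illustrated in plain NumPy. This is a minimal single-head sketch under assumed names and dimensions, not the paper's implementation, which uses the full multi-head transformer encoder:

```python
import numpy as np


def add_position_features(emb, d1, d2, pf_table1, pf_table2, max_dist=10):
    """Concatenate token embeddings with position features: embedding
    lookups of each token's clipped relative distance to entity 1 / 2."""
    i1 = np.clip(d1, -max_dist, max_dist) + max_dist
    i2 = np.clip(d2, -max_dist, max_dist) + max_dist
    return np.concatenate([emb, pf_table1[i1], pf_table2[i2]], axis=-1)


def self_attention_fully_connected(x):
    """Single-head scaled dot-product self-attention: every token attends
    to every other token, i.e. the sequence is a fully connected graph."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x
```

Because attention spans the whole sequence, no pruning decision is made up front; the model learns which words are indicative of the relation.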
Related work
Relation extraction is one of the fundamental tasks of natural language processing, and dependency-based approaches are among the mainstream methods for it.
Liu et al. [10] were the first to apply a Convolutional Neural Network (CNN) to the relation classification task. With synonym coding, it yielded much better performance than previous kernel-based methods, suggesting a promising future for deep neural networks in this field. Zeng et al. [11] also exploited CNN to
FAT-RE model
In order to better understand the proposed framework in Fig. 2, we first present the definition of this task in Section 3.1, then describe the basic components of the transformer encoder in Section 3.2, and explain how to tailor the architecture to improve the performance of relation extraction in Section 3.3.
Datasets
TACRED is a dataset obtained from TAC Knowledge Base Population (TAC KBP), with 68,124 training examples, 22,631 development examples, and 15,509 test examples. It covers 41 relations (e.g., per:schools_attended), and an example is labeled no_relation if the relation held by the two mentions is not among them. The second dataset, from SemEval-2010 Task 8, is much smaller, with only 8,000 examples for training and 2,717 for testing. With direction being considered and
Results and discussion
Micro-F1 on TACRED. Table 2 shows the Micro-F1 of our model and the baseline models on the TACRED test set. Our model is superior to the dependency-tree-based models and performs much better in precision. Since Zhang et al. [8] report an ensemble result (C-GCN + PA-LSTM), we also list it for a fair comparison. We reran the source code of PA-LSTM and obtained a model with precision, recall, and F1 score of 66.0, 65.6, and 65.8, respectively.
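As a quick sanity check on the reproduced PA-LSTM numbers, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reproduced PA-LSTM result from the text: P = 66.0, R = 65.6.
print(round(f1_score(66.0, 65.6), 1))  # → 65.8
```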
Conclusion
This study identifies the problems of using dependency trees as auxiliary information for relation extraction. By fully considering the features of the task, we present a faster dependency-free model, FAT-RE, tailored from the transformer, which achieves good performance. The head-mask and highway connection mechanisms in our experiments are effective at filtering out irrelevant information. These architectural modifications reveal that there is still room to improve the vanilla transformer for specific tasks.
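A highway connection of the kind mentioned above gates between a layer's transformed output and its input, y = g * h + (1 - g) * x, letting the network suppress unhelpful transformations. This is a generic sketch of the mechanism, not the paper's exact parameterization:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def highway(x, h, w_gate, b_gate):
    """Generic highway connection: a learned gate g mixes the transformed
    representation h with the original input x, y = g*h + (1 - g)*x.
    (Sketch only; the paper's parameterization may differ.)"""
    g = sigmoid(x @ w_gate + b_gate)
    return g * h + (1.0 - g) * x
```

When the gate saturates toward 0, the layer passes its input through unchanged; toward 1, it uses the transformed representation, which is what allows irrelevant information to be filtered out.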
CRediT authorship contribution statement
Lifang Ding: Conceptualization, Writing - original draft, Methodology, Software. Zeyang Lei: Writing - review & editing, Validation. Guangxu Xun: Writing - review & editing. Yujiu Yang: Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research was partially supported by National Key Technologies Research and Development Program under Grant No. 2018YFB1601102, the Key Program of National Natural Science Foundation of China under Grant No. U1903213, the inflexion Lab in Tsinghua Shenzhen International Graduate School, the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011387), the Dedicated Fund for Promoting High-Quality Economic Development in Guangdong Province (Marine Economic Development
References (33)
- Position-aware attention and supervised data improve slot filling
- A review of relation extraction, Lit. Rev. Lang. Statist. II (2007)
- Semantic relation classification via convolutional neural networks with simple negative sampling
- A dependency-based neural network for relation classification
- Bidirectional recurrent convolutional neural network for relation classification
- Y. Xu, R. Jia, L. Mou, G. Li, Y. Chen, Y. Lu, Z. Jin, Improved relation classification by deep recurrent neural...
- A walk-based model on entity graphs for relation extraction
- Classifying relations via long short term memory networks along shortest dependency paths
- Graph convolution over pruned dependency trees improves relation extraction
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, I. Polosukhin, Attention is all you need, in:...
- Convolution neural network for relation extraction
- Relation extraction: Perspective from convolutional neural networks
- Bidirectional long short-term memory networks for relation classification
- Attention-based convolutional neural network for semantic relation extraction
- Attention-based bidirectional long short-term memory networks for relation classification