Neurocomputing

Volume 537, 7 June 2023, Pages 12-21
Semantic piecewise convolutional neural network with adaptive negative training for distantly supervised relation extraction

https://doi.org/10.1016/j.neucom.2023.03.005

Abstract

Distantly Supervised Relation Extraction (DSRE) aligns existing knowledge bases with unstructured text to extract relation facts, and its automatically generated training data is inevitably noisy. Most existing works identify and reduce the impact of noise by enhancing semantic features. However, they only consider the semantic information within a single instance and ignore the semantic information shared between different instances. In this work, we propose a Semantic Piecewise Convolutional Neural Network (SPCNN), which uses the similarity between different entity pairs as semantic information to improve relation extraction. Specifically, to learn better semantic vector representations, we combine position features with entity pair features and with entity similarity features in a high-dimensional space, generating two different semantic-aware representations. We then unify these two representations to form a high-quality bag representation for training. Moreover, we design an Adaptive Negative Training (ANT) strategy, which enables the network to further exploit the rich semantic features and reduce the interference of noisy labels. Extensive experimental results on a large-scale benchmark dataset show that our method significantly outperforms other baselines.

Introduction

Relation Extraction (RE) is one of the most fundamental tasks in Natural Language Processing (NLP); it aims to extract the semantic relation between entity pairs in given sentences. It supports downstream NLP tasks such as knowledge graphs [1] and machine question answering [2], [3]. Conventional supervised relation extraction methods [4], [5] require large amounts of manually labeled data, which is extremely time-consuming and labor-intensive. Thus, distantly supervised relation extraction was proposed to solve the problem of data scarcity; it automatically generates large-scale labeled data by aligning entities according to knowledge graphs [6].

Distant supervision is based on the assumption that sentences containing the same entity pair express the same relation, which inevitably causes mislabeling. As the example in Fig. 1 shows, this strong assumption introduces a lot of noise into the dataset, which severely impairs the performance of relation extraction models. Previous work [7] relaxed it to the at-least-once assumption, which assumes that at least one of the sentences containing the same entity pair expresses the labeled relation. Further, a Multi-Instance Learning (MIL) framework was introduced to alleviate the distant supervision problem; it puts multiple instances in a bag and labels them with a unified label. Based on this MIL paradigm, early works [8], [9] proposed multi-instance multi-label learning and fed linguistic features such as syntactic knowledge to traditional classification models. However, it is difficult to construct features manually that cope with complex and variable semantic relations.

With the development of neural networks, Convolutional Neural Networks (CNNs) were introduced for modeling sentences and have achieved great success in NLP areas such as machine reading comprehension [10] and text classification [11]. Previous work [12] applied CNNs to DSRE and outperformed conventional manual feature-based methods. However, the noisy sentences in the dataset lessen the extraction capability of CNNs, so some works [13], [14], [15] used the more advanced Piecewise Convolutional Neural Network (PCNN) as the feature extractor, which can select more reliable sentences from the bags to train the model. Further, other studies [16], [17] suggested that some noisy sentences are beneficial for relation extraction and cannot simply be removed; they tried to assign attention weights to all sentences in a bag, which mitigates the effect of noisy sentences. The Attention (ATT) mechanism also plays an important role across bags [18], [19]. Based on the ATT mechanism, other works [20], [21] improved the sentence encoder to learn better instance representations.

Semantic information facilitates relation extraction. Models can learn higher-quality vector representations using semantic information, and their attention modules also perform better. Zeng et al. [13] first proposed entity relative position information to enhance word embeddings, demonstrating that entity-related semantic information can effectively improve DSRE. Based on the position feature, Ji et al. [22] used entity description information to further enrich the semantic features, and other works [23], [24] introduced external entity-related information, such as entity type, as a complement to sentence semantics. However, these methods rely entirely on information in existing knowledge bases, so their generalization is limited. Some works [25], [26] tried to utilize more information from the sentences themselves, employing entity pair embeddings as semantic information to enhance sentence representations. This significantly reduces the overhead of introducing semantic information; in contrast, it also leads to an excessive focus on the entities at the expense of the contextual information of the sentences.

However, all existing works ignore that the similarity between entity pairs can itself serve as semantic information to improve relation extraction. Similarity here means that the head and tail entities of two entity pairs are of the same types. For two dissimilar instances, both head entities may be /person while the tail entities are /location and /nationality respectively; contextual similarity may nevertheless produce consistent representations and eventually lead to bad classification results. Conversely, when the head entities of two instances are both /person and the tail entities are both /location, even if the contexts are not similar, both express the presence of someone at a certain place in a certain situation. Inspired by this phenomenon, we add this semantic similarity when encoding sentences, which generates more expressive sentence representations for relation extraction and thus more robust bag representations.
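To make the similarity notion above concrete, the following minimal sketch (our own illustration; the type labels follow the /person and /location examples in the text, and the function name is ours, not the paper's) treats two entity pairs as similar when their head- and tail-entity types match:

```python
# Hypothetical sketch: two entity pairs are "similar" when both the head
# and the tail entity types match, regardless of sentence context.
def entity_pairs_similar(pair_a, pair_b):
    """pair_a, pair_b: (head_type, tail_type) tuples, e.g. ('/person', '/location')."""
    return pair_a[0] == pair_b[0] and pair_a[1] == pair_b[1]

# Similar: both instances express person-at-location, even with different contexts.
print(entity_pairs_similar(('/person', '/location'), ('/person', '/location')))    # True
# Dissimilar: same head type, but tail types /location vs. /nationality differ.
print(entity_pairs_similar(('/person', '/location'), ('/person', '/nationality')))  # False
```

In the paper this signal is used in embedding space rather than as a hard type match, but the intuition is the same: instances sharing entity-pair types should inform one another's representations.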

To summarize, by utilizing the similarity between entity pairs as semantic information, the network can learn from instances with similar relations, further enhancing the relation representation and reducing the negative impact of noisy instances. We propose the Semantic Piecewise Convolutional Neural Network (SPCNN) to combine entity similarity features for better semantic vector representations. Specifically, at the sentence level, we combine position features with entity features and with entity similarity features in the feature space, generating entity-aware representations and similarity-aware representations, respectively. These two representations aggregate different semantic information. At the bag level, in order to unify the semantic information in both representations, we design a similarity gate mechanism to dynamically adjust the semantic focus of bag representations, and use a KL distance constraint to facilitate their combination. Furthermore, the attention mechanism resists noise by reducing the weight of noisy sentences within a bag. However, when a bag contains only one sentence and is mislabeled, training with the bag label, i.e., positive training, propagates the noisy representation. Negative training provides less noisy information and has shown good performance in sentence-level DSRE [27]. Unlike previous methods that use only one of the two training strategies, we integrate both positive and negative training into a unified training strategy. We design an Adaptive Negative Training (ANT) strategy to enhance the noise resistance of the model.
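The positive/negative training distinction can be sketched with the standard per-instance losses (our own plain-Python illustration, not the paper's implementation; how ANT adaptively mixes the two is described in the methodology):

```python
import math

# Positive training asserts "the input belongs to label y": loss = -log p_y.
def positive_loss(probs, label):
    return -math.log(probs[label])

# Negative training asserts "the input does NOT belong to label y":
# loss = -log(1 - p_y), which supplies less (potentially noisy) information.
def negative_loss(probs, complementary_label):
    return -math.log(1.0 - probs[complementary_label])

probs = [0.7, 0.2, 0.1]                     # softmax output over 3 relations
print(round(positive_loss(probs, 0), 4))    # 0.3567: confident correct label -> small loss
print(round(negative_loss(probs, 2), 4))    # 0.1054: low prob on rejected label -> small loss
```

Under a noisy bag label, the positive loss drives the model hard toward the (possibly wrong) label, while the negative loss only pushes probability away from a sampled complementary label, which is why it propagates less noise.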

The contributions of this work can be summarized as follows:

  • We utilize entity pair similarity as semantic information and propose SPCNN to combine it with other semantic information to enhance instance representations, finally unifying the semantic features through a similarity gate mechanism.

  • We design the ANT strategy to balance positive and negative training, which can further identify and reduce the effect of noisy labels in both multi-sentence and one-sentence bags.

  • Compared with previous state-of-the-art methods, extensive experiments demonstrate that our SPCNN + ANT model achieves significant improvements on the widely used New York Times (NYT) dataset.


Related work

Relation extraction aims to identify the relation between two entities, expressed as selecting the correct relation from a predefined set for the given entity pair. As a multi-class text classification task, supervised relation extraction suffers from the lack of large-scale manually labeled data. To alleviate this problem, Mintz et al. [6] proposed distant supervision to automatically generate data by aligning the knowledge base and textual

Methodology

We consider the similarity between different entity pairs as semantic information and propose a new neural network framework, SPCNN + ANT, which consists of the following five modules: Embeddings, Encoder, Sentence-level Attention, Similarity Gate, and Adaptive Negative Training (as shown in Fig. 2).

Following the MIL framework, let bagi = {s1, s2, …, sn} denote a bag containing a set of sentences that share the same entity pair, where n is the number of sentences in bagi. We input the above
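The bag construction above can be sketched as follows (a minimal illustration of the MIL data layout with hypothetical field names and example instances, not the paper's code): sentences mentioning the same entity pair are grouped into one bag, and the bag receives the single distantly supervised label.

```python
from collections import defaultdict

# Minimal MIL sketch: group sentences by entity pair into bags,
# each bag carrying one distantly supervised relation label.
def build_bags(instances):
    """instances: list of (head, tail, sentence, relation) tuples."""
    bags = defaultdict(lambda: {"sentences": [], "label": None})
    for head, tail, sentence, relation in instances:
        bag = bags[(head, tail)]
        bag["sentences"].append(sentence)
        bag["label"] = relation  # one unified label per entity pair
    return dict(bags)

# Hypothetical example instances.
instances = [
    ("Obama", "USA", "Obama was born in the USA.", "/people/person/place_of_birth"),
    ("Obama", "USA", "Obama flew back to the USA.", "/people/person/place_of_birth"),
]
bags = build_bags(instances)
print(len(bags[("Obama", "USA")]["sentences"]))  # 2
```

Note that the second sentence does not actually express the labeled relation; this is exactly the bag-level noise that the attention and ANT components are designed to handle.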

Experiments

In this section, we compare our SPCNN + ANT model with current competitive baselines on the widely used benchmark dataset, which confirms the state-of-the-art performance of our proposed model. To verify the effectiveness of each component of our model, we conduct ablation experiments. Further, we also evaluate the improvement that ANT brings to baselines to verify its generality.

Conclusion

In this work, we propose a novel Semantic Piecewise Convolutional Neural Network (SPCNN) for distantly supervised relation extraction, which considers the similarity between different entity pairs and unifies multiple kinds of semantic information at the sentence level and the bag level. The proposed framework learns more semantic information from similar instances rather than relying exclusively on context or entity pairs, generating high-quality representations that improve relation extraction. We also

CRediT authorship contribution statement

Mei Yu: Conceptualization, Methodology, Investigation, Writing - original draft. Yunke Chen: Conceptualization, Methodology, Software, Formal analysis. Mankun Zhao: Writing - original draft, Conceptualization. Tianyi Xu: Validation, Writing - original draft. Jian Yu: Investigation, Visualization. Ruiguo Yu: Writing - review & editing. Hongwei Liu: Supervision, Writing - review & editing. Xuewei Li: Writing - review & editing, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work is jointly supported by the National Natural Science Foundation of China (61877043) and the National Natural Science Foundation of China (61877044).

References (43)

  • S. Riedel et al.

    Modeling relations and their mentions without labeled text

  • M. Surdeanu, J. Tibshirani, R. Nallapati, C.D. Manning, Multi-instance multi-label learning for relation extraction,...
  • A. Severyn, A. Moschitti, Learning to rank short text pairs with convolutional deep neural networks, in: Proceedings of...
  • N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, arXiv preprint...
  • D. Zeng, K. Liu, S. Lai, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in:...
  • D. Zeng, K. Liu, Y. Chen, J. Zhao, Distant supervision for relation extraction via piecewise convolutional neural...
  • P. Qin, W. Xu, W.Y. Wang, Robust distant supervision relation extraction via deep reinforcement learning, arXiv...
  • X. Han, Z. Liu, M. Sun, Denoising distant supervision for relation extraction via instance-level adversarial training,...
  • Y. Lin, S. Shen, Z. Liu, H. Luan, M. Sun, Neural relation extraction with selective attention over instances, in:...
  • C. Yuan, H. Huang, C. Feng, X. Liu, X. Wei, Distant supervision for relation extraction with linear attenuation...

    Mei Yu is a professor and master's supervisor at Tianjin University, an instructor at the Computer School Student Competition Guidance Center, and an instructor at the IT discipline Innovation and Entrepreneurship Training Base. She is mainly engaged in data mining, artificial intelligence, and computer networks, and has published many related academic papers. She has participated in vertical projects such as the National Natural Science Foundation of China and Tianjin Science and Technology Major Special Projects, and has obtained a number of authorized patents and software copyrights.

    Yunke Chen received the B.E. degree from Tianjin University, Tianjin, China, in 2019. He is currently working toward the Master's degree at the Tianjin International Engineering Institute, Tianjin University. His research interests include natural language processing, knowledge graphs, and machine learning.

    Mankun Zhao received the Master's degree in computer science and technology from Tianjin University, Tianjin, China, in 2015. He is currently an engineer at the School of Computer Science and Technology, Tianjin University, China. His research interests include knowledge graphs and data mining.

    Tianyi Xu received the Master's degree in computer science and technology from Tianjin University, Tianjin, China, in 2015. He is currently an engineer at the School of Computer Science and Technology, Tianjin University, China. His research interests include the Internet of Things, blockchain, and data mining. He has published dozens of papers in international journals and conference proceedings, such as the IEEE Internet of Things Journal and the International Journal of Distributed Sensor Networks.

    Jian Yu received a PhD in Communication and Information Systems from Tianjin University in 2010. He is mainly engaged in data mining, database, and computer network research, and has published many scientific research papers in international conferences and journals.

    Ruiguo Yu has presided over or participated in a number of projects from the National Natural Science Foundation of China, Tianjin Science and Technology Support, and Tianjin Science and Technology Major Special Projects. His work has reached the international advanced level in the fields of artificial-intelligence-assisted thyroid medicine and wind power prediction. He has published dozens of papers in domestic and foreign journals and conferences, and has served as a reviewer for the journal Applied Energy and for the National Natural Science Foundation of China. Since 2002, he has served as the coach of the Tianjin University ACM team, leading it to many medals in Asian competitions and to the world finals six times with excellent results. He also serves as the director of the IT discipline innovation and entrepreneurship training base of Tianjin University.

    Hongwei Liu is a professor at Tianjin Foreign Studies University. She obtained a PhD from the University of Western Australia in 2008. Her main research interests are intelligent language learning and creative writing, and she has published many scientific research papers in international conferences and journals.

    Xuewei Li is an associate professor and master's supervisor at the School of Computer Science and Technology of Tianjin University. She received her Ph.D. in computer application technology from Tianjin University in June 2009. Her research interests are image processing, computer vision, and artificial intelligence, mainly including image segmentation, image enhancement, and target detection and tracking. She has published more than 10 papers in domestic and foreign journals, along with many conference papers, and has served as a reviewer for the international journal Applied Energy. The main projects she has presided over or participated in include a National Key R&D Program (sub-project leader), a National Science and Technology Support Project (project leader), three National Natural Science Foundation of China projects (participated in two, responsible for one), a Tianjin new-generation artificial intelligence major special project (participant), and a Tianjin science and technology support key project (participant), in addition to a number of horizontal projects.
