AEP: Aligning knowledge graphs via embedding propagation
Introduction
Knowledge bases contain a wealth of prior human knowledge that benefits many machine learning tasks, such as text generation [1], machine translation [2], web crawling [3], reading comprehension [4], and recommendation systems [5], [6]. They are modeled as knowledge graphs that represent entities and their relations as triples. Because of this broad range of applications, multilingual knowledge bases such as ConceptNet [7], DBpedia [8], YAGO [9], and BabelNet [10] have been built to integrate knowledge from various human languages. However, manually linking entities across different knowledge graphs (KGs) is labor-intensive and error-prone. Knowledge graph alignment, which links equivalent entities automatically, has therefore attracted much attention.
With the development of knowledge representation learning, researchers have developed many embedding-based knowledge graph alignment algorithms. Translation-based methods embed the entities of different knowledge graphs into separate embedding spaces and then learn a transformation between these spaces [11], [12], [13], [14], [15]. However, these methods fail to exploit the semantic information in entity names and descriptions [16], [17]. Graph neural network (GNN)-based approaches map the entities of different KGs into a unified vector space and link entities based on the distances between their embeddings [18], [19], [20], [21]. Some methods first represent entities with word embeddings and then employ Graph Convolutional Networks (GCNs) to encode structural information in a unified embedding space [22], [23], [24]. These methods effectively exploit both structural and semantic information for entity alignment. Researchers also suggest enforcing a one-to-one alignment constraint to avoid the errors introduced by one-to-many alignments [16], [25].
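A minimal sketch of embedding-based linking under the one-to-one constraint follows. It assumes that both KGs are already embedded in one unified space and scores candidate pairs by cosine similarity; the greedy matcher is an illustrative stand-in, not necessarily the procedure used in [16], [25], and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_one_to_one(src, tgt):
    """Link entities of two KGs that share one embedding space.

    src, tgt: dicts mapping entity id -> embedding vector.
    All cross-graph pairs are visited from most to least similar, and a pair
    is accepted only if neither entity is already matched, so every entity
    ends up in at most one alignment (the one-to-one constraint).
    """
    candidates = sorted(
        ((cosine(u, v), s, t) for s, u in src.items() for t, v in tgt.items()),
        reverse=True,
    )
    matched_src, matched_tgt, alignment = set(), set(), {}
    for _, s, t in candidates:
        if s not in matched_src and t not in matched_tgt:
            alignment[s] = t
            matched_src.add(s)
            matched_tgt.add(t)
    return alignment


# Toy embeddings: both A and B have X as their nearest neighbor, so
# independent nearest-neighbor search would give a one-to-many result.
src = {"A": [1.0, 0.0], "B": [0.9, 0.2]}
tgt = {"X": [1.0, 0.05], "Y": [0.6, 0.8]}
print(greedy_one_to_one(src, tgt))  # {'A': 'X', 'B': 'Y'}
```

The greedy pass keeps the stronger pair (A, X) first, so B falls back to Y instead of also claiming X.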
In practice, some entities can be aligned easily with high confidence, while others cannot. We can therefore exploit the explicitly aligned entities to boost the alignment of the implicitly aligned ones. For example, Fig. 1 presents the local network structures of two entities from two knowledge graphs to be matched. The entities in red circles are the ego entities to be aligned, and several of their neighbors have already been aligned. The green entities have relatively high similarities, while the blue ones have much lower similarities. Because a GCN propagates the embeddings of all neighbors to the central entity, the red entities may end up with relatively low similarity, which prevents them from being matched correctly. However, by considering the collaborative effect of explicitly aligned entities, we can improve alignment accuracy by propagating the green entities’ embeddings to the red ones.
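The collaborative effect in the Fig. 1 example can be illustrated with a hypothetical propagation rule. The sketch assumes that each pair of explicitly aligned (green) entities shares one anchor vector, here the average of the two embeddings, and that an ego entity's embedding is mixed with the mean anchor of its aligned neighbors using an assumed weight `alpha`; unaligned (blue) neighbors contribute nothing. This is not AEP's exact formulation, which this excerpt does not give.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate(entity, emb, neighbors, anchors, alpha=0.5):
    """Mix an entity's embedding with the anchors of its aligned neighbors.

    emb: entity id -> embedding vector
    neighbors: entity id -> list of neighbor ids in the same KG
    anchors: entity id -> shared vector for explicitly aligned entities
    alpha: weight of the propagated signal (hypothetical hyperparameter)
    """
    received = [anchors[n] for n in neighbors.get(entity, []) if n in anchors]
    if not received:  # no aligned neighbors: nothing to propagate
        return list(emb[entity])
    mean = [sum(col) / len(received) for col in zip(*received)]
    return [(1 - alpha) * a + alpha * m for a, m in zip(emb[entity], mean)]


# Red ego entities r1 (KG1) and r2 (KG2) start out dissimilar.
emb = {
    "r1": [1.0, 0.0, 0.0], "g1": [0.0, 0.0, 1.0], "b1": [0.0, -1.0, 0.0],
    "r2": [0.0, 1.0, 0.0], "g2": [0.0, 0.2, 1.0], "b2": [-1.0, 0.0, 0.0],
}
neighbors = {"r1": ["g1", "b1"], "r2": ["g2", "b2"]}
# g1 and g2 are explicitly aligned (green) and share the average of their embeddings.
shared = [(a + b) / 2 for a, b in zip(emb["g1"], emb["g2"])]
anchors = {"g1": shared, "g2": shared}

before = cosine(emb["r1"], emb["r2"])
after = cosine(propagate("r1", emb, neighbors, anchors),
               propagate("r2", emb, neighbors, anchors))
print(round(before, 2), round(after, 2))  # 0.0 0.53
```

Because both red entities receive the same anchor signal, their similarity rises from 0.0 to about 0.53, while the blue neighbors are ignored rather than averaged in as a plain GCN aggregation would do.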
Moreover, knowledge graph alignment suffers from insufficient training data because linking entities manually is labor-intensive. As a result, most alignment models overfit easily: a low training loss does not imply good performance on the test set. To address this difficulty, researchers have proposed a bootstrapping strategy that mines potential high-quality entity alignments during training and adds them to the training set for subsequent training [14]. However, mistakenly aligned entities can easily misguide the training. The quality of the mined seeds is therefore essential to the performance of the final model, and it is necessary to guarantee the correctness of the mined entity alignments.
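The seed-mining step can be sketched with the correctness safeguard made explicit. As an assumption for illustration, a candidate pair is accepted only if the two entities are mutual nearest neighbors, clear a similarity threshold, and agree across every view (for example, a name-based and a structure-based similarity matrix). This mirrors the multi-view idea AEP introduces, but the criteria and the names `mutual_nearest` and `mine_seeds` are hypothetical.

```python
def mutual_nearest(sim):
    """Pairs (i, j) where target j is source i's best match and vice versa.

    sim[i][j] is the similarity between source entity i and target entity j.
    """
    n_src, n_tgt = len(sim), len(sim[0])
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}


def mine_seeds(views, threshold, known):
    """Propose new training pairs that every view agrees on.

    views: non-empty list of similarity matrices, one per view
           (which views to use is an assumption of this sketch).
    threshold: minimum similarity required in every view.
    known: set of (i, j) pairs already in the training set.
    """
    agreed = None
    for sim in views:
        confident = {(i, j) for i, j in mutual_nearest(sim) if sim[i][j] >= threshold}
        agreed = confident if agreed is None else agreed & confident
    # Skip entities that are already aligned in the training set.
    used_src = {i for i, _ in known}
    used_tgt = {j for _, j in known}
    return {(i, j) for i, j in agreed if i not in used_src and j not in used_tgt}


name_view = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.7], [0.1, 0.75, 0.6]]
struct_view = [[0.85, 0.1, 0.2], [0.2, 0.7, 0.3], [0.1, 0.2, 0.9]]
print(mine_seeds([name_view, struct_view], threshold=0.6, known={(0, 0)}))  # {(1, 1)}
```

Source entity 2 is rejected because the name view prefers target 1 for it while the structural view prefers target 2; requiring agreement trades recall for the seed precision that the paragraph above argues is essential.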
This paper presents AEP, a novel knowledge graph alignment framework based on embedding propagation. AEP addresses two problems that interfere with entity representation learning: neighborhood heterogeneity and overfitting. First, AEP employs fastText [26] to obtain initial entity embeddings that encode the semantic information of entity names. Next, we propose an attention-based GCN model to integrate structural information with these semantic embeddings. We use entity similarity as the attention coefficients to prevent the semantic embeddings from being lost due to the over-smoothing of information aggregation. To handle the overfitting that the GCN suffers from with insufficient training sets, we develop a multi-view bootstrapping strategy that seeks high-quality potential entity alignments and adds them to the training set iteratively. Finally, we propose a propagation scheme that propagates the embeddings of explicitly aligned entities to unaligned ones, which allows us to use explicitly aligned entities to handle the heterogeneity problem. We also conduct extensive experiments to validate the effectiveness of AEP and evaluate the effect of each scheme on performance. Experimental results show that AEP outperforms state-of-the-art methods and that the proposed schemes significantly improve alignment performance. In summary, the main contributions of this paper are as follows:
1) An attention-based GCN model is developed to address the neighborhood heterogeneity problem.
2) A multi-view bootstrapping strategy is proposed to mine high-quality training entity alignments, which tackles the overfitting caused by insufficient training sets.
3) A propagation-based alignment scheme is developed to use explicitly aligned entity pairs for accurate alignment.
4) Extensive experiments are conducted to verify the effectiveness of the proposed method and evaluate the effects of different schemes. Experimental results show the superiority of AEP and the proposed schemes.¹
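Contribution 1 rests on using entity similarity as the attention coefficient. One plausible reading is sketched below as a hypothetical, weight-free aggregation step: each entity attends to itself and its neighbors with a softmax over cosine similarities, scaled by an assumed temperature `tau`. A real GCN layer would also apply learnable weight matrices and a nonlinearity, which are omitted here, and AEP's actual layer may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_attention_layer(h, adj, tau=1.0):
    """One aggregation step with similarity-based attention.

    h: entity id -> embedding vector (e.g. name embeddings).
    adj: entity id -> list of neighbor ids.
    tau: softmax temperature (hypothetical hyperparameter).
    The weight of neighbor j is a softmax over cosine(h[i], h[j]) / tau,
    so dissimilar neighbors contribute less to the new embedding of i.
    """
    out = {}
    for i, hi in h.items():
        group = [i] + [j for j in adj.get(i, []) if j != i]
        scores = [cosine(hi, h[j]) / tau for j in group]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        out[i] = [
            sum(w * h[j][d] for w, j in zip(weights, group)) / z
            for d in range(len(hi))
        ]
    return out


h = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
out = similarity_attention_layer(h, adj)
mean_a = [sum(col) / 3 for col in zip(h["a"], h["b"], h["c"])]
print([round(x, 2) for x in out["a"]], [round(x, 2) for x in mean_a])
# [0.8, 0.2] [0.63, 0.37]
```

Entity `a` stays closer to its own embedding than a plain mean aggregation would leave it, which illustrates how similarity-based weights can limit the over-smoothing mentioned above.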
The rest of this paper is organized as follows: Section 2 reviews the relevant literature. Section 3 details the proposed method. Section 4 presents the experiments conducted to evaluate the proposed schemes. Section 5 concludes the paper and discusses future work.
Related work
Existing knowledge graph alignment algorithms can be roughly categorized into three groups: translation-based methods, GNN-based approaches, and word embedding-based solutions. The first two types exploit structural information for entity alignment, while the last integrates structural information into semantic embeddings for entity alignment.
The proposed method
We represent a knowledge graph by $G = (E, R, T)$, where $E$ is the set of entities, $R$ is the set of relations, and $T$ is the set of triples. Each triple $(h, r, t) \in T$ consists of a head entity $h$, a relation $r$, and a tail entity $t$. An entity’s surface name is composed of a series of words $(w_1, w_2, \ldots, w_{n_e})$, where $n_e$ is the number of words contained in the name of entity $e$. Given two knowledge graphs $G_1 = (E_1, R_1, T_1)$ and $G_2 = (E_2, R_2, T_2)$, let $L = \{(e_i, e_j)\}$ be a set of linked entity pairs, where $e_i \in E_1$, $e_j \in E_2$, …
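The notation above maps directly onto simple data structures. The sketch below stores a KG as a set of $(h, r, t)$ triples, derives $E$ and $R$ from them, and builds an initial name embedding by averaging word vectors. Averaging is an assumption (the excerpt only states that fastText [26] is used for initialization), the toy 2-d vectors stand in for real pretrained fastText vectors, and `KnowledgeGraph` and `name_embedding` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Set


class Triple(NamedTuple):
    head: str      # h
    relation: str  # r
    tail: str      # t


class KnowledgeGraph:
    """G = (E, R, T): entities and relations are collected from the triples."""

    def __init__(self, triples):
        self.triples: Set[Triple] = set(triples)
        self.relations = {tr.relation for tr in self.triples}
        self.entities = {tr.head for tr in self.triples} | {tr.tail for tr in self.triples}


def name_embedding(entity: str, word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the words in an entity's surface name.

    The surface name is assumed to be the entity id with underscores as word
    separators; out-of-vocabulary words are skipped.
    """
    words = entity.replace("_", " ").lower().split()
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


kg1 = KnowledgeGraph([
    Triple("Xi'an_Jiaotong_University", "locatedIn", "Xi'an"),
    Triple("Xi'an", "capitalOf", "Shaanxi"),
])
# Toy 2-d vectors standing in for pretrained fastText word vectors.
toy_vectors = {"xi'an": [1.0, 0.0], "jiaotong": [0.0, 1.0], "university": [0.5, 0.5]}
print(sorted(kg1.entities))
print(name_embedding("Xi'an_Jiaotong_University", toy_vectors, dim=2))  # [0.5, 0.5]
```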
Datasets
We conduct experiments on two cross-lingual knowledge graph datasets, the dataset of [12] and SRPRS [30], and one monolingual dataset [44]. The cross-lingual dataset of [12] is built by linking Chinese, French, and Japanese entities with English ones. SRPRS is a sparser cross-lingual dataset whose node degree distribution follows that of DBpedia. In the monolingual dataset [44], most unaligned entities share similar or even identical names. Compared with
Conclusion and future work
This paper presents a knowledge graph alignment framework that integrates several schemes, including word embedding initialization, attention-based GCN, multi-view bootstrapping, and propagation-based alignment, to improve alignment accuracy. We conduct extensive experiments to validate the effectiveness of our method. The results show that our method surpasses several state-of-the-art approaches and is robust for sparse knowledge graph alignment. We also evaluate the effects of different
CRediT authorship contribution statement
Chenxu Wang: Methodology, Writing - original draft. Yue Wan: Writing - original draft. Zhenhao Huang: Investigation. Panpan Meng: Visualization. Pinghui Wang: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The research presented in this paper is supported in part by the National Key R&D Program of China (2021YFB1715600), the National Natural Science Foundation of China (No. 61602370, U1736205, 61922067, 61902305), the Shenzhen Basic Research Grant (JCYJ20170816100819428), and the Natural Science Basic Research Plan in Shaanxi Province (2021JM-018).
References (47)
- BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. (2012).
- An experimental study of state-of-the-art entity alignment approaches. IEEE Trans. Knowl. Data Eng. (2020).
- W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang. K-BERT: Enabling language representation with knowledge...
- Contrastive learning for many-to-many multilingual neural machine translation.
- P. Joe Dhanith, B. Surendiran, S. Raja. A word embedding based approach for focused web crawling using the recurrent...
- Asking effective and diverse questions: A machine reading comprehension based framework for joint entity-relation extraction.
- Enhancing topic-to-essay generation with external commonsense knowledge.
- Deep collaborative filtering with multi-aspect information in heterogeneous networks. IEEE Trans. Knowl. Data Eng. (2019).
- ConceptNet at SemEval-2017 Task 2: Extending word embeddings with multilingual relational knowledge. SemEval (2017).
- DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web (2015).
- YAGO3: A knowledge base from multilingual Wikipedias. CIDR.
- Reinforcement learning–based collective entity alignment with adaptive features. ACM TOIS.
- BERT-INT: A BERT-based interaction model for knowledge graph alignment. IJCAI.
- Cross-lingual knowledge graph alignment via graph convolutional networks. EMNLP.
- Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model. EMNLP-IJCNLP.
- Multi-channel graph neural network for entity alignment. ACL.
- Cross-lingual knowledge graph alignment via graph matching neural network. ACL.
Chenxu Wang received his B.S. degree in Communication Engineering and Ph.D. degree in Control Science and Engineering from Xi’an Jiaotong University in 2009 and 2015, respectively. He was a Post-Doctoral Research Fellow with the Hong Kong Polytechnic University from 2016 to 2017. He is currently an associate professor with the School of Software Engineering, Xi’an Jiaotong University. His current research interests include Data Mining, Complex Network Analysis, and Graph Representation Learning.
Yue Wan received the B.E. degree in Software Engineering from Nanchang University. He is currently a graduate student with the School of Software Engineering, Xi’an Jiaotong University. His current research focuses on knowledge graph alignment.
Zhenhao Huang received the B.E. degree in Software Engineering from Nanchang University. He is currently a graduate student with the School of Software Engineering, Xi’an Jiaotong University. His current research focuses on knowledge graph representation learning.
Panpan Meng received the bachelor’s degree in Software Engineering from Ningxia University. She is currently a Ph.D. student in the School of Software Engineering, Xi’an Jiaotong University. Her main research focuses on graph contrastive learning.
Pinghui Wang received the B.S. degree in information engineering and the Ph.D. degree in automatic control from Xi’an Jiaotong University, Xi’an, China, in 2006 and 2012, respectively. He is currently an associate professor in the MOE Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University. His research interests include Internet traffic measurement and modeling, anomaly detection, and online social network measurement.