Elsevier

Neurocomputing

Volume 507, 1 October 2022, Pages 130-144

AEP: Aligning knowledge graphs via embedding propagation

https://doi.org/10.1016/j.neucom.2022.08.018

Abstract

Knowledge graph alignment aims to identify entity pairs having the same meaning between different knowledge graphs, which is essential to the automated construction of a coherent knowledge base. With the development of knowledge representation learning, researchers have proposed several useful embedding-based alignment methods. In practice, some entities are easily aligned with high confidence while others are not. However, there is a lack of effective methods which boost the alignment of implicitly aligned entities based on explicitly aligned ones. This paper presents AEP, which combines several effective schemes for accurate knowledge graph alignment. First, we employ a word-embedding model to encode the semantic information contained in entities’ surface names. We propose an attention-based graph convolutional network model to incorporate the structural information via supervised embedding propagation. Besides, we develop a multi-view bootstrapping strategy to address the over-fitting problem caused by insufficient training sets. Next, an embedding-propagation-based alignment scheme is proposed to improve the alignment accuracy by propagating explicitly aligned entities’ embeddings to implicitly aligned ones in an unsupervised manner. Finally, we conduct extensive experiments to validate the superiority of AEP and evaluate the effects of proposed schemes. Experimental results show that AEP outperforms state-of-the-art methods, and the proposed schemes improve the alignment accuracy significantly.

Introduction

Knowledge bases contain a large amount of prior human knowledge, benefiting many machine learning tasks, such as text generation [1], machine translation [2], web crawling [3], reading comprehension [4], and recommendation systems [5], [6]. They are modeled as knowledge graphs that include entities and their relations in the form of triples. Due to their broad range of applications, multilingual knowledge bases such as ConceptNet [7], DBpedia [8], YAGO [9], and BabelNet [10] are built to integrate knowledge from various human languages. However, manually linking entities from different knowledge graphs (KGs) is labor-intensive and error-prone. Therefore, knowledge graph alignment has attracted much attention for automated entity linkage.

With the development of knowledge representation learning, researchers have developed many useful embedding-based knowledge graph alignment algorithms. The translation-based methods map entities of different knowledge graphs into separate embedding spaces and then find a proper transformation between these spaces [11], [12], [13], [14], [15]. However, these methods fail to utilize the useful semantic information in entity names and descriptions [16], [17]. Graph neural network (GNN)-based approaches map entities of different KGs into a unified vector space and link entities based on the distances between entity embeddings [18], [19], [20], [21]. Some methods first represent entities with word embeddings and then employ Graph Convolutional Networks (GCNs) to encode the structural information in a unified embedding space [22], [23], [24]. These methods make good use of both structural and semantic information for entity alignment. Researchers also suggest following the one-to-one alignment constraint to avoid alignment errors introduced by one-to-many alignments [16], [25].
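
To make the distance-based linking and the one-to-one constraint concrete, here is a minimal sketch. It is not the procedure of any cited method: the function names, the greedy matching rule, and the toy embeddings are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def greedy_one_to_one(src, tgt):
    """Align each source entity to at most one target entity.

    src, tgt: dicts mapping entity name -> embedding (list of floats).
    Candidate pairs are visited in descending similarity order; a pair is
    accepted only if neither entity has been matched already.
    """
    scored = sorted(
        ((cosine(u_vec, v_vec), u, v)
         for u, u_vec in src.items()
         for v, v_vec in tgt.items()),
        reverse=True,
    )
    used_src, used_tgt, pairs = set(), set(), {}
    for _score, u, v in scored:
        if u not in used_src and v not in used_tgt:
            pairs[u] = v
            used_src.add(u)
            used_tgt.add(v)
    return pairs

# Toy embeddings (made up). Both "A" and "B" have "X" as nearest neighbor,
# so plain nearest-neighbor search would produce a one-to-many alignment.
kg1 = {"A": [1.0, 0.0], "B": [0.9, 0.3]}
kg2 = {"X": [1.0, 0.05], "Y": [0.5, 0.8]}

alignment = greedy_one_to_one(kg1, kg2)
print(alignment)
```

In this toy case "A" takes "X" because it is the closer of the two, and "B" falls back to "Y". A greedy pass is only one way to enforce the constraint; an optimal assignment solver could be substituted where a globally best matching is needed.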

In practice, some entities are easily aligned with high confidence while others are not. Therefore, we can exploit the explicitly aligned entities to boost the alignment of implicitly aligned ones. For example, Fig. 1 presents two entities’ local network structures from two matching knowledge graphs. Entities in red circles represent the focused ego entities to be aligned, and they have several neighbors already being aligned. The green entities have relatively high similarities, while the blue ones have much lower similarities. Due to the limitation of the GCN model, which propagates all neighbors’ embeddings to the concerned entity, the red entities may have relatively low similarity, preventing them from being correctly matched. However, considering the collaborative effects of explicitly aligned entities, we can improve entity alignment accuracy by propagating the green entities’ embeddings to the red ones.
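
The intuition can be sketched in a few lines. This excerpt does not give AEP's actual update rule, so the mean-based aggregation, the `trusted` set, and the toy vectors below are assumptions; the sketch only contrasts aggregating all neighbors with aggregating confidently aligned neighbors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def propagate(ego, neighbors, emb, keep):
    """Average the ego embedding with those of its neighbors found in `keep`."""
    kept = [emb[n] for n in neighbors[ego] if n in keep]
    return mean([emb[ego]] + kept)

# Toy embeddings (made up): r* are the ego entities to align, g* is a
# confidently aligned neighbor pair, b* is a poorly matching neighbor pair.
emb = {
    "r1": [1.0, 0.0], "g1": [0.0, 1.0], "b1": [1.0, -1.0],   # first KG
    "r2": [0.6, 0.8], "g2": [0.1, 1.0], "b2": [-1.0, -0.5],  # second KG
}
neighbors = {"r1": ["g1", "b1"], "r2": ["g2", "b2"]}
everyone = set(emb)
trusted = {"g1", "g2"}  # assumed output of a high-confidence alignment step

before = cosine(emb["r1"], emb["r2"])
after_all = cosine(propagate("r1", neighbors, emb, everyone),
                   propagate("r2", neighbors, emb, everyone))
after_trusted = cosine(propagate("r1", neighbors, emb, trusted),
                       propagate("r2", neighbors, emb, trusted))
print(before, after_all, after_trusted)
```

With these vectors, mixing in every neighbor pulls the two ego entities apart, while restricting the aggregation to the trusted pair brings them closer than they were initially.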

Moreover, knowledge graph alignment suffers from insufficient training sets due to the high labor cost of linking entities manually. Therefore, most alignment models are easily over-fitted, i.e., a low loss on the training set does not guarantee good performance on the test set. To address this difficulty, researchers have proposed a bootstrapping strategy that mines potential high-quality entity alignments during the training process and adds them to the training set for subsequent training [14]. However, the training of the model is easily misguided by mistakenly aligned entities. Thus, the quality of the mined seeds is essential to the performance of the final model, and it is necessary to guarantee the correctness of the mined entity alignments.
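
A generic bootstrapping round can be sketched as follows. This is neither the strategy of [14] nor AEP's multi-view variant, whose details are not part of this excerpt; the mutual-nearest-neighbor filter, the `threshold` value, and all names are assumptions used to show how a quality filter limits mistaken seeds.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(vec, pool):
    """Name of the entity in `pool` whose embedding is most similar to `vec`."""
    return max(pool, key=lambda name: cosine(vec, pool[name]))

def mine_seeds(src, tgt, threshold=0.9):
    """Return pairs that are mutual nearest neighbors above `threshold`."""
    seeds = []
    for u, u_vec in src.items():
        v = nearest(u_vec, tgt)
        is_mutual = nearest(tgt[v], src) == u
        if is_mutual and cosine(u_vec, tgt[v]) >= threshold:
            seeds.append((u, v))
    return seeds

# Toy embeddings (made up). "B" prefers "X", but "X" prefers "A",
# so the pair ("B", "X") is rejected as unreliable.
kg1 = {"A": [1.0, 0.0], "B": [0.9, 0.3], "C": [0.0, 1.0]}
kg2 = {"X": [1.0, 0.05], "Y": [0.5, 0.8], "Z": [0.05, 1.0]}

labeled = {("A", "X")}                  # small manually linked training set
seeds = mine_seeds(kg1, kg2)
training_set = labeled | set(seeds)     # enlarged set for the next round
print(seeds, training_set)
```

In a real pipeline the embeddings would be retrained on `training_set` and the mining step repeated; raising `threshold` trades fewer mined pairs for higher precision.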

This paper presents AEP, a novel knowledge graph alignment framework via embedding propagation. AEP addresses the neighborhood heterogeneity and overfitting problems in entity representation learning. First, AEP employs Fasttext [26] to obtain initial entity embeddings that encode the semantic information of entity names. Next, we propose an attention-based GCN model to integrate the structural information with the semantic embeddings. We use entity similarity as the attention coefficients to prevent the semantic embeddings from being lost due to the over-smoothing caused by information aggregation. To handle the overfitting problem that GCN encounters with insufficient training sets, we develop a multi-view bootstrapping strategy to seek high-quality potential entity alignments and add them to the training set recursively. Finally, we propose a propagation scheme to propagate explicitly aligned entities’ embeddings to unaligned ones, which allows us to utilize explicitly aligned entities to handle the heterogeneity problem. Besides, we conduct extensive experiments to validate the effectiveness of AEP and evaluate the effects of different schemes on performance. Experimental results show that AEP achieves superior performance over state-of-the-art methods, and the proposed schemes can significantly improve the performance of the alignment task. In summary, the main contributions of this paper are as follows:

  • 1) An attention-based GCN model is developed to address the neighborhood heterogeneity problem.

  • 2) A multi-view bootstrapping strategy is proposed to mine high-quality training entity alignments, which tackles the overfitting problem of insufficient training sets.

  • 3) A propagation-based alignment scheme is developed to utilize explicitly aligned entity pairs for accurate alignment.

  • 4) Extensive experiments are conducted to verify the effectiveness of the proposed method and evaluate the effects of different schemes. Experimental results show the superiority of AEP and the proposed schemes.
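
The idea of using entity similarity as attention coefficients (contribution 1) can be illustrated with a single aggregation step. The exact formulation in AEP is not given in this excerpt, so the softmax over cosine similarities, the self-loop, and the function name are assumptions rather than the paper's model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def attention_aggregate(ego_vec, neighbor_vecs):
    """One aggregation step weighted by softmax-normalized cosine similarity.

    The ego vector is included as a self-loop, so its own semantic
    embedding always receives the largest weight.
    """
    candidates = [ego_vec] + neighbor_vecs
    scores = [math.exp(cosine(ego_vec, v)) for v in candidates]
    total = sum(scores)
    weights = [s / total for s in scores]
    aggregated = [
        sum(w * v[i] for w, v in zip(weights, candidates))
        for i in range(len(ego_vec))
    ]
    return aggregated, weights

# Toy vectors (made up): one neighbor resembles the ego entity, one does not.
ego = [1.0, 0.0]
similar_neighbor = [0.9, 0.1]
dissimilar_neighbor = [-1.0, 0.0]

aggregated, weights = attention_aggregate(
    ego, [similar_neighbor, dissimilar_neighbor]
)
print(weights)
```

Compared with a plain mean over all neighbors, the dissimilar neighbor contributes far less to the aggregated vector, which is the behavior the similarity-based attention is meant to provide.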

The rest of this paper is organized as follows: Section 2 presents the reviews of relevant literature. In Section 3, we detail the proposed method. Section 4 shows the experiments conducted to evaluate the proposed schemes. Section 5 concludes the paper and discusses future work.

Related work

Existing knowledge graph alignment algorithms can be roughly categorized into three groups, including translation-based methods, GNN-based approaches, and word embedding-based solutions. The first two types of procedures exploit the structural information for entity alignment, and the last kind of method focuses on integrating structural information into semantic embeddings for entity alignment.

The proposed method

We represent a knowledge graph by $G=(E,R,T)$, where $E$ is the set of entities, $R$ is the set of relations, and $T$ is the set of triples. Each triple $(h,r,t)$ includes a head entity $h$, the relation $r$, and a tail entity $t$. An entity’s surface name is composed of a series of words $e=[w_1,w_2,\ldots,w_{n_e}]$, where $n_e$ is the number of words contained in the name of entity $e$. Given two knowledge graphs $G_1=(E_1,R_1,T_1)$ and $G_2=(E_2,R_2,T_2)$, let $L=\{(u,v)\mid u\in E_1^L, v\in E_2^L\}$ be a set of linked entity pairs, where $E_1^L\subseteq E_1$, $E_2^L\subseteq E_2$,

Datasets

We conduct the experiments based on two cross-lingual knowledge datasets and one mono-lingual knowledge dataset, including DBP15K [12], SRPRS [30] and DBP-FB [44]. For cross-lingual datasets, DBP15K is built by linking Chinese, French, and Japanese entities with English ones. SRPRS is a sparser cross-lingual knowledge dataset following the node degree distribution in DBpedia. For the monolingual dataset, DBP-FB makes most unaligned entities share similar or even the same names. Compared with

Conclusion and future work

This paper presents a knowledge graph alignment framework that integrates several schemes, including word embedding initialization, attention-based GCN, multi-view bootstrapping, and propagation-based alignment, to improve alignment accuracy. We conduct extensive experiments to validate the effectiveness of our method. The results show that our method surpasses several state-of-the-art approaches and is robust for sparse knowledge graph alignment. We also evaluate the effects of different

CRediT authorship contribution statement

Chenxu Wang: Methodology, Writing - original draft. Yue Wan: Writing - original draft. Zhenhao Huang: Investigation. Panpan Meng: Visualization. Pinghui Wang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The research presented in this paper is supported in part by the National Key R&D Program of China (2021YFB1715600), National Natural Science Foundation of China (No. 61602370, U1736205, 61922067, 61902305), Shenzhen Basic Research Grant (JCYJ20170816100819428), Natural Science Basic Research Plan in Shaanxi Province (2021JM-018).

References (47)

  • R. Navigli et al.

    BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

    Artif. Intell.

    (2012)
  • X. Zhao et al.

    An experimental study of state-of-the-art entity alignment approaches

    IEEE TKDE

    (2020)
  • W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang, K-bert: Enabling language representation with knowledge...
  • X. Pan et al.

    Contrastive learning for many-to-many multilingual neural machine translation

  • P. Joe Dhanith, B. Surendiran, S. Raja, A word embedding based approach for focused web crawling using the recurrent...
  • T. Zhao et al.

    Asking effective and diverse questions: a machine reading comprehension based framework for joint entity-relation extraction

  • P. Yang et al.

    Enhancing topic-to-essay generation with external commonsense knowledge

  • C. Shi et al.

    Deep collaborative filtering with multi-aspect information in heterogeneous networks

    IEEE Trans. Knowl. Data Eng.

    (2019)
  • R. Speer et al.

    Conceptnet at semeval-2017 task 2: Extending word embeddings with multilingual relational knowledge

    SemEval

    (2017)
  • J. Lehmann et al.

    DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia

    Semantic Web

    (2015)
  • F. Mahdisoltani et al.

    YAGO3: A knowledge base from multilingual wikipedias

    CIDR

    (2015)
  • M. Chen, Y. Tian, M. Yang, C. Zaniolo, Multilingual knowledge graph embeddings for cross-lingual knowledge alignment,...
  • Z. Sun, W. Hu, C. Li, Cross-lingual entity alignment via joint attribute-preserving embedding, in: Lecture Notes in...
  • H. Zhu, R. Xie, Z. Liu, M. Sun, Iterative entity alignment via joint knowledge embeddings, in: IJCAI, Vol. 0, 2017, pp....
  • Z. Sun, W. Hu, Q. Zhang, Y. Qu, Bootstrapping entity alignment with knowledge graph embedding, in: IJCAI, Vol....
  • M. Chen, Y. Tian, K.W. Chang, S. Skiena, C. Zaniolo, Co-training embeddings of knowledge graphs and entity descriptions...
  • W. Zeng et al.

    Reinforcement learning–based collective entity alignment with adaptive features

    ACM TOIS

    (2021)
  • X. Tang et al.

    Bert-int: A bert-based interaction model for knowledge graph alignment.

    IJCAI

    (2020)
  • Z. Wang et al.

    Cross-lingual knowledge graph alignment via graph convolutional networks

    EMNLP

    (2018)
  • C. Li et al.

    Semi-supervised entity alignment via joint knowledge embedding model and cross-graph model

    EMNLP-IJCNLP

    (2019)
  • R. Ye, X. Li, Y. Fang, H. Zang, M. Wang, A vectorized relational graph convolutional network for multi-relational...
  • Y. Cao et al.

    Multi-channel graph neural network for entity alignment

    ACL

    (2019)
  • K. Xu et al.

    Cross-lingual knowledge graph alignment via graph matching neural network

    ACL

    (2019)
    Chenxu Wang received his B.S. degree in Communication Engineering and Ph. D. degree in Control Science and Engineering from Xi’an Jiaotong University in 2009 and 2015, respectively. He was a Post-Doctoral Research Fellow with the Hong Kong Polytechnic University from 2016 to 2017. He is currently an associate professor with the School of Software Engineering, Xi’an Jiaotong University. His current research interests include Data Mining, Complex Network Analysis, and Graph Representation Learning.

    Yue Wan received the B.E. degree in Software Engineering from Nanchang University. He is currently a graduate student with the School of Software Engineering, Xi’an Jiaotong University. His current research focuses on knowledge graph alignment.

    Zhenhao Huang received the B.E. degree in Software Engineering from Nanchang University. He is currently a graduate student with the School of Software Engineering, Xi’an Jiaotong University. His current research focuses on knowledge graph representation learning.

    Panpan Meng received the bachelor’s degree in Software Engineering from Ningxia University. She is now a Ph.D. student in the School of Software Engineering, Xi’an Jiaotong University. Her current research focuses on graph contrastive learning.

    Pinghui Wang received the B.S. degree in information engineering and the Ph.D. degree in automatic control from Xi’an Jiaotong University, Xi’an, China, in 2006 and 2012, respectively. He is currently an associate professor in the MOE Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University. His research interests include Internet traffic measurement and modeling, abnormal detection, and online social network measurement.