PANC: Prototype Augmented Neighbor Constraint instance completion in knowledge graphs
Introduction
Knowledge graphs (KGs) collect and store a great deal of commonsense or domain knowledge as factual triples of the form (head entity, relation, tail entity). In recent years, many knowledge graphs, such as Freebase (Bollacker, Evans, Paritosh, Sturge & Taylor, 2008), YAGO (Suchanek, Kasneci & Weikum, 2007), and NELL (Mitchell, Cohen, Hruschka, Talukdar, Yang, Betteridge, Carlson, Dalvi, Gardner, Kisiel et al., 2018), have come to be viewed as key resources and have been introduced into many AI applications, such as question answering (Huang et al., 2019, Saxena et al., 2020), recommendation (Wang, He, Cao, Liu & Chua, 2019) and semantic search (Berant et al., 2013, Xiong et al., 2017a). However, they are generally incomplete and still suffer from missing relations between entities (Cui, Kapanipathi, Talamadupula, Gao & Ji, 2021). For example, many people in Freebase and DBpedia have no recorded place of birth (Nickel, Murphy, Tresp & Gabrilovich, 2015). If we search for Steve Jobs's birthplace, we will probably not get an accurate answer, and a downstream task that builds on this answer, such as food recommendation, is then unlikely to produce appropriate results. Hence it is necessary to explore knowledge graph completion methods to improve the effectiveness of KGs.
Over the past few years, many efforts have been made on KG completion problems. In the initial research period, knowledge representation models were mostly used to infer the missing relation between two given entities. Typical models include TransE (Bordes, Usunier, Garcia-Duran, Weston & Yakhnenko, 2013), TransR (Lin, Liu, Sun, Liu & Zhu, 2015), and TransH (Wang, Zhang, Feng & Chen, 2014). In addition, reinforcement learning and path reasoning methods, which aim to model complex relation paths, have gradually been introduced in this domain, e.g., DeepPath (Xiong, Hoang & Wang, 2017b), CPL (Fu, Chen, Qu, Jin & Ren, 2019) and MultiHop (Lin, Socher & Xiong, 2018). Another popular approach is to introduce extra entity information, such as entity types (Jain et al., 2018, Niu et al., 2020, Xie et al., 2016b) or text descriptions (García-Durán and Niepert, 2017, Kristiadi et al., 2019), which can enhance the entity embeddings and thereby improve the accuracy of the model.
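As a concrete reference point for the translation-based family, TransE scores a triple by how closely the tail embedding matches the head embedding translated by the relation embedding. The sketch below is only illustrative: it uses the L2 distance variant, and the 2-dimensional vectors are hand-picked toy values (an assumption), not trained embeddings.

```python
import math

def transe_score(h, r, t):
    """Return -||h + r - t||_2; a score closer to 0 means a more plausible triple."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy embeddings chosen by hand for illustration only (not learned).
head = [1.0, 0.0]
relation = [0.0, 1.0]
good_tail = [1.0, 1.0]  # exactly head + relation
bad_tail = [4.0, 5.0]   # far from head + relation

print(transe_score(head, relation, good_tail))
print(transe_score(head, relation, bad_tail))
```

The tail that coincides with the translated head receives the highest possible score, while the distant tail is penalized in proportion to its distance.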
However, the great majority of methods define the KG completion task as a link prediction task. Specifically, given two elements of a fact triplet, the task is to infer the missing one, as in (h, r, ?), (h, ?, t), or (?, r, t), where the question mark represents the missing entity or relation (Rosso, Yang, Ostapuk & Cudré-Mauroux, 2021). Although excellent performance has been achieved on this task, the setting is still unreasonable in many circumstances, such as the prediction for (Trump, capital of, ?). The link prediction task implicitly assumes that the two given elements are strongly related (Rosso et al., 2021), and most existing works guarantee this assumption simply by removing one element from a true triplet in the KG. In the real world, however, we are generally not given two correlated elements, and meaningless combinations such as (Trump, capital of, ?) appear frequently. For such an inherently meaningless combination, the resulting triplet makes no sense no matter what the final reasoning result is. Therefore, we need a more reasonable method that removes these invalid combinations as far as possible to ensure the effectiveness of the completion.
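The difference between the two query styles can be made concrete with a small sketch. The function names and the toy KG with placeholder identifiers (`e1`, `r1`, ...) are hypothetical and serve only to illustrate how link prediction queries are built by masking a true triple, in contrast to an (h, ?, ?) query whose answer is a set of (r, t) pairs.

```python
def link_prediction_queries(triple):
    """Mask each element of a true triple in turn, as benchmarks typically do."""
    h, r, t = triple
    return [
        ((h, r, None), t),  # (h, r, ?)
        ((h, None, t), r),  # (h, ?, t)
        ((None, r, t), h),  # (?, r, t)
    ]

def instance_completion_answers(head, kg):
    """Answer set for (h, ?, ?): every (r, t) pair recorded for the head."""
    return {(r, t) for (h, r, t) in kg if h == head}

# Toy KG with placeholder identifiers (hypothetical, for illustration only).
toy_kg = [("e1", "r1", "e2"), ("e1", "r2", "e3"), ("e4", "r1", "e2")]

for query, answer in link_prediction_queries(toy_kg[0]):
    print(query, "->", answer)
print(instance_completion_answers("e1", toy_kg))
```

Because every link prediction query here is derived from a true triple, the two known elements are correlated by construction, which is exactly the assumption that does not hold for arbitrary real-world combinations.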
In response to the above problem, Rosso et al. (2021) propose RETA, based on a more complex task definition, instance completion, which predicts the missing (r, t) pairs for a given head entity. In other words, they make predictions on (h, ?, ?), which removes the unreasonable assumption of knowing two correlated elements of a triplet. They design a filter that generates candidate (r, t) pairs for a given head entity and a grader that scores the triples formed by the given head entity and its candidate (r, t) pairs. By considering entity type constraints in the filter, the number of candidate (r, t) pairs for a given head entity can be reduced to a small magnitude, and RETA outperforms the baseline models. However, using entity types as constraints in the RETA filter and grader can be inappropriate. Type information is fine-grained, in other words overly specific, which may limit the filter's performance and leave the candidate set incomplete. As shown in Fig. 1, RETA-Filter may lose the (produce, film/film) pair for the given head entity Leonardo DiCaprio if the corresponding type film/producer is missing from the KG, and then misjudge the relation between Leonardo DiCaprio and Robin Hood.
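The failure mode described above can be illustrated with a deliberately simplified type-constrained filter. This is not RETA's actual filter: the function `type_filter`, the schema representation, and the `film/actor` type and `act_in` relation are assumptions introduced for illustration; only `film/producer`, `produce`, and `film/film` come from the example in the text.

```python
def type_filter(head, entity_types, schema):
    """Keep (relation, tail_type) pairs whose head type is among the head's known types."""
    head_types = entity_types.get(head, set())
    return {(r, tail_type) for (head_type, r, tail_type) in schema
            if head_type in head_types}

# Type-level schema: (head type, relation, tail type).
schema = {
    ("film/producer", "produce", "film/film"),  # from the example in the text
    ("film/actor", "act_in", "film/film"),      # hypothetical extra entry
}

# The same entity with complete and with incomplete type annotations.
complete_types = {"Leonardo DiCaprio": {"film/actor", "film/producer"}}
incomplete_types = {"Leonardo DiCaprio": {"film/actor"}}

print(type_filter("Leonardo DiCaprio", complete_types, schema))
print(type_filter("Leonardo DiCaprio", incomplete_types, schema))
```

When the `film/producer` annotation is absent, the (produce, film/film) candidate never reaches the grader, so no downstream scoring can recover it.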
To tackle this issue, we propose the Prototype Augmented Neighbor Constraint instance completion model (PANC), which uses a kind of coarse-grained information called prototypes (see the problem formulation in the Method section for details) to filter the candidate set. On the one hand, prototypes are employed to obtain more reasonable (r, t) pairs as candidate sets. On the other hand, the Grader is redesigned to judge the correctness of the triples more accurately. Building on the original two neural network pipelines in RETA (Rosso et al., 2021) and inspired by Xiong, Yu, Chang, Guo & Wang (2018) and Wang, Liu, Xu & Sheng (2020), we introduce neighbor aggregation into the Grader to make full use of local graph structure information.
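To give a rough idea of what neighbor aggregation does to an entity embedding, the sketch below blends an entity's own vector with the mean of its neighbors' vectors. The mean aggregator, the mixing weight `alpha`, and the toy vectors are all assumptions made for illustration; this excerpt does not specify the aggregator that PANC's Grader actually uses.

```python
def aggregate_neighbors(entity, embeddings, neighbors, alpha=0.5):
    """Blend an entity embedding with the mean of its neighbor embeddings.

    alpha is a hypothetical mixing weight: 1.0 keeps only the entity's own
    embedding, 0.0 keeps only the neighbor mean.
    """
    own = embeddings[entity]
    nbr_vectors = [embeddings[n] for n in neighbors.get(entity, [])]
    if not nbr_vectors:
        return list(own)  # no local structure to use
    mean = [sum(column) / len(nbr_vectors) for column in zip(*nbr_vectors)]
    return [alpha * o + (1 - alpha) * m for o, m in zip(own, mean)]

# Toy embeddings and adjacency (hypothetical values).
embeddings = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [2.0, 0.0]}
neighbors = {"a": ["b", "c"]}

print(aggregate_neighbors("a", embeddings, neighbors))
print(aggregate_neighbors("b", embeddings, neighbors))
```

An entity without recorded neighbors keeps its original embedding, while a connected entity is pulled toward the region of the embedding space occupied by its local graph context.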
Our contributions are summarized as follows:
- We leverage the relevance among entities and a fuzzy clustering algorithm to obtain the membership of each entity in each prototype, so as to assign entities to the corresponding prototype spaces, and we construct our filter with prototype information to generate more reasonable candidate (r, t) pairs.
- We introduce local graph neighbor information to further constrain the given head entity and all filtered candidate tail entities, and we redesign the Grader with neighbor aggregation to enhance the entity embeddings.
- The experimental results show that our proposed model outperforms the baselines on the instance completion task. Compared with the state-of-the-art baseline RETA, our model obtains improvements of 18.58%, 13.4%, 8.76%, and 12.39% on FB15k in terms of Rec@10, Rec@5, MAP, and NDCG, respectively. The ablation study illustrates the effectiveness of each module in our model.
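The first contribution relies on fuzzy clustering to give each entity a graded membership in every prototype. As a minimal sketch of that idea, the code below computes memberships with the standard fuzzy c-means membership formula. This is an assumption: the excerpt only says "fuzzy clustering algorithm", and the entity vector, prototype centers, and fuzzifier `m` are toy values chosen for illustration.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means memberships of one point in each cluster center.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with m > 1 the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        # The point coincides with one or more centers: share membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Toy entity vector and two hypothetical prototype centers.
prototype_centers = [[0.0, 0.0], [4.0, 0.0]]
entity_vector = [1.0, 0.0]

memberships = fuzzy_memberships(entity_vector, prototype_centers)
print(memberships)
```

Unlike a hard type label, the memberships sum to one across prototypes, so an entity can belong partly to several coarse-grained groups at once.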
Related work
In this section, we briefly review knowledge graph completion models, which can be roughly divided into two categories: KG reasoning and KG embedding.
Method
In this section, we describe the problem formulation and the details of our proposed model. First, we design a prototype filter, which can be divided into a prototype segmentator and a filter, to obtain the prototype memberships of each entity and generate the candidate (r, t) pairs. We then develop a neighbor aggregation grader that constrains the entities with neighbor information and finally ranks the candidate (r, t) pairs by considering the plausibility of the fact triplets and their
Experiments
In this section, we present the results of experiments performed on two real-world datasets for the instance completion task. We also analyze the impact of several important hyper-parameters on the robustness of PANC.
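The reported comparison with RETA uses Rec@k, MAP, and NDCG. The sketch below shows common binary-relevance definitions of these ranking metrics for a single head entity. The exact variants used in the paper's evaluation (for example, cutoffs or tie handling) are not given in this excerpt, so these definitions and the toy ranking are assumptions.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant items occur."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranked, relevant):
    """Binary-relevance NDCG over the full ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked, start=1) if item in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Toy ranking of candidate (r, t) pairs for one head entity (hypothetical).
ranked_pairs = [("r1", "e2"), ("r3", "e5"), ("r2", "e3"), ("r4", "e6")]
true_pairs = {("r1", "e2"), ("r2", "e3")}

print(recall_at_k(ranked_pairs, true_pairs, k=2))
print(average_precision(ranked_pairs, true_pairs))
print(ndcg(ranked_pairs, true_pairs))
```

MAP is then the mean of `average_precision` over all evaluated head entities, and the other metrics are averaged over head entities in the same way.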
Conclusion
In this paper, we propose a PANC model for the instance completion task, consisting of two modules: a prototype filter and a neighbor aggregation grader. Our prototype filter first clusters all entities according to the prototypes and, by using prototypes rather than types, generates a larger set of candidate relation-tail pairs for a given head entity; our neighbor aggregation grader relies on neighbor aggregation to enhance the entity embeddings and constrain the combination between the head entity and candidate pairs and
CRediT authorship contribution statement
Ruixin Ma: Project administration. Yunlong Ma: Methodology, Software, Writing – original draft. Hongyan Zhang: Data curation, Investigation. Liang Zhao: Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (61906030), the Science and Technology Project of Liaoning Province (2021JH2/10300064) and the Youth Science and Technology Star Support Program of Dalian City (2021RQ057).
References (46)
- et al. Multi-hop reasoning over paths in temporal knowledge graphs using reinforcement learning. Applied Soft Computing (2021).
- Berant, J., Chou, A., Frostig, R., & Liang, P. (2013). Semantic parsing on freebase from question-answer pairs. In...
- Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph...
- et al. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems (2013).
- Cao, E., Wang, D., Huang, J., & Hu, W. (2020). Open knowledge enrichment for long-tail entities. In Proceedings of The...
- Chen, W., Xiong, W., Yan, X., & Wang, W. Y. (2018). Variational knowledge graph reasoning. In Proceedings of the 2018...
- Cui, Z., Kapanipathi, P., Talamadupula, K., Gao, T., & Ji, Q. (2021). Type-augmented relation prediction in knowledge...
- Das, R., Dhuliawala, S., Zaheer, M., Vilnis, L., Durugkar, I., Krishnamurthy, A., Smola, A., & McCallum, A. (2018). Go...
- Das, R., Neelakantan, A., Belanger, D., & McCallum, A. (2017). Chains of reasoning over entities, relations, and text...
- Dettmers, T., Minervini, P., Stenetorp, P., & Riedel, S. (2018). Convolutional 2d knowledge graph embeddings. In...
- Simple embedding for link prediction in knowledge graphs. Advances in Neural Information Processing Systems.
- Incorporating literals into knowledge graph embeddings.
- Relational retrieval using a combination of path-constrained random walks. Machine Learning.
- Learning entity and relation embeddings for knowledge graph completion. Twenty-ninth AAAI conference on artificial intelligence.
- Analogical inference for multi-relational embeddings.
- Transt: Type-based multiple embedding representations for knowledge graph completion.
- Never-ending learning. Communications of the ACM.
1 Liang Zhao: 0000-0001-6301-1311.