PANC: Prototype Augmented Neighbor Constraint instance completion in knowledge graphs

https://doi.org/10.1016/j.eswa.2022.119013

Highlights

  • The instance completion task takes the rationality of predictions into account.

  • Prototypes provide a wider set of reasonable candidates.

  • Neighbor aggregation helps discriminate knowledge triplets accurately.

  • Model performance declines once the number of neighbors exceeds a threshold.

Abstract

Knowledge graph completion has attracted great attention as knowledge graphs have come into wide use. Previous work on knowledge graph completion mainly predicts the one missing element of a triplet given the other two, such as (h, r, ?). However, this task does not consider whether the given entity and relation are reasonably related, as in (Italy, acted in, ?), which may lead to meaningless predictions. A recent attempt addresses this problem by defining a new task, instance completion, which generates and evaluates reasonable relation-tail pairs for a given head entity, such as (h, ?, ?). In this work, we propose a novel Prototype Augmented Neighbor Constraint instance completion model called PANC, which consists of two modules: a prototype filter and a neighbor aggregation grader. The filter uses prototypes, a kind of coarse-grained information, to generate more candidate relation-tail pairs, and the grader uses neighbor aggregation to enhance entity embeddings and constrain the combination of the head entity with candidate pairs. Experiments show that PANC outperforms state-of-the-art instance completion techniques on two real-world datasets, FB15k and JF17k, and the ablation results verify the effectiveness of each module in PANC.

Introduction

Knowledge graphs (KGs) collect and store a great deal of commonsense or domain knowledge as factual triples, represented in the form (head entity, relation, tail entity). In recent years, many existing knowledge graphs, such as Freebase (Bollacker, Evans, Paritosh, Sturge & Taylor, 2008), YAGO (Suchanek, Kasneci & Weikum, 2007), and NELL (Mitchell, Cohen, Hruschka, Talukdar, Yang, Betteridge, Carlson, Dalvi, Gardner, Kisiel et al., 2018), have been viewed as key resources and introduced into many AI applications, such as question answering (Huang et al., 2019, Saxena et al., 2020), recommendation (Wang, He, Cao, Liu & Chua, 2019) and semantic search (Berant et al., 2013, Xiong et al., 2017a). However, they are generally incomplete and still suffer from missing relations between entities (Cui, Kapanipathi, Talamadupula, Gao & Ji, 2021). For example, many people in Freebase and DBpedia have no recorded place of birth (Nickel, Murphy, Tresp & Gabrilovich, 2015). If we search for Steve Jobs's birthplace, we will probably not get an accurate answer, and any downstream application that builds on this answer, such as a food recommendation, is unlikely to produce appropriate results. Hence it is necessary to explore knowledge graph completion methods to improve the effectiveness of KGs.

Over the past few years, many efforts have been made to address KG completion. In the initial research period, knowledge representation models were mostly used to infer the missing relation between two given entities. Typical models include TransE (Bordes, Usunier, Garcia-Duran, Weston & Yakhnenko, 2013), TransR (Lin, Liu, Sun, Liu & Zhu, 2015), and TransH (Wang, Zhang, Feng & Chen, 2014). In addition, reinforcement learning and path reasoning methods, which aim to model complex relation paths, have gradually been introduced in this domain, e.g., DeepPath (Xiong, Hoang & Wang, 2017b), CPL (Fu, Chen, Qu, Jin & Ren, 2019) and MultiHop (Lin, Socher & Xiong, 2018). It is also popular to introduce extra entity information, such as entity types (Jain et al., 2018, Niu et al., 2020, Xie et al., 2016b) and text descriptions (García-Durán and Niepert, 2017, Kristiadi et al., 2019), which can enhance the entity embeddings and thereby improve model accuracy.

However, the great majority of methods define the KG completion task as link prediction. Specifically, given two elements of a fact triplet, the task is to infer the missing one, such as (h, r, ?), (h, ?, t), or (?, r, t), where the question mark represents the missing entity or relation (Rosso, Yang, Ostapuk & Cudré-Mauroux, 2021). Although excellent performance has been achieved on this task, it is still unreasonable in many circumstances, such as the prediction for (Trump, capital of, ?). Link prediction implicitly assumes that the two given elements are strongly related (Rosso et al., 2021), and most existing works guarantee this assumption simply by removing one element from a true triplet in the KG. In the real world, however, we are generally not given two correlated elements, so meaningless combinations such as (Trump, capital of, ?) appear frequently. For such an inherently meaningless combination, the resulting triplet makes no sense whatever the final reasoning result is. Therefore, we need a more reasonable method that removes as many of these invalid combinations as possible to ensure the effectiveness of the completion.
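The implicit assumption of link prediction can be illustrated with a small sketch. The toy KG and the helper names `masked_tail_queries` and `has_answer` are assumptions made for illustration only; they are not part of any cited benchmark or model.

```python
# Toy KG; the triples are illustrative assumptions, not taken from FB15k or JF17k.
kg = {
    ("Rome", "capital of", "Italy"),
    ("Leonardo DiCaprio", "acted in", "Titanic"),
}

def masked_tail_queries(kg):
    """Build (h, r, ?) queries by hiding the tail of each true triple."""
    return {(h, r) for h, r, _ in kg}

def has_answer(query, kg):
    """A query (h, r, ?) is answerable if some true triple completes it."""
    h, r = query
    return any(h == kh and r == kr for kh, kr, _ in kg)

# Queries obtained by masking a true triple are answerable by construction.
benchmark_queries = masked_tail_queries(kg)

# A pairing that nobody guarantees to be meaningful, as in (Italy, acted in, ?).
arbitrary_query = ("Italy", "acted in")
```

Every query in `benchmark_queries` has an answer by construction, whereas `arbitrary_query` has none in this toy KG; a link predictor would nevertheless return some ranked tail for it.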

In response to the above problem, Rosso et al. (2021) propose RETA, which is based on a more complex task definition, instance completion, and predicts the missing (r, t) pairs for a given head entity. In other words, they make predictions on (h, ?, ?) and thereby drop the unreasonable assumption that two correlated elements of a triplet are known. They design a filter that generates candidate (r, t) pairs for a given head entity, and a grader that scores the triples formed by the head entity and its candidate (r, t) pairs. By applying entity-type constraints in the filter, the number of candidate (r, t) pairs for a given head entity can be reduced to a small magnitude, and RETA performs better than the baseline models. However, using entity types as constraints in the RETA filter and grader can be slightly inappropriate. Type information is fine-grained, in other words overly specific, which may limit the filter's performance and leave the candidate set incomplete. As shown in Fig. 1, RETA-Filter may lose the (produce, film/film) pair for the given head entity Leonardo DiCaprio if the KG lacks the corresponding type film/producer for him, and then misjudge the relation between Leonardo DiCaprio and Robin Hood.
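The failure mode described for Fig. 1 can be reproduced with a toy type-constrained filter. This is a simplified sketch under stated assumptions, not RETA's implementation: the schema format, the function `type_filter`, and all entities and type labels in the toy data are hypothetical.

```python
# Toy data; every entity, relation and type label is an illustrative assumption.
entity_types = {
    "Leonardo DiCaprio": {"film/actor"},  # the type "film/producer" is missing in the KG
    "Robin Hood": {"film/film"},
    "Titanic": {"film/film"},
}

# Type-level schema: (head type, relation, tail type) combinations seen in training.
schema = {
    ("film/actor", "acted in", "film/film"),
    ("film/producer", "produce", "film/film"),
}

def type_filter(head, entity_types, schema):
    """Return candidate (relation, tail) pairs whose types match a schema entry."""
    candidates = set()
    for head_type, relation, tail_type in schema:
        if head_type not in entity_types[head]:
            continue  # the head entity lacks the required type, so the pair is dropped
        for tail, types in entity_types.items():
            if tail != head and tail_type in types:
                candidates.add((relation, tail))
    return candidates

candidates = type_filter("Leonardo DiCaprio", entity_types, schema)
```

Because the head entity carries only the type film/actor, `candidates` contains the "acted in" pairs but no "produce" pair, so a grader applied afterwards never gets the chance to score (Leonardo DiCaprio, produce, Robin Hood).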

To tackle this issue, we propose the Prototype Augmented Neighbor Constraint instance completion model (PANC), which uses a kind of coarse-grained information called a prototype (see the problem formulation in the method section for details) to filter the candidate set. On the one hand, prototypes are employed to obtain more reasonable (r, t) pairs as candidate sets. On the other hand, the Grader is redesigned to judge the correctness of the triples more accurately. Building on the original two neural network pipelines in RETA (Rosso et al., 2021) and inspired by Xiong, Yu, Chang, Guo & Wang (2018) and Wang, Liu, Xu & Sheng (2020), we introduce neighbor aggregation into the Grader to make full use of the local graph structure information.
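The idea of enhancing an entity embedding with its local neighborhood can be sketched as follows. The aggregator used in PANC is not specified in this excerpt, so the mean aggregation, the equal weighting of the entity and its neighbors, and the `max_neighbors` cap are all assumptions chosen for illustration.

```python
def aggregate_neighbors(entity, embeddings, neighbors, max_neighbors=None):
    """Average an entity's own embedding with the mean of its neighbors' embeddings.

    Mean aggregation is an assumption; it is not confirmed as PANC's aggregator.
    """
    own = embeddings[entity]
    selected = neighbors.get(entity, [])[:max_neighbors]  # optional cap on neighbors
    if not selected:
        return list(own)  # no neighbors: fall back to the entity's own embedding
    dim = len(own)
    mean = [sum(embeddings[n][i] for n in selected) / len(selected) for i in range(dim)]
    return [(own[i] + mean[i]) / 2 for i in range(dim)]

# Hypothetical two-dimensional embeddings and adjacency list.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 3.0]}
neighbors = {"a": ["b", "c"]}

enhanced = aggregate_neighbors("a", embeddings, neighbors)
capped = aggregate_neighbors("a", embeddings, neighbors, max_neighbors=1)
```

The `max_neighbors` argument is included because the number of aggregated neighbors is a hyper-parameter of interest: the highlights report that performance declines once it exceeds a threshold.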

Our contributions are summarized as follows:

  • We leverage the relevance among entities and a fuzzy clustering algorithm to obtain the membership of each entity in each prototype, so that each entity is assigned to its corresponding prototype space, and we construct our filter with this prototype information to generate more reasonable candidate (r, t) pairs.

  • We introduce local graph neighbor information to further constrain the given head entity and all filtered candidate tail entities, and redesign the Grader with neighbor aggregation to enhance the entity embeddings.

  • The experimental results show that our proposed model outperforms the other baselines on the instance completion task. Compared with the state-of-the-art baseline RETA, our model obtains improvements of 18.58%, 13.4%, 8.76%, and 12.39% on FB15k in terms of Rec@10, Rec@5, MAP, and NDCG, respectively. The ablation study illustrates the effectiveness of each module in our model.
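The first contribution relies on fuzzy memberships of entities in prototypes. As a hedged illustration, the sketch below uses the standard fuzzy c-means membership formula; this excerpt does not state which fuzzy clustering algorithm PANC uses, so the formula, the fuzzifier `m`, the membership `threshold`, and the function names are assumptions.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means memberships of one point in each cluster center (requires m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centers belongs fully to those centers.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]

def assign_prototypes(memberships, threshold=0.3):
    """Keep every prototype whose membership reaches the (hypothetical) threshold."""
    return [i for i, u in enumerate(memberships) if u >= threshold]

# Hypothetical entity embedding and three prototype centers.
entity = (1.0, 1.0)
prototype_centers = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)]

memberships = fuzzy_memberships(entity, prototype_centers)
prototypes = assign_prototypes(memberships)
```

Unlike a hard type label, a soft membership lets an entity fall into several prototype spaces at once (here the first two), which is one way a prototype-based filter could admit a wider candidate set than a type-based one.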

Section snippets

Related work

In this section, we briefly review knowledge graph completion models, which can be roughly divided into two categories: KG reasoning and KG embedding.

Method

In this section, we describe the problem formulation and the details of our proposed model. First, we design the prototype filter, which can be divided into a prototype segmentator and a filter, to obtain the prototype memberships of each entity and generate the candidate (r, t) pairs. We then develop a neighbor aggregation grader that constrains the entities with neighbor information and finally ranks the candidate (r, t) pairs with a consideration on the plausibility of the fact triplets and their

Experiments

In this section, we present the results of experiments performed on two real-world datasets for the instance completion task. We also analyze the impact of several important hyper-parameters on the robustness of PANC.
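The ranking metrics named in the contributions (Rec@K, MAP, NDCG) can be computed for a single head entity as sketched below. The paper's exact definitions are not given in this excerpt; the code uses the common binary-relevance forms and assumes a non-empty set of true (r, t) pairs. MAP is the mean of `average_precision` over all head entities.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the true pairs that appear among the top-k ranked candidates."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values measured at the rank of each true pair."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def ndcg(ranked, relevant):
    """Binary-relevance NDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked, start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, len(relevant) + 1))
    return dcg / ideal

# Hypothetical ranked candidate (r, t) pairs for one head entity, best first.
ranked = [("acted in", "Titanic"), ("capital of", "Rome"), ("produce", "Robin Hood")]
relevant = {("acted in", "Titanic"), ("produce", "Robin Hood")}

rec2 = recall_at_k(ranked, relevant, 2)   # one of the two true pairs is in the top 2
ap = average_precision(ranked, relevant)
score = ndcg(ranked, relevant)
```

In this toy ranking the second true pair sits at rank 3, so all three metrics fall below 1; moving it to rank 2 would raise each of them to 1.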

Conclusion

In this paper, we propose a PANC model for the instance completion task consisting of two modules: a prototype filter and a neighbor aggregation grader. Our prototype filter first clusters all entities according to the prototypes and generates a larger set of candidate relation-tail pairs for a given head entity through prototypes than through types; our neighbor aggregation grader relies on neighbor aggregation to enhance the entity embedding and constrain the combination between head entity and candidate pairs and

CRediT authorship contribution statement

Ruixin Ma: Project administration. Yunlong Ma: Methodology, Software, Writing – original draft. Hongyan Zhang: Data curation, Investigation. Liang Zhao: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61906030), the Science and Technology Project of Liaoning Province (2021JH2/10300064) and the Youth Science and Technology Star Support Program of Dalian City (2021RQ057).

References (46)

  • Bai, L. et al. (2021). Multi-hop reasoning over paths in temporal knowledge graphs using reinforcement learning. Applied Soft Computing.
  • Berant, J., Chou, A., Frostig, R., & Liang, P. (2013). Semantic parsing on freebase from question-answer pairs. In...
  • Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: a collaboratively created graph...
  • Bordes, A. et al. (2013). Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems.
  • Cao, E., Wang, D., Huang, J., & Hu, W. (2020). Open knowledge enrichment for long-tail entities. In Proceedings of The...
  • Chen, W., Xiong, W., Yan, X., & Wang, W. Y. (2018). Variational knowledge graph reasoning. In Proceedings of the 2018...
  • Cui, Z., Kapanipathi, P., Talamadupula, K., Gao, T., & Ji, Q. (2021). Type-augmented relation prediction in knowledge...
  • Das, R., Dhuliawala, S., Zaheer, M., Vilnis, L., Durugkar, I., Krishnamurthy, A., Smola, A., & McCallum, A. (2018). Go...
  • Das, R., Neelakantan, A., Belanger, D., & McCallum, A. (2017). Chains of reasoning over entities, relations, and text...
  • Dettmers, T., Minervini, P., Stenetorp, P., & Riedel, S. (2018). Convolutional 2d knowledge graph embeddings. In...
  • Fu, C., Chen, T., Qu, M., Jin, W., & Ren, X. (2019). Collaborative policy learning for open knowledge graph reasoning....
  • García-Durán, A., & Niepert, M. (2017). Kblrn: End-to-end learning of knowledge base representations with latent,...
  • Huang, X., Zhang, J., Li, D., & Li, P. (2019). Knowledge graph embedding based question answering. In Proceedings of...
  • Jain, P., Kumar, P., Chakrabarti, S. et al. (2018). Type-sensitive knowledge base inference without explicit type...
  • Ji, G., He, S., Xu, L., Liu, K., & Zhao, J. (2015). Knowledge graph embedding via dynamic mapping matrix. In...
  • Kazemi, S.M. et al. (2018). Simple embedding for link prediction in knowledge graphs. Advances in Neural Information Processing Systems.
  • Kristiadi, A. et al. Incorporating literals into knowledge graph embeddings.
  • Lao, N. et al. (2010). Relational retrieval using a combination of path-constrained random walks. Machine Learning.
  • Lin, X. V., Socher, R., & Xiong, C. (2018). Multi-hop knowledge graph reasoning with reward shaping. In Proceedings of...
  • Lin, Y. et al. (2015). Learning entity and relation embeddings for knowledge graph completion. Twenty-ninth AAAI conference on artificial intelligence.
  • Liu, H. et al. Analogical inference for multi-relational embeddings.
  • Ma, S. et al. Transt: Type-based multiple embedding representations for knowledge graph completion.
  • Mitchell, T. et al. (2018). Never-ending learning. Communications of the ACM.

    Liang Zhao: 0000-0001-6301-1311.
