Elsevier

Neural Networks

Volume 158, January 2023, Pages 142-153

UniSKGRep: A unified representation learning framework of social network and knowledge graph

https://doi.org/10.1016/j.neunet.2022.11.010

Abstract

Human-oriented applications aim to exploit the behaviors of people, which poses the user-modeling challenge of integrating a social network (SN) with a knowledge graph (KG) and jointly analyzing the two types of graph data. However, existing graph representation learning methods represent only one of the two graphs, and hence cannot comprehensively consider the features of both SN and KG or profile the correlation between them, resulting in unsatisfactory performance in downstream tasks. Considering the large gap between the features of the two graphs and the difficulty of associating them, we introduce a Unified Social Knowledge Graph Representation learning framework (UniSKGRep), with the goal of leveraging the multi-view information inherent in the SN and KG to improve the downstream tasks of user modeling. To the best of our knowledge, we are the first to present a unified representation learning framework for SN and KG. Concretely, the SN and KG are organized as the Social Knowledge Graph (SKG), a unified representation of SN and KG. For the representation learning of SKG, two separate encoders in the Intra-graph model first capture the social view and the knowledge view in two embedding spaces, respectively. Then the Inter-graph model is learned to associate the two separate spaces by bridging the semantics of overlapping node pairs. In addition, the overlapping node enhancement module is designed to align the two spaces effectively even when the number of overlapping nodes is relatively small. The two spaces are gradually unified by continuously iterating the joint training procedure. Extensive experiments on two real-world SKG datasets demonstrate the effectiveness of UniSKGRep, which yields general and substantial performance improvements over strong baselines in various downstream tasks.

Introduction

In recent years, human-oriented applications such as online social platforms (Shi, Yang, Weninger, How, & He, 2019) and social recommendation (Huang et al., 2021) have portrayed users by exploiting their social behaviors and background concepts, which are derived from the social network (SN) and the knowledge graph (KG). The two types of graph data describe users from two views: SN reflects the social behaviors among people (Daud, Ab Hamid, Saadoon, Sahran, & Anuar, 2020), and KG describes relational facts about famous people and other types of entities in the real world (Ji, Pan, Cambria, Marttinen, & Philip, 2021). Furthermore, SN and KG have different characteristics. SN can easily be collected on a large scale through automatic procedures such as crawlers and covers a large number of users, but it is weakly semantic and lacks interpretability. For example, the friends, colleagues, or relatives of a user cannot be identified directly from followers on Twitter. KG contains rich semantic information about entities and relations. However, KG suffers from incompleteness, which results from scarce human annotations and the difficulty of gathering data (Huang et al., 2022). The low-resource nature of KG causes a long-tail effect in entity occurrence frequency (Zhang et al., 2021), which means that the background knowledge of most people in KG is sparse or even blank.

To leverage these two types of graph-structured data, graph representation learning (GRL) methods encode nodes into low-dimensional representations that can serve as node features in various downstream tasks on graphs (Hamilton, 2020), and they have recently garnered considerable attention in social network analysis (Hamilton, Ying, & Leskovec, 2017) and knowledge computing (Ji et al., 2021). However, these methods are designed for specific graph types and hence describe users from a single view only: they either describe the neighborhood or community features of users in the social network view (Kipf and Welling, 2016a, Veličković et al., 2018, Zhu et al., 2020) or represent the semantics of heterogeneous concepts in the knowledge graph view (Hu et al., 2020, Nathani et al., 2019, Schlichtkrull et al., 2018). Because they ignore the correlation between SN and KG, they usually underperform in downstream tasks of user analysis.

In fact, SN and KG are not naturally isolated from each other, but usually share some real-world people. An intuitive explanation of how the SN and KG can be associated is illustrated in Fig. 1: the overlapping node pairs, each of which indicates that a user in the SN corresponds to an entity in the KG, can be treated as bridges. Properly associating and representing SN with KG provides more comprehensive insights for user modeling. On the one hand, SN propagates social-view information to low-resource KGs, which alleviates the sparsity problem. For example, for an NBA player with few facts in KG, we can infer more hidden facts from his social behaviors involving other players and fans, such as follow relations and likes. On the other hand, KG injects rich semantics and provides strong explainability for social behaviors in SN. For example, if the user @Eve follows all players who play for the LA Lakers, @Eve may be a fan of the LA Lakers.
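The bridging idea in the @Eve example can be made concrete with a toy lookup. The sketch below is illustrative only and is not part of UniSKGRep: the account names, the `plays_for` relation, and the helper `team_counts` are all hypothetical. It follows SN edges from a user, crosses to the KG through overlapping node pairs, and counts the teams that appear in KG facts. Followed accounts without a KG counterpart contribute nothing, which also hints at why a small overlap limits what can be inferred.

```python
from collections import Counter

# Hypothetical SN view: who each user follows.
follows = {"@Eve": ["@player_a", "@player_b", "@friend"]}

# Hypothetical overlapping node pairs: SN account -> KG entity.
overlap = {"@player_a": "Player A", "@player_b": "Player B"}

# Hypothetical KG view: (entity, plays_for) facts stored as a mapping.
plays_for = {"Player A": "LA Lakers", "Player B": "LA Lakers"}


def team_counts(user):
    """Count the teams of followed accounts that have a KG counterpart."""
    counts = Counter()
    for account in follows.get(user, []):
        entity = overlap.get(account)   # None if the account is not in the KG
        team = plays_for.get(entity)    # None if no plays_for fact is known
        if team is not None:
            counts[team] += 1
    return counts


eve_teams = team_counts("@Eve")
print(eve_teams.most_common(1))
```

Here `@friend` is skipped because it has no overlapping KG entity, so only two of the three follow edges carry knowledge-view evidence.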

Despite the potential benefits, associating and leveraging the information from both SN and KG presents three non-trivial challenges: (1) The inconsistency issue of SN and KG. Features such as scale and structure differ greatly between the two graph types: the KG includes diverse entities and relations, whereas the SN is much larger and contains only person-type nodes. Therefore, the features in SN and KG should be modeled by different types of GRL methods. (2) The semantic gap issue of embedding spaces. Even if different GRL methods are applied to encode the two graphs, the two embedding spaces differ from each other and are far from unified. (3) The scarcity issue of overlapping node pairs. Since SN excludes all KG entities other than person-type entities, and the person-type entities are usually people “prominent enough” to be included in the KG, the overlapping nodes cover only a small fraction of the users in SN and the entities in KG, which hinders the unification of the two embedding spaces.

To meet the challenges above, we introduce a Unified Social Knowledge Graph Representation learning framework named UniSKGRep, which suits the representation learning of related SN and KG data. To tackle the inconsistency issue of SN and KG, the two types of graph data are integrated into the Social Knowledge Graph (SKG) through overlapping node pairs. Based on the SKG, the Intra-graph model captures the social-view and knowledge-view information of users with adaptive GRL methods in two separate embedding spaces. To handle the semantic gap between the two embedding spaces, the Inter-graph model learns to associate them by bridging overlapping node pairs. Its semantic transforming strategy enables overlapping node pairs to share semantics by seamlessly transforming representations from one space to the other. To alleviate the scarcity of overlapping node pairs, UniSKGRep uses the overlapping node enhancement module in the Inter-graph model to align the two spaces effectively with a limited number of overlapping node pairs. This is achieved by enhancing node-wise and neighbor-wise features to convey more information. The two spaces are gradually unified by continuously iterating the joint training procedure, which is designed to make the unified node representations aware of both the downstream tasks and the alignment of the two spaces.
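To give a feel for what "transforming from one space to the other" can mean, the following sketch fits a plain linear map from a toy SN embedding space to a toy KG embedding space using only overlapping node pairs. This is a deliberately simple stand-in, not the paper's transforming layer or enhancement module: the linear form, the squared-error alignment loss, the gradient-descent settings, and all names (`matvec`, `alignment_loss`, `fit_transform`) are assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def alignment_loss(W, pairs):
    """Mean squared distance between W @ s and k over overlapping pairs."""
    total = 0.0
    for s, k in pairs:
        total += sum((p - q) ** 2 for p, q in zip(matvec(W, s), k))
    return total / len(pairs)


def fit_transform(pairs, dim_sn, dim_kg, lr=0.1, steps=500):
    """Fit W by batch gradient descent on the alignment loss (assumed setup)."""
    W = [[0.0] * dim_sn for _ in range(dim_kg)]
    for _ in range(steps):
        grad = [[0.0] * dim_sn for _ in range(dim_kg)]
        for s, k in pairs:
            residual = [p - q for p, q in zip(matvec(W, s), k)]
            for i in range(dim_kg):
                for j in range(dim_sn):
                    grad[i][j] += 2.0 * residual[i] * s[j] / len(pairs)
        for i in range(dim_kg):
            for j in range(dim_sn):
                W[i][j] -= lr * grad[i][j]
    return W


# Toy overlapping pairs: (2-d SN embedding, 3-d KG embedding).
pairs = [
    ([1.0, 0.0], [1.0, 0.0, 1.0]),
    ([0.0, 1.0], [0.0, 2.0, 1.0]),
    ([1.0, 1.0], [1.0, 2.0, 2.0]),
]

W = fit_transform(pairs, dim_sn=2, dim_kg=3)
print("alignment loss after fitting:", alignment_loss(W, pairs))
```

Once fitted, `matvec(W, s)` maps any SN embedding, including those of users with no KG counterpart, into the KG space. With very few pairs such a map is poorly constrained, which is the scarcity issue the enhancement module is meant to address.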

We conduct extensive experiments to verify the effectiveness of UniSKGRep. First, we crawl and combine real-world SN and KG data, thereby constructing two representative SKG datasets: OAG-SKG in the academic field and Twiki in the sports field. Next, we evaluate UniSKGRep on two representative downstream tasks, node classification and link prediction, and compare it with state-of-the-art baseline models. UniSKGRep significantly outperforms these baseline models on both tasks, confirming that it effectively learns and associates information from both SN and KG and can be applied to various downstream tasks.

In general, our main contributions are fourfold:

(1) To the best of our knowledge, we are the first to propose a unified representation learning framework UniSKGRep, which tackles the inconsistency issue by comprehensively leveraging the information of users from both SN and KG.

(2) In UniSKGRep, we propose a semantic transforming component and an overlapping node enhancement module. The former handles the semantic gap between SN and KG by introducing a transforming layer for the cross-graph overlapping nodes, and the latter alleviates the scarcity of overlapping node pairs with self-supervision.

(3) We propose a novel joint training procedure in which UniSKGRep learns simultaneously on the node classification and link prediction tasks, and experimental results on the two tasks show that UniSKGRep outperforms state-of-the-art models.

(4) The two social knowledge graph datasets we crawled, combined, and constructed from open-source data provide new benchmarks to study the combination of SN and KG.

Section snippets

Related work

To the best of our knowledge, there is no previous work on learning to represent SN and KG in two views. We discuss the following three lines of research work that are closely relevant to this paper.

Graph Representation Learning (GRL). GRL methods mainly focus on learning low-dimensional node representations from node features and graph structure, and on applying these representations to various downstream tasks in graph analysis.

For social networks, GRL methods analyze social behaviors

Problem formulation

The original data consists of a social network $G_{SN}=\{V_{SN},E_{SN}\}$, a knowledge graph $G_{KG}=\{V_{KG},E_{KG}\}$ and an overlapping node pair set $S$. $G_{SN}$ is a homogeneous graph formed by the set of person-type nodes $V_{SN}$ and the set of edges $E_{SN}$. The sets of entities and relations in $G_{KG}$ are denoted as $V_{KG}$ and $E_{KG}$, respectively. The intersection of $V_{SN}$ and $V_{KG}$ is the overlapping node set $V_{ol}$. $S$ denotes the overlapping node pair set. We use $(s,k)\in S$ to denote an overlapping node pair, where $s\in V_{SN}$, $k\in V_{KG}$
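As a reading aid, the inputs named in this formulation can be written down as plain data structures. The sketch below is an assumption-laden illustration, not the authors' data format: the class names, the storage of KG relations as (head, relation, tail) triples, and the helper `overlapping_nodes` are hypothetical, and SN accounts and KG entities are given different identifiers that are linked only through the pair set S.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class SocialNetwork:
    nodes: frozenset   # V_SN: person-type nodes only
    edges: frozenset   # E_SN: (source, target) pairs, e.g. follow links


@dataclass(frozen=True)
class KnowledgeGraph:
    entities: frozenset   # V_KG: entities of any type
    triples: frozenset    # assumed (head, relation, tail) encoding of E_KG


def overlapping_nodes(sn, kg, pairs):
    """Validate S and return the SN-side and KG-side overlapping nodes."""
    for s, k in pairs:
        if s not in sn.nodes:
            raise ValueError(f"{s!r} is not in V_SN")
        if k not in kg.entities:
            raise ValueError(f"{k!r} is not in V_KG")
    return {s for s, _ in pairs}, {k for _, k in pairs}


sn = SocialNetwork(
    nodes=frozenset({"@eve", "@player_a", "@player_b"}),
    edges=frozenset({("@eve", "@player_a"), ("@eve", "@player_b")}),
)
kg = KnowledgeGraph(
    entities=frozenset({"Player A", "Player B", "LA Lakers"}),
    triples=frozenset({
        ("Player A", "plays_for", "LA Lakers"),
        ("Player B", "plays_for", "LA Lakers"),
    }),
)

# S: overlapping node pairs (s, k) with s in V_SN and k in V_KG.
S = {("@player_a", "Player A"), ("@player_b", "Player B")}
sn_side, kg_side = overlapping_nodes(sn, kg, S)
print(sorted(sn_side), sorted(kg_side))
```

In this toy instance `@eve` appears only in the SN and `LA Lakers` only in the KG, so neither belongs to the overlapping node set.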

The framework

This section introduces our proposed unified representation learning framework UniSKGRep, which learns unified node embeddings with the Intra-graph model and the Inter-graph model in a joint training procedure. The UniSKGRep framework is illustrated in Fig. 2.

Experiments

We evaluate the performance of UniSKGRep on two representative tasks in social network analysis and knowledge reasoning: node classification and link prediction. In this section, we first describe the proposed datasets, tasks, evaluation metrics, and baseline methods. Next, we give a detailed analysis of the results on the two downstream tasks. Then, we conduct additional experiments, namely an ablation study and a hyperparameter analysis. Finally, we provide the visualization of node embeddings and the

Conclusion and future work

In this paper, we propose a unified representation learning framework for the social network and knowledge graph called UniSKGRep. UniSKGRep aims to address the inconsistency issue of SN and KG by integrating SN and KG into the social knowledge graph and jointly learning and unifying the two representation spaces via the Intra- and Inter-graph models. It features the semantic transforming component and the overlapping node pair enhancement module to tackle the semantic gap of SN with KG

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Yuanzhuo Wang reports financial support was provided by Zhongke Big Data Academy.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. U1836206, 91646120, U21B2046, 62172393), the National Key Research and Development Program of China (No. 2018YFB1402601), the Zhongyuanyingcai program-funded to central plains science and technology innovation leading talent program (No. 204200510002), and the Major Public Welfare Project of Henan Province (No. 201300311200).

References (39)

  • Chen, H., Yin, H., Sun, X., Chen, T., Gabrys, B., & Musial, K. (2020). Multi-level graph convolutional networks for...
  • Fan, W., Ma, Y., Li, Q., He, Y., Zhao, E., Tang, J., et al. (2019). Graph neural networks for social recommendation. In...
  • Glorot, X., et al. Understanding the difficulty of training deep feedforward neural networks.
  • Hamilton, W. L. (2020). Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning.
  • Hamilton, W. L., et al. (2017). Representation learning on graphs: Methods and applications.
  • Hao, J., Chen, M., Yu, W., Sun, Y., & Wang, W. (2019). Universal representation learning of knowledge bases by jointly...
  • He, Q., Yang, J., & Shi, B. (2020). Constructing knowledge graph for social networks in a deep and holistic way. In...
  • Hu, Z., Dong, Y., Wang, K., & Sun, Y. (2020). Heterogeneous graph transformer. In Proceedings of the web conference...
  • Huang, Z., Li, Z., Jiang, H., Cao, T., Lu, H., Yin, B., et al. (2022). Multilingual Knowledge Graph Completion with...

1 These authors contribute equally.