Exploiting user-to-user topic inclusion degree for link prediction in social-information networks
Introduction
Networks are an important organizational form of real-world data. Analyzing network data is essential for exploring the laws of network evolution (Juszczyszyn, Musial, Budka, 2011, Zhang, Fang, Chen, Tang, 2015a) and for understanding the mechanisms of complex systems (Li, Fu, Wang, Lu, Berezin, Stanley, Havlin, 2015, Pastor-Satorras, Castellano, Van Mieghem, Vespignani, 2015). Among the many tasks in network data analysis, link prediction (Getoor & Diehl, 2005) is among the most fundamental, and its solution is of great significance for many applications, such as finding like-minded friends in social networks (Aiello et al., 2012), recommending items in user-item networks (Xie et al., 2015), finding experts in academic networks (Pavlov & Ichise, 2007), and discovering unknown interactions in biological networks (Lu, Guo, & Korhonen, 2017).
Predicting node-to-node relations in networks with rich content remains a challenge. Consider social-information networks (Romero, Kleinberg, 2010, Rowe, Stankovic, Alani, 2012) such as Twitter and Weibo, which, as the name implies, have both social and informational properties. Formally, a social-information network can be modeled as G(V, E, {Tu}u ∈ V), where V denotes the set of users, E is the set of following/followed links between users, and Tu is the set of microblogs/tweets published by user u ∈ V. As shown in Fig. 1, a directed network forms when some users begin to follow others, and this structure exposes the generalized social relations among people; besides the following/followed relations, the network also contains rich published content, i.e., the many tweets published by users. As anyone familiar with platforms such as Twitter and Weibo knows, the dissemination of published content depends entirely on the network structure: a tweet is usually propagated from its publisher to his/her followers. However, the formation of the network structure is probably due to many complex factors. One such factor is that, during content dissemination, users who find some content appealing may create following links to the information publisher/mediator. Although users’ interests seem to play an apparent role in producing the following/followed links, both the number and the exact nature of the factors that drive link formation in social-information networks are still unclear. Here comes the challenge: how to relate the rich published content to the formation of the following/followed network in a social-information network.
Addressing this challenge is essential for understanding the evolution of the network structure and the dissemination mechanism of the published content in social-information networks, and it is key to solving the link prediction problem efficiently in this kind of network. In this paper, we focus on effectively exploiting the rich content in social-information networks, and we aim to establish a fusion model that relates the information of the following/followed network to the rich content and thereby improves link prediction performance in social-information networks.
For link prediction, many methods have been proposed by researchers from physics, biology, sociology, and computer science, focusing on physical networks, biological networks, social networks, and information networks (Clauset, Moore, Newman, 2008, He, Liu, Hu, Wang, 2015, Luo, Wu, Li, 2017, Martínez, Berzal, Cubero, 2016, Moradabadi, Meybodi, 2017, Rowe, Stankovic, Alani, 2012, Soares, Prudêncio, 2013, Wang, Liang, Li, Qian, 2016). The existing metric-based methods, including neighbor-based metrics (Adamic, Adar, 2003, Ravasz, Somera, Mongru, Oltvai, Barabási, 2002, Zhu, Lü, Zhang, Zhou, 2012), path-based metrics (Katz, 1953, Lü, Jin, Zhou, 2009, Papadimitriou, Symeonidis, Manolopoulos, 2012), random walk-based metrics (Brin, Page, 1998, Fouss, Pirotte, Renders, Saerens, 2007, Jeh, Widom, 2002, Lichtenwalter, Lussier, Chawla, 2010) and auxiliary information-based metrics (Aiello, Barrat, Schifanella, Cattuto, Markines, Menczer, 2012, Anderson, Huttenlocher, Kleinberg, Leskovec, 2012, Dong, Tang, Wu, Tian, Chawla, Rao, Cao, 2012, Wang, Liao, Cao, Qi, 2015), take into account topological or non-topological information that can reflect users’ personal interests and social behaviors. Compared to the metric-based methods, network models such as the hierarchical network model (Clauset, Moore, Newman, 2008, Ravasz, Somera, Mongru, Oltvai, Barabási, 2002), the stochastic block model (Airoldi, Blei, Fienberg, Xing, 2008, Holland, Laskey, Leinhardt, 1983, Nowicki, Snijders, 2001) and the latent-feature model (Miller, Jordan, Griffiths, 2009, Palla, Knowles, Ghahramani, 2012, Zhu, 2012) have expanded the scope of application to a certain extent. Despite these significant advances, current state-of-the-art methods may not be good enough for solving the following/followed link prediction problem in social-information networks.
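To make the notion of a neighbor-based metric concrete, the following is a minimal sketch of two classic scores, common neighbors and the Adamic-Adar index, computed on an undirected view of a toy graph. The function names and the toy edge list are illustrative assumptions and are not taken from this paper.

```python
import math
from collections import defaultdict


def build_neighbors(edges):
    """Build undirected neighbor sets from an iterable of (u, v) pairs."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def common_neighbors(nbrs, x, y):
    """Number of neighbors shared by x and y."""
    return len(nbrs[x] & nbrs[y])


def adamic_adar(nbrs, x, y):
    """Sum of 1 / log(degree) over the shared neighbors of x and y."""
    score = 0.0
    for z in nbrs[x] & nbrs[y]:
        degree = len(nbrs[z])
        if degree > 1:  # guard against log(1) == 0
            score += 1.0 / math.log(degree)
    return score


# Hypothetical toy graph: "a" and "b" share the neighbors "c" and "d".
toy_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
toy_nbrs = build_neighbors(toy_edges)
print(common_neighbors(toy_nbrs, "a", "b"))
print(adamic_adar(toy_nbrs, "a", "b"))
```

Both scores use only topology, which is why such metrics cannot, on their own, draw on the published content of a social-information network.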
Among the existing metric-based and learning-based methods, some combine topological and non-topological information to solve the link prediction problem. However, in-depth analysis of the rich content remains rare in link prediction, and the rich content is still underused by existing methods. Mining and exploiting the rich content in depth may have great potential to improve link prediction performance in social-information networks. Based on these considerations, we address the following problems in dealing with the link prediction task in social-information networks.
- How to analyze in depth and effectively exploit the rich content in social-information networks.
- How to build a fusion model that fuses the information of the network structure and the rich published content simultaneously to deal with the link prediction task in social-information networks.
To address these problems, this paper defines a user-to-user topic inclusion degree based on the dissemination mechanism of the published content in social-information networks and constructs a topic inclusion degree-based network. On this basis, the paper builds a fusion probabilistic matrix factorization model, which solves the link prediction problem by fusing the information of the original following/followed network and the topic inclusion degree-based network in a unified probabilistic matrix factorization framework. Finally, the linking probability between network nodes is obtained from the learning results of the model. The method provides a new way to solve the link prediction problem by fusing two different types of semantics between users.
The rest of the paper is organized as follows. Section 2 introduces the related work. Sections 3 and 4 introduce the topic-based network construction and the fusion probabilistic matrix factorization model, respectively. Section 5 presents the link prediction algorithm based on the fusion model, and Section 6 evaluates the proposed methods on different social-information network datasets. Section 7 concludes the paper.
Related work
Research on link prediction has attracted increasing attention in recent years, and various link prediction methods have been proposed. There are also several surveys (Hasan, Zaki, 2011, Lü, Zhou, 2011, Martínez, Berzal, Cubero, 2016, Wang, Xu, Wu, Zhou, 2014) of the link prediction problem. The existing link prediction methods can be roughly divided into two groups, i.e., the metric-based methods and the learning-based methods.
Topic inclusion degree-based network construction
To exploit the rich content in social-information networks, we first define a topic-level user-to-user relation measure, referred to as the topic inclusion degree; we then construct a network that encodes the topic inclusion degree between users. The main notations are listed in Table 1 before we introduce the method of this paper.
The topic inclusion degree is defined based on the dissemination mechanism of the published content in
Fusion probabilistic matrix factorization model
Given the adjacency matrices N and C of the following/followed network and the topic inclusion degree (TID)-based network, the fusion probabilistic matrix factorization (FPMF) model is built to fuse the two kinds of network information in a unified probabilistic matrix factorization framework. Specifically, the FPMF model is based on the following assumptions:
1. Each network node is represented as an L-dimensional latent-feature vector Ui, and U is the n × L latent-feature matrix of the n nodes in the network. We
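As a rough illustration of what fusing the two networks in one factorization can look like, the sketch below evaluates a weighted squared-error objective in which N and C share the latent-feature matrix U. Only the approximation of N by U W0 U⊤ and the existence of a weight λC are stated in this text; the factorization of C as U WC U⊤, the squared-error form (the usual result of Gaussian noise assumptions in probabilistic matrix factorization), and the names `WC`, `lam_c`, `bilinear`, and `fused_loss` are assumptions made for illustration, and regularization terms are omitted.

```python
def bilinear(U, W):
    """Return the n x n matrix U W U^T for U (n x L) and W (L x L) as nested lists."""
    n, L = len(U), len(W)
    UW = [[sum(U[i][k] * W[k][l] for k in range(L)) for l in range(L)]
          for i in range(n)]
    return [[sum(UW[i][l] * U[j][l] for l in range(L)) for j in range(n)]
            for i in range(n)]


def squared_error(A, B):
    """Sum of squared element-wise differences between two equal-shape matrices."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


def fused_loss(N, C, U, W0, WC, lam_c=1.0):
    """Assumed objective: fit N and C with a shared U, weighting the C term by lam_c."""
    fit_n = squared_error(N, bilinear(U, W0))
    fit_c = squared_error(C, bilinear(U, WC))
    return 0.5 * fit_n + 0.5 * lam_c * fit_c


# Hypothetical 2-node example with L = 2.
U = [[1.0, 0.0], [0.0, 1.0]]
W0 = [[0.0, 1.0], [0.0, 0.0]]   # asymmetric, so the fit of N can be directed
WC = [[1.0, 0.0], [0.0, 1.0]]
N = [[0.0, 1.0], [0.0, 0.0]]
C = [[0.0, 0.0], [0.0, 1.0]]
print(fused_loss(N, C, U, W0, WC, lam_c=2.0))
```

Because U appears in both terms, any gradient-based update of U is pulled toward explaining the link structure and the topic-based structure at once, with `lam_c` controlling the balance.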
Link prediction algorithm
We have presented the FPMF model, which provides a strategy to fuse the information of the following/followed network and the topic inclusion degree-based network in a unified probabilistic matrix factorization framework. The basic part of the FPMF model is the approximation UW0U⊤ of the following/followed network N. Supposing we have learned the low-dimensional vector representations Ui and Uj of any two users and the linking parameter matrix W0, the linking probability density pij from
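The bilinear approximation UW0U⊤ described above suggests a simple scoring step, sketched here under stated assumptions: the raw score Ui W0 Uj⊤ is used directly to rank unobserved directed pairs, and no mapping from score to probability is shown because this text does not give it. The names `link_score` and `rank_candidates` and the toy values are hypothetical.

```python
def link_score(u_i, u_j, W0):
    """Bilinear score u_i W0 u_j^T for a directed pair (i -> j)."""
    L = len(W0)
    return sum(u_i[k] * W0[k][l] * u_j[l]
               for k in range(L) for l in range(L))


def rank_candidates(U, W0, observed):
    """Rank unobserved directed pairs (i, j), i != j, by descending score."""
    n = len(U)
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and (i, j) not in observed]
    return sorted(pairs,
                  key=lambda p: link_score(U[p[0]], U[p[1]], W0),
                  reverse=True)


# Hypothetical learned parameters for 3 users with L = 2.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.0, 1.0], [0.0, 0.0]]
observed = {(0, 1)}  # existing following link, excluded from the candidates
ranked = rank_candidates(U, W0, observed)
print(ranked)
```

Since W0 need not be symmetric, the score of (i, j) can differ from that of (j, i), which matches the directed nature of following/followed links.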
Experiments
In this section, we conduct experiments for the following purposes: (1) to determine whether the proposed fusion model is superior to baseline methods in link prediction, (2) to determine whether our method is superior to other methods in link prediction, (3) to analyze the impact of the sparseness of the constructed topic inclusion degree-based networks on link prediction performance, and (4) to determine the impact of the weight parameter λC on link prediction.
Conclusions and future work
Accurately inferring the node-to-node relations in social-information networks remains a challenge. This study presents a fusion model in which the information of the original following/followed network and a topic-based network are fused in one unified probabilistic matrix factorization framework. Based on the learned latent-feature representation and the learned linking parameter matrices of the fusion model, the linking probability between any pair of the network nodes
Acknowledgments
This work was supported by the State Key Program of National Natural Science Foundation of China (Nos. 61432011 and U1435212), the Key Scientific and Technological Project of Shanxi Province (MQ2014-09), and the 1331 Engineering Project of Shanxi Province, China.
References (89)
- et al. Friends and neighbors on the web. Social Networks (2003)
- et al. Analysis of user keyword similarity in online social networks. Social Network Analysis and Mining (2011)
- et al. A measure of similarity between graph vertices: applications to synonym extraction and web searching. SIAM Review (2004)
- Chang, P. C., Galley, M., & Manning, C. D. (2008). Optimizing Chinese word segmentation for machine translation...
- et al. OWA operator based link prediction ensemble for social network. Expert Systems with Applications (2015)
- A new status index derived from sociometric analysis. Psychometrika (1953)
- et al. Modeling social networks with node attributes using the multiplicative attribute graph model. The 27th conference on uncertainty in artificial intelligence (2011)
- Clustering and preferential attachment in growing networks. Physical Review E (2001)
- et al. Estimation and prediction for stochastic block structures. Journal of the American Statistical Association (2001)
- et al. Recommending positive links in signed social networks by optimizing a generalized AUC. Proceedings of the twenty-ninth AAAI conference on artificial intelligence (2015)