Elsevier

Expert Systems with Applications

Volume 108, 15 October 2018, Pages 143-158

Exploiting user-to-user topic inclusion degree for link prediction in social-information networks

https://doi.org/10.1016/j.eswa.2018.04.034

Highlights

  • Introduce a model to fuse network and content in social-information networks.

  • A new topic-oriented measure is defined for the user-to-user relation.

  • Rich content is effectively encoded in a constructed sparse network.

  • Link prediction is significantly improved in social-information networks.

Abstract

As a typical kind of network big data, social-information networks (such as Weibo and Twitter) include both the complex network structure among users and the rich microblogs/tweets published by users. Understanding the interplay between rich content and social relationships is potentially valuable for a fundamental network mining task, namely link prediction. Although some link prediction methods combine topological and non-topological information, in-depth analysis of the rich content remains rare, and the rich content of social-information networks is still underused for link prediction. In this paper, we approach the link prediction problem in social-information networks by combining the network structure with topic information extracted from users' rich content. We first define a user-to-user topic inclusion degree (TID) based on the dissemination mechanism of published content in social-information networks, and then construct a TID-based sparse network. On this basis, we build a fusion probabilistic matrix factorization model that solves the link prediction problem by fusing the information of the original following/followed network and the TID-based network in a unified probabilistic matrix factorization framework. We conduct link prediction experiments on two real-world social-information network datasets, Twitter and Weibo. The experimental results demonstrate that the proposed method is more effective than the compared methods in solving the link prediction problem in social-information networks.

Introduction

Networks are an important organizational form of real-world data. Analyzing network data helps us explore the laws of network evolution (Juszczyszyn, Musial, Budka, 2011, Zhang, Fang, Chen, Tang, 2015a) and understand the mechanisms of complex systems (Li, Fu, Wang, Lu, Berezin, Stanley, Havlin, 2015, Pastor-Satorras, Castellano, Van Mieghem, Vespignani, 2015). Among the many tasks in network data analysis, link prediction (Getoor & Diehl, 2005) is one of the most fundamental, and solving it is of great significance for many applications, such as finding like-minded friends in social networks (Aiello et al., 2012), recommending items in user-item networks (Xie et al., 2015), finding experts in academic networks (Pavlov & Ichise, 2007), and discovering unknown interactions in biological networks (Lu, Guo, & Korhonen, 2017).

Predicting node-to-node relations in networks with rich content remains a challenge. Social-information networks (Romero, Kleinberg, 2010, Rowe, Stankovic, Alani, 2012) such as Twitter and Weibo are a typical case: as the name implies, they have both social and informational properties. Formally, a social-information network can be modeled as $G = (V, E, \{T_u\}_{u \in V})$, where $V$ denotes the set of users, $E$ is the set of following/followed links between users, and $T_u$ ($u \in V$) is the set of microblogs/tweets published by user $u$. As shown in Fig. 1, a directed network forms as users begin to follow others, and this structure exposes the generalized social relations among people; besides the following/followed relations, the network also contains rich published content, i.e. the many tweets published by users. As is well known to those familiar with social-information platforms such as Twitter and Weibo, the dissemination of published content depends entirely on the network structure: a tweet is usually propagated from its publisher to his/her followers. The formation of the network structure, however, is probably driven by many complex factors. One such factor is the following: during content dissemination, if some content appeals to certain users, they may create following links to the publisher or mediator of that information. Although users' interests seem to play an evident role in producing following/followed links, both the number and the exact nature of the factors that govern link formation in social-information networks remain unclear. This raises the challenge: how to relate the rich published content to the formation of the following/followed network in a social-information network.
Addressing this challenge is essential for understanding the evolution of the network structure and the dissemination mechanism of published content in social-information networks, and it is key to solving the link prediction problem efficiently in this kind of network. In this paper, we focus on effectively exploiting the rich content in social-information networks, and we aim to establish a fusion model that relates the information of the following/followed network to the rich content, thereby improving link prediction performance in social-information networks.
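The formal model $G = (V, E, \{T_u\}_{u \in V})$ maps onto a small container: a user list, a set of directed follow links, and a per-user list of published posts. The following Python sketch is illustrative only; the class and method names (`SocialInfoNetwork`, `followers`, `adjacency`) are hypothetical, and the convention that entry $(i, j)$ of the adjacency matrix is 1 when user $i$ follows user $j$ is an assumption rather than something stated in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SocialInfoNetwork:
    """Illustrative container for G = (V, E, {T_u}); all names are hypothetical."""

    users: list[str]                                            # V
    follows: set[tuple[str, str]] = field(default_factory=set)  # E: (u, v) means u follows v
    posts: dict[str, list[str]] = field(default_factory=dict)   # T_u: posts published by u

    def followers(self, v: str) -> list[str]:
        """Users who follow v, i.e. who directly receive v's posts."""
        return sorted(u for (u, w) in self.follows if w == v)

    def adjacency(self) -> list[list[int]]:
        """Directed 0/1 matrix; [i][j] = 1 when users[i] follows users[j] (assumed convention)."""
        index = {u: i for i, u in enumerate(self.users)}
        n = len(self.users)
        matrix = [[0] * n for _ in range(n)]
        for u, v in self.follows:
            matrix[index[u]][index[v]] = 1
        return matrix


g = SocialInfoNetwork(
    users=["alice", "bob", "carol"],
    follows={("alice", "bob"), ("carol", "bob")},
    posts={"bob": ["first tweet"]},
)
print(g.followers("bob"))  # ['alice', 'carol']
print(g.adjacency())       # [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
```

Because a tweet propagates from its publisher to the publisher's followers, `followers(v)` is the audience that directly receives $T_v$; the adjacency matrix is not symmetric because following is a directed relation.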

For link prediction, researchers from physics, biology, sociology, and computer science have proposed many methods for physical, biological, social, and information networks (Clauset, Moore, Newman, 2008, He, Liu, Hu, Wang, 2015, Luo, Wu, Li, 2017, Martínez, Berzal, Cubero, 2016, Moradabadi, Meybodi, 2017, Rowe, Stankovic, Alani, 2012, Soares, Prudêncio, 2013, Wang, Liang, Li, Qian, 2016). The existing metric-based methods, including neighbor-based metrics (Adamic, Adar, 2003, Ravasz, Somera, Mongru, Oltvai, Barabási, 2002, Zhu, Lü, Zhang, Zhou, 2012), path-based metrics (Katz, 1953, Lü, Jin, Zhou, 2009, Papadimitriou, Symeonidis, Manolopoulos, 2012), random walk-based metrics (Brin, Page, 1998, Fouss, Pirotte, Renders, Saerens, 2007, Jeh, Widom, 2002, Lichtenwalter, Lussier, Chawla, 2010) and auxiliary information-based metrics (Aiello, Barrat, Schifanella, Cattuto, Markines, Menczer, 2012, Anderson, Huttenlocher, Kleinberg, Leskovec, 2012, Dong, Tang, Wu, Tian, Chawla, Rao, Cao, 2012, Wang, Liao, Cao, Qi, 2015), take into account topological or non-topological information that reflects users' personal interests and social behaviors. Compared with the metric-based methods, network models such as the hierarchical network model (Clauset, Moore, Newman, 2008, Ravasz, Somera, Mongru, Oltvai, Barabási, 2002), the stochastic block model (Airoldi, Blei, Fienberg, Xing, 2008, Holland, Laskey, Leinhardt, 1983, Nowicki, Snijders, 2001) and the latent-feature model (Miller, Jordan, Griffiths, 2009, Palla, Knowles, Ghahramani, 2012, Zhu, 2012) have expanded the scope of application to a certain extent. Despite these significant advances, current state-of-the-art methods may not be good enough for the following/followed link prediction problem in social-information networks.
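To make the metric families above concrete, the sketch below implements three classic neighbor-based scores (common neighbors, the Jaccard coefficient and the Adamic-Adar index) and the path-based Katz index, using their textbook definitions. It is not code from the paper; the function names are hypothetical, and treating the graph as undirected is a simplifying assumption, since following/followed links are directed.

```python
import math

import numpy as np

# Textbook link-prediction metrics; function names are hypothetical and the
# graph is treated as undirected (a simplifying assumption).


def common_neighbors(adj: dict[int, set[int]], x: int, y: int) -> int:
    """Number of shared neighbors of x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj: dict[int, set[int]], x: int, y: int) -> float:
    """Shared neighbors divided by the union of both neighborhoods."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj: dict[int, set[int]], x: int, y: int) -> float:
    """Sum of 1 / log(degree) over shared neighbors: rarely linked neighbors weigh more."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y] if len(adj[z]) > 1)


def katz(A: np.ndarray, beta: float) -> np.ndarray:
    """Katz index: paths of length l weighted by beta**l, so S = (I - beta*A)^-1 - I.

    The series converges only when beta < 1 / (largest eigenvalue of A).
    """
    identity = np.eye(A.shape[0])
    return np.linalg.inv(identity - beta * A) - identity


# Undirected toy graph: triangle 0-1-2 plus a pendant node 3 attached to 1.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
A = np.zeros((4, 4))
for u, nbrs in adj.items():
    for v in nbrs:
        A[u, v] = 1.0

print(common_neighbors(adj, 0, 3))  # 1 (node 1)
print(jaccard(adj, 0, 3))           # 0.5
print(adamic_adar(adj, 0, 3))
print(katz(A, 0.1)[0, 3] > 0)       # True: node 0 reaches node 3 via node 1
```

The Katz closed form needs `beta` below the reciprocal of the largest eigenvalue of `A`; otherwise the underlying path series diverges and the matrix inverse no longer corresponds to a weighted path count.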
Among the existing metric-based and learning-based methods, some combine topological and non-topological information to solve the link prediction problem. However, in-depth analysis of the rich content remains rare in link prediction, and the rich content is still underused by existing methods. Deeper mining and exploitation of the rich content may therefore have great potential to improve link prediction performance in social-information networks. Based on these considerations, we address the following problems in dealing with the link prediction task in social-information networks.

  • How to analyze the rich content in depth and exploit it effectively in social-information networks.

  • How to build a fusion model which can fuse the information of the network structure and the rich published content simultaneously and to deal with the link prediction task in social-information networks.

To address these problems, this paper defines a user-to-user topic inclusion degree based on the dissemination mechanism of published content in social-information networks and constructs a topic inclusion degree-based network. On this basis, the paper builds a fusion probabilistic matrix factorization model that solves the link prediction problem by fusing the information of the original following/followed network and the topic inclusion degree-based network in a unified probabilistic matrix factorization framework. Finally, the linking probability between network nodes is obtained from the learned parameters of the model. The method provides a new way to solve the link prediction problem by fusing two different types of semantics between users.

The rest of the paper is organized as follows. Section 2 introduces related work. Sections 3 and 4 introduce the topic-based network construction and the fusion probabilistic matrix factorization model, respectively. Section 5 presents the link prediction algorithm based on the fusion model, and Section 6 evaluates the proposed method on different social-information network datasets. Section 7 concludes the paper.


Related work

Research on link prediction has attracted increasing attention in recent years, and various link prediction methods have been proposed. Several surveys (Hasan, Zaki, 2011, Lü, Zhou, 2011, Martínez, Berzal, Cubero, 2016, Wang, Xu, Wu, Zhou, 2014) also cover the link prediction problem. Existing link prediction methods can be roughly divided into two categories: metric-based methods and learning-based methods.

Topic inclusion degree-based network construction

To exploit the rich content in social-information networks, we first define a user-to-user relation measure from a topic perspective, which we call the topic inclusion degree, and then construct a network that encodes the topic inclusion degree between users. The main notations used in this paper are listed in Table 1.
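The excerpt does not reproduce the TID formula, so the sketch below covers only the construction step: turning a dense matrix of pairwise user-to-user scores into a sparse network $C$. It assumes the scores have already been computed; the keep-the-top-$k$-per-user rule and the helper names `sparsify_top_k` and `density` are hypothetical choices for illustration, and the paper's own sparsification rule may differ.

```python
import numpy as np


def sparsify_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Keep each user's k highest off-diagonal scores and zero everything else.

    `scores` is assumed to be a dense n x n matrix of precomputed user-to-user
    topic inclusion degrees (the TID formula is not reproduced here). Keeping
    the top k per row is an illustrative rule, not the paper's definition.
    """
    n = scores.shape[0]
    s = scores.astype(float)           # astype returns a copy by default
    np.fill_diagonal(s, -np.inf)       # never link a user to itself
    C = np.zeros_like(s)
    k = min(k, n - 1)
    if k <= 0:
        return C
    # argpartition moves the k largest entries of each row into the last k columns
    top = np.argpartition(s, n - k, axis=1)[:, n - k:]
    rows = np.arange(n)[:, None]
    C[rows, top] = scores[rows, top]
    return C


def density(C: np.ndarray) -> float:
    """Fraction of off-diagonal entries that are nonzero."""
    n = C.shape[0]
    return np.count_nonzero(C) / (n * (n - 1))


T = np.array([[0.0, 0.9, 0.1, 0.5],
              [0.2, 0.0, 0.8, 0.3],
              [0.4, 0.6, 0.0, 0.7],
              [0.1, 0.2, 0.3, 0.0]])
C = sparsify_top_k(T, k=2)
print(C)
print(density(C))  # 8 of 12 off-diagonal entries kept
```

Varying `k` changes the density of $C$, which is one simple way to study how the sparseness of a TID-based network affects link prediction.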

The topic inclusion degree is defined based on the dissemination mechanism of the published content in

Fusion probabilistic matrix factorization model

Given the adjacency matrices N and C of the following/followed network and the TID-based network, respectively, the fusion probabilistic matrix factorization (FPMF) model fuses the two kinds of network information in a unified probabilistic matrix factorization framework. Specifically, the FPMF model is based on the following assumptions:

  1. Each network node is represented as an $L$-dimensional latent-feature vector $U_i$ ($i \in \{1, \dots, n\}$), and $U$ is the $n \times L$ latent-feature matrix of the $n$ nodes in the network. We
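Under assumptions of this kind, a probabilistic matrix factorization with Gaussian likelihoods and Gaussian priors leads, as in standard PMF, to a MAP objective that is a weighted sum of squared reconstruction errors plus L2 penalties. The sketch below assumes that form. The following/followed network is approximated by $U W_0 U^{\top}$, the basic part of FPMF; the TID network is assumed, hypothetically, to be approximated by $U W_C U^{\top}$ with a second matrix $W_C$ and the shared latent matrix $U$, weighted by $\lambda_C$:

$$E = \tfrac12\|N - U W_0 U^{\top}\|_F^2 + \tfrac{\lambda_C}{2}\|C - U W_C U^{\top}\|_F^2 + \tfrac{\lambda_U}{2}\|U\|_F^2 + \tfrac{\lambda_W}{2}\left(\|W_0\|_F^2 + \|W_C\|_F^2\right)$$

The regularization weights `lam_u` and `lam_w`, the plain gradient-descent optimizer and all function names are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np

# Sketch under stated assumptions: the C-term form U W_C U^T, the L2 penalties
# and plain gradient descent are hypothetical, not the paper's exact model.


def fpmf_loss_and_grads(U, W0, WC, N, C, lam_c, lam_u, lam_w):
    """Loss E and its gradients with respect to U, W0 and WC."""
    RN = U @ W0 @ U.T - N  # residual on the following/followed network
    RC = U @ WC @ U.T - C  # residual on the TID-based network (assumed form)
    loss = (0.5 * np.sum(RN ** 2)
            + 0.5 * lam_c * np.sum(RC ** 2)
            + 0.5 * lam_u * np.sum(U ** 2)
            + 0.5 * lam_w * (np.sum(W0 ** 2) + np.sum(WC ** 2)))
    # d/dU of 0.5 * ||U W U^T - X||^2 is R U W^T + R^T U W, with R the residual.
    grad_U = (RN @ U @ W0.T + RN.T @ U @ W0
              + lam_c * (RC @ U @ WC.T + RC.T @ U @ WC)
              + lam_u * U)
    # d/dW of the same term is U^T R U.
    grad_W0 = U.T @ RN @ U + lam_w * W0
    grad_WC = lam_c * (U.T @ RC @ U) + lam_w * WC
    return loss, grad_U, grad_W0, grad_WC


def fit_fpmf(N, C, L=4, lam_c=0.5, lam_u=0.01, lam_w=0.01,
             lr=0.01, iters=500, seed=0):
    """Full-batch gradient descent on the shared U and the two linking matrices."""
    rng = np.random.default_rng(seed)
    n = N.shape[0]
    U = 0.1 * rng.standard_normal((n, L))
    W0 = 0.1 * rng.standard_normal((L, L))
    WC = 0.1 * rng.standard_normal((L, L))
    history = []
    for _ in range(iters):
        loss, g_U, g_W0, g_WC = fpmf_loss_and_grads(U, W0, WC, N, C, lam_c, lam_u, lam_w)
        history.append(loss)
        U -= lr * g_U
        W0 -= lr * g_W0
        WC -= lr * g_WC
    return U, W0, WC, history


rng = np.random.default_rng(1)
N = (rng.random((6, 6)) < 0.3).astype(float)
np.fill_diagonal(N, 0.0)
C = rng.random((6, 6)) * (rng.random((6, 6)) < 0.5)
U, W0, WC, history = fit_fpmf(N, C, L=3, iters=300)
print(history[0], history[-1])
```

The sketch treats every entry of $N$ and $C$ as observed; practical implementations often mask or subsample the zero entries instead. Sharing $U$ across both terms is what lets the TID network shape the latent features used to predict following links.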

Link prediction algorithm

We have presented the FPMF model, which provides a strategy for fusing the information of the following/followed network and the topic inclusion degree-based network in a unified probabilistic matrix factorization framework. In the FPMF model, the basic part is the approximation $U W_0 U^{\top}$ of the following/followed network $N$. Supposing we have learned any two users' low-dimensional vector representations $U_i$ and $U_j$ and the linking parameter matrix $W_0$, the linking probability density $p_{ij}$ from
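Once $U$ and $W_0$ are learned, every ordered pair $(i, j)$ gets the score $U_i W_0 U_j^{\top}$, the $(i, j)$ entry of $U W_0 U^{\top}$. The excerpt is truncated before the exact mapping from this score to $p_{ij}$, so the sketch below ranks candidate links by the raw score; any strictly increasing mapping, such as a logistic function, would yield the same ranking. The function names are hypothetical.

```python
import numpy as np

# Ranks by raw score U_i W0 U_j^T; the paper's score-to-probability mapping is
# not in this excerpt, and any monotone mapping preserves this ranking.


def link_scores(U, W0):
    """Score matrix S with S[i, j] = U_i W0 U_j^T, i.e. the entries of U W0 U^T."""
    return U @ W0 @ U.T


def top_k_new_links(U, W0, N, k):
    """Return up to k (i, j, score) triples for ordered pairs not yet linked in N."""
    S = link_scores(U, W0).astype(float)
    S[N > 0] = -np.inf            # existing following links are not predictions
    np.fill_diagonal(S, -np.inf)  # users do not follow themselves
    order = np.argsort(S, axis=None)[::-1]  # flattened indices, best first
    rows, cols = np.unravel_index(order, S.shape)
    candidates = [(int(i), int(j), float(S[i, j]))
                  for i, j in zip(rows, cols) if np.isfinite(S[i, j])]
    return candidates[:k]


U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W0 = np.array([[1.0, 0.5], [0.0, 2.0]])  # not symmetric: score(i, j) != score(j, i)
N = np.zeros((3, 3))
N[0, 2] = 1.0                            # user 0 already follows user 2
print(top_k_new_links(U, W0, N, k=2))    # [(2, 1, 2.5), (1, 2, 2.0)]
```

Because $W_0$ need not be symmetric, the model can assign different scores to "$i$ follows $j$" and "$j$ follows $i$", which matches the directed nature of following links.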

Experiments

In this section, we conduct experiments for the following purposes: (1) to determine whether the proposed fusion model is superior to the baseline methods in link prediction; (2) to determine whether our method is superior to other methods in link prediction; (3) to analyze the impact of the sparseness of the constructed topic inclusion degree-based networks on link prediction performance; and (4) to examine the impact of the weight parameter $\lambda_C$ on link prediction.
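The excerpt does not list the evaluation metrics. The reference list includes Hanley and McNeil's paper on the area under the ROC curve, so AUC is plausibly among them, but that is an assumption here. The sketch below shows a common hold-out protocol: hide a random fraction of the observed links, then compute AUC as the probability that a randomly chosen hidden (positive) link outscores a randomly chosen unlinked (negative) pair, with ties counted as one half. The helper names are hypothetical.

```python
import numpy as np

# AUC as the evaluation metric and this hold-out protocol are assumptions for
# illustration; helper names are hypothetical.


def split_edges(edges, test_frac, seed=0):
    """Randomly hide a fraction of the observed links as the test set."""
    edges = list(edges)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(edges))
    n_test = int(round(test_frac * len(edges)))
    test = [edges[i] for i in perm[:n_test]]
    train = [edges[i] for i in perm[n_test:]]
    return train, test


def link_auc(scores, positive_pairs, negative_pairs):
    """AUC = P(random positive outscores random negative); ties count one half."""
    pos = np.array([scores[i, j] for i, j in positive_pairs], dtype=float)
    neg = np.array([scores[i, j] for i, j in negative_pairs], dtype=float)
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


scores = np.array([[0.0, 0.9, 0.4],
                   [0.3, 0.0, 0.4],
                   [0.1, 0.2, 0.0]])
# positives: hidden links (0, 1) and (1, 2); negatives: unlinked pairs (0, 2) and (2, 0)
print(link_auc(scores, [(0, 1), (1, 2)], [(0, 2), (2, 0)]))  # 0.875
```

In this toy case three of the four positive/negative comparisons favor the positive link and one is a tie, giving $(3 + 0.5)/4 = 0.875$. The scores would come from a model trained only on the `train` links, so hidden links never leak into training.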

Conclusions and future work

Accurately inferring node-to-node relations in social-information networks remains a challenge. This study presents a fusion model in which the information of the original following/followed network and a topic-based network is fused in one unified probabilistic matrix factorization framework. Based on the learned latent-feature representation and the learned linking parameter matrices of the fusion model, the linking probability between any pair of the network nodes

Acknowledgments

This work was supported by the State Key Program of National Natural Science Foundation of China (No.61432011, No.U1435212), the Key Scientific and Technological Project of Shanxi Province (MQ2014-09), and the 1331 Engineering Project of Shanxi Province, China.

References (89)

  • T. Wohlfarth et al.

    Semantic and event-based approach for link prediction

    Practical aspects of knowledge management

    (2008)
  • L.M. Aiello et al.

    Friendship prediction and homophily in social media

    ACM Transactions on the Web

    (2012)
  • E.M. Airoldi et al.

    Mixed membership stochastic blockmodels

    Journal of Machine Learning Research

    (2008)
  • A. Anderson et al.

    Effects of user similarity in social media

    Proceedings of the fifth ACM international conference on web search and data mining

    (2012)
  • L. Backstrom et al.

    Supervised random walks: Predicting and recommending links in social networks

    Proceedings of the fourth ACM international conference on web search and data mining

    (2011)
  • A.L. Barabási et al.

    Emergence of scaling in random networks

    Science

    (1999)
  • A.L. Barabási et al.

    Evolution of the social network of scientific collaborations

    Physica A: Statistical Mechanics and Its Applications

    (2002)
  • D.M. Blei et al.

    Latent Dirichlet allocation

    Journal of Machine Learning Research

    (2003)
  • C.A. Bliss et al.

    An evolutionary algorithm approach to link prediction in dynamic social networks

    Journal of Computational Science

    (2014)
  • S. Brin et al.

    The anatomy of a large-scale hypertextual web search engine

    Computer Networks and ISDN Systems

    (1998)
  • H.H. Chen et al.

    Discovering missing links in networks using vertex similarity measures

    Proceedings of the 27th annual ACM symposium on applied computing

    (2012)
  • K. Chiang et al.

    Exploiting longer cycles for link prediction in signed networks

    Proceedings of the 20th ACM international conference on information and knowledge management

    (2011)
  • A. Clauset et al.

    Hierarchical structure and the prediction of missing links in networks

    Nature

    (2008)
  • H.R. De Sá et al.

    Supervised link prediction in weighted networks

    The 2011 international joint conference on neural networks

    (2011)
  • Y. Dong et al.

    Link prediction and recommendation across heterogeneous social networks

    IEEE international conference on data mining

    (2012)
  • F. Fouss et al.

    Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation

    IEEE Transactions on Knowledge and Data Engineering

    (2007)
  • L. Getoor et al.

    Link mining: A survey

    ACM SIGKDD Explorations Newsletter

    (2005)
  • F. Göbel et al.

    Random walks on graphs

    Stochastic Processes and Their Applications

    (1974)
  • G.H. Golub et al.

    Singular value decomposition and least squares solutions

    Numerische Mathematik

    (1970)
  • J.A. Hanley et al.

    The meaning and use of the area under a receiver operating characteristic (ROC) curve

    Radiology

    (1982)
  • N. Hansen et al.

    Completely derandomized self-adaptation in evolution strategies

    Evolutionary Computation

    (2001)
  • M.A. Hasan et al.

    A survey of link prediction in social networks

    (2011)
  • P.W. Holland et al.

    Stochastic blockmodels: First steps

    Social Networks

    (1983)
  • P. Jaccard

    Étude de la distribution florale dans une portion des Alpes et du Jura

    Bulletin de la Société Vaudoise des Sciences Naturelles

    (1901)
  • G. Jeh et al.

    SimRank: A measure of structural-context similarity

    Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining

    (2002)
  • K. Juszczyszyn et al.

    Link prediction based on subgraph evolution in dynamic social networks

    IEEE international conference on social computing

    (2011)
  • D.D. Lee et al.

    Learning the parts of objects by non-negative matrix factorization

    Nature

    (1999)
  • V. Leroy et al.

    Cold start link prediction

    Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining

    (2010)
  • J. Leskovec et al.

    Predicting positive and negative links in online social networks

    Proceedings of the 19th international conference on world wide web

    (2010)
  • D. Li et al.

    Percolation transition in dynamical traffic network with evolving critical bottlenecks

    Proceedings of the National Academy of Sciences

    (2015)
  • D. Liben-Nowell et al.

    The link prediction problem for social networks

    Journal of the American Society for Information Science and Technology

    (2007)
  • R.N. Lichtenwalter et al.

    New perspectives and methods in link prediction

    Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining

    (2010)
  • F. Lorrain et al.

    Structural equivalence of individuals in social networks

    The Journal of Mathematical Sociology

    (1971)
  • L. Lü et al.

    Similarity index based on local paths for link prediction of complex networks

    Physical Review E

    (2009)