Abstract
In recent years, various online social networks offering specific services have gained great popularity and success. To enjoy more online social services, some users are involved in multiple social networks simultaneously. A challenging problem in social network studies is to identify the common users across networks in order to better understand user behavior. This is referred to as the anchor link prediction problem. Meanwhile, across these partially aligned social networks, users can be connected by different kinds of links, e.g., social links among users within a single network and anchor links between the accounts of shared users in different networks. Many link prediction methods have been proposed so far to predict each type of link separately. In this paper, we want to predict the formation of social links among users in the target network as well as anchor links aligning the target network with other external social networks. The problem is formally defined as the “collective link identification” problem. Predicting the formation of links in social networks with traditional link prediction methods, e.g., classification-based methods, can be very challenging. The reason is that, from the network, we can only obtain the formed links (i.e., positive links) but no information about the links that will never be formed (i.e., negative links). To solve the collective link identification problem, we propose a unified link prediction framework, collective link fusion (CLF), which consists of two phases: (1) collective link prediction of anchor and social links with positive and unlabeled learning techniques, and (2) propagation of predicted links across the partially aligned “probabilistic networks” with collective random walk. Extensive experiments conducted on two real-world partially aligned networks demonstrate that CLF performs well in predicting social and anchor links concurrently.










Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities under grant JUSRP11852. This work was partially supported by Florida State University Council on Research and Creativity (CRC) via the Project ID 041776. This work is also supported in part by NSF through Grants IIS-1526499, IIS-1763325, CNS-1626432 and NSFC 61672313. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.
Additional information
A preliminary version of this work appeared in: Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI ’15), 2015.
Appendix
The social features of anchor links were introduced earlier in the paper. In this appendix, we introduce the social features of social links, together with the spatial distribution features, temporal distribution features and text usage features of both anchor links and social links.
6.1 Social features
See Table 3.
\(\Gamma (u)\) is the set of neighbors of user u.
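Table 3 is not reproduced in this text, so the following is only a minimal Python sketch of neighborhood-based measures that are commonly computed from \(\Gamma (u)\) and \(\Gamma (v)\). Only “Adamic/Adar” is named in this appendix; the common-neighbor and Jaccard measures, the function names and the toy graph are assumptions made for illustration.

```python
import math

def common_neighbors(gamma_u, gamma_v):
    """Number of neighbors shared by u and v."""
    return len(gamma_u & gamma_v)

def jaccard(gamma_u, gamma_v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = gamma_u | gamma_v
    return len(gamma_u & gamma_v) / len(union) if union else 0.0

def adamic_adar(gamma_u, gamma_v, neighbors):
    """Sum of 1 / log|Gamma(w)| over shared neighbors w.

    Shared neighbors with fewer than two neighbors are skipped,
    because log(1) = 0 would cause a division by zero.
    """
    score = 0.0
    for w in gamma_u & gamma_v:
        degree = len(neighbors[w])
        if degree > 1:
            score += 1.0 / math.log(degree)
    return score

# Hypothetical undirected toy network: user -> set of neighbors.
neighbors = {
    "u": {"a", "b", "c"},
    "v": {"b", "c", "d"},
    "a": {"u"},
    "b": {"u", "v"},
    "c": {"u", "v", "d"},
    "d": {"v", "c"},
}

cn = common_neighbors(neighbors["u"], neighbors["v"])
jc = jaccard(neighbors["u"], neighbors["v"])
aa = adamic_adar(neighbors["u"], neighbors["v"], neighbors)
```

In this toy network, u and v share the neighbors b and c, so the Adamic/Adar score gives more weight to b (two neighbors) than to c (three neighbors).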
In addition to social information, we also extract features from users’ location check-ins. For a given anchor/social link (u, v), let \(\Phi (u)\) and \(\Phi (v)\) denote the sets of locations that u and v have visited, respectively. Since a user can visit a location many times, we construct vectors l(u) and l(v) for u and v, respectively, in which each entry records the number of times that the user has visited a certain location in \(\Phi (u) \cup \Phi (v)\).
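The construction of l(u) and l(v) can be sketched as follows. This is a minimal illustration, assuming that each user's check-ins are available as a plain list of location identifiers; the function name `visit_vectors` and the sample data are hypothetical.

```python
from collections import Counter

def visit_vectors(checkins_u, checkins_v):
    """Build visit-count vectors l(u) and l(v) over Phi(u) ∪ Phi(v).

    Each argument is a list of location ids, one per check-in,
    so a location visited several times appears several times.
    """
    counts_u = Counter(checkins_u)
    counts_v = Counter(checkins_v)
    # Fix one ordering of the union so both vectors share their indices.
    locations = sorted(set(counts_u) | set(counts_v))
    l_u = [counts_u[loc] for loc in locations]
    l_v = [counts_v[loc] for loc in locations]
    return locations, l_u, l_v

locations, l_u, l_v = visit_vectors(
    ["cafe", "gym", "cafe"],   # hypothetical check-ins of u
    ["gym", "park"],           # hypothetical check-ins of v
)
```

Indexing both vectors by the same sorted union keeps them aligned, so vector measures can be applied to them entry by entry.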
6.2 Spatial distribution features
See Table 4.
Similarly, we can get the set of locations that u has visited from the networks, \(\Phi (u)\). For a given anchor/social link (u, v), we extract its spatial distribution features by applying the measures summarized in Table 3, except the “Adamic/Adar” measure, to \(\Phi (u)\) and \(\Phi (v)\).
6.3 Temporal distribution features
See Table 5.
Users’ temporal activity information is also used to extract features for link (u, v). Each day is divided into 24 one-hour slots, and the number of online posts published in each slot is stored in the vectors \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), from which we can extract \(IP({\mathbf {x}}(u), {\mathbf {x}}(v))\), \(ED({\mathbf {x}}(u), {\mathbf {x}}(v))\) and \(CS({\mathbf {x}}(u), {\mathbf {x}}(v))\), summarized in Table 5, as the temporal distribution features of link (u, v).
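Table 5 is not reproduced in this text. The sketch below assumes that IP, ED and CS stand for inner product, Euclidean distance and cosine similarity, which is a reading of the abbreviations rather than something stated here. The function names and the sample timestamps are hypothetical.

```python
import math
from datetime import datetime

def hourly_histogram(timestamps):
    """Count posts per hour of day, giving a 24-dimensional vector x."""
    x = [0] * 24
    for ts in timestamps:
        x[ts.hour] += 1
    return x

def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    # Defined as 0.0 when either vector is all zeros (no posts at all).
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    return inner_product(x, y) / norm if norm else 0.0

# Hypothetical post times for the two users of a link (u, v).
posts_u = [datetime(2015, 7, 1, 9, 5),
           datetime(2015, 7, 1, 9, 40),
           datetime(2015, 7, 2, 21, 15)]
posts_v = [datetime(2015, 7, 3, 9, 30),
           datetime(2015, 7, 4, 22, 0)]

x_u = hourly_histogram(posts_u)
x_v = hourly_histogram(posts_v)
ip = inner_product(x_u, x_v)
ed = euclidean_distance(x_u, x_v)
cs = cosine_similarity(x_u, x_v)
```

The histogram ignores the date and keeps only the hour of day, so two users who are active at the same hours on different days still obtain similar vectors.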
6.4 Text usage features
For a certain link (u, v), we can get the words that u and v have used in the past and group them as two bag-of-words vectors, \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), weighted by TF-IDF. From \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), we also extract \(IP({\mathbf {x}}(u), {\mathbf {x}}(v))\), \(ED({\mathbf {x}}(u), {\mathbf {x}}(v))\) and \(CS({\mathbf {x}}(u), {\mathbf {x}}(v))\) summarized in Table 5 as the text usage features of link (u, v).
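The TF-IDF weighting can be sketched as below. The text does not specify which TF-IDF variant is used, so this example assumes raw term counts and a smoothed inverse document frequency, treating each user's words as one document; the function names and the three-user toy corpus are hypothetical. Only the cosine similarity, assumed here to be what CS denotes, is computed.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps each user to the list of words that user has used.

    Returns the sorted vocabulary and one TF-IDF vector per user.
    Assumed weighting: tf = raw count, idf = log((1 + N) / (1 + df)) + 1,
    where N is the number of users and df counts users using the word.
    """
    n_docs = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    vocab = sorted(df)
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = {}
    for user, words in docs.items():
        tf = Counter(words)
        vectors[user] = [tf[w] * idf[w] for w in vocab]
    return vocab, vectors

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical word usage of three users.
docs = {
    "u": ["coffee", "run", "coffee"],
    "v": ["coffee", "movie"],
    "w": ["tennis", "movie"],
}
vocab, vectors = tfidf_vectors(docs)
cs_uv = cosine_similarity(vectors["u"], vectors["v"])
cs_uw = cosine_similarity(vectors["u"], vectors["w"])
```

Users u and v share the word "coffee" and therefore obtain a positive similarity, whereas u and w share no words and obtain a similarity of zero.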
Cite this article
Zhan, Q., Zhang, J. & Yu, P.S. Integrated anchor and social link predictions across multiple social networks. Knowl Inf Syst 60, 303–326 (2019). https://doi.org/10.1007/s10115-018-1210-1
DOI: https://doi.org/10.1007/s10115-018-1210-1