Elsevier

Journal of Systems and Software

Volume 85, Issue 9, September 2012, Pages 2119-2132

Fast and accurate link prediction in social networking systems

https://doi.org/10.1016/j.jss.2012.04.019

Abstract

Online social networks (OSNs) recommend new friends to registered users based on local-based features of the graph, i.e. the number of common friends that two users share. However, OSNs do not exploit paths of all lengths in the network; they consider only paths of maximum length 2 between a user and his candidate friends. Global-based approaches, on the other hand, detect the overall path structure in a network, but are computationally prohibitive for very large social networks. In this paper we address friend recommendation, also known as the link prediction problem, by traversing all paths of a limited length, based on the “algorithmic small world hypothesis”. As a result, we are able to provide more accurate and faster friend recommendations. We also derive variants of our method that apply to different types of networks (directed/undirected and signed/unsigned). We perform an extensive experimental comparison of the proposed method against existing link prediction algorithms, using synthetic data and three real data sets (Epinions, Facebook and Hi5). We also show that a significant accuracy improvement can be gained by using information about both positive and negative edges. Finally, we discuss various experimental considerations at length, such as a possible MapReduce implementation of our FriendLink algorithm to achieve scalability.

Highlights

► We define a new node similarity measure.
► We provide more accurate friend recommendations by traversing paths of different lengths.
► We achieve higher efficiency than the global-based approaches by limiting our traversal to ℓ-length paths.
► We derive variants of our method for directed/undirected and signed/unsigned networks.

Introduction

Online social networks (OSNs) such as Facebook.com, Myspace.com and Hi5.com contain gigabytes of data that can be mined to make predictions about who is a friend of whom. OSNs gather information on users’ social contacts, construct a large interconnected social network, and recommend other people to users based on their common friends. The premise of these recommendations is that individuals might be only a few steps from a desirable social friend, but not realize it.

In this paper, which extends our previously published work (Papadimitriou et al., 2011), we focus on recommendations based on the links that connect the nodes of an OSN, a task known as the link prediction problem. There are two main approaches to this problem (Liben-Nowell and Kleinberg, 2003). The first is based on local features of a network, focusing mainly on node structure; the second is based on global features, detecting the overall path structure in a network. An example of a local-based approach is shown in Fig. 1. Facebook.com or Hi5.com use the following style of recommendation when suggesting new friends to a target user U1: “People you may know: (i) user U7 because you have two common friends (user U5 and user U6) (ii) user U9 because you have one common friend (user U8) …”. The list of recommended friends is ranked by the number of common friends each candidate friend has with the target user.
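The common-friends ranking described above can be sketched in a few lines of Python. This is only an illustration: the function name `recommend_by_common_friends` and the small friendship graph are hypothetical, chosen to reproduce the quoted "People you may know" example, and are not the graph of Fig. 1.

```python
from collections import Counter


def recommend_by_common_friends(friends, target):
    """Rank non-friends of `target` by the number of friends they share with it."""
    counts = Counter()
    for friend in friends[target]:
        for candidate in friends[friend]:
            # Skip the target itself and users who are already friends.
            if candidate != target and candidate not in friends[target]:
                counts[candidate] += 1
    # Highest number of common friends first; ties are broken by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical undirected friendship graph (an assumption, not Fig. 1).
friends = {
    "U1": {"U5", "U6", "U8"},
    "U5": {"U1", "U7"},
    "U6": {"U1", "U7"},
    "U7": {"U5", "U6"},
    "U8": {"U1", "U9"},
    "U9": {"U8"},
}

ranking = recommend_by_common_friends(friends, "U1")
print(ranking)  # [('U7', 2), ('U9', 1)]
```

Only paths of length 2 contribute to this ranking, which is the limitation the following paragraph addresses.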

Compared to approaches based on local-based features of a network, we expand a user's neighborhood horizon by exploiting paths of greater length, whereas local-based approaches consider only paths of maximum length 2 between a target user and his candidate friends. In our approach, we assume that a person can be connected to another by many paths of different lengths (through human chains). For example, in Fig. 1, existing OSNs would recommend U4 or U7 to U1 with equal probability. However, if we also take into account paths of length 3, then U4 should have a higher probability of being recommended as a friend to U1. Compared to global-based approaches, which detect the overall path structure in a network, our method is more efficient: being based on a limited path traversal, it requires less time and space than the global-based algorithms. The reason is that we traverse only paths of length ℓ in a network, based on the “algorithmic small world hypothesis”, whereas global-based approaches detect the overall path structure.
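To illustrate why longer paths can separate candidates that tie on common friends, the sketch below counts simple paths of each length up to a bound from a target user. It is a minimal illustration under stated assumptions: the graph, the node names and the helper `count_paths_by_length` are hypothetical, and how such counts are combined into a similarity score is defined by the paper's own measure, which is not reproduced here.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_paths_by_length(adj, source, max_len):
    """Count simple paths of length 2..max_len from `source` to each node."""
    counts = {}  # node -> {path length: number of simple paths}

    def dfs(node, visited, length):
        if length >= 2:
            per_length = counts.setdefault(node, {})
            per_length[length] = per_length.get(length, 0) + 1
        if length == max_len:
            return  # bounded traversal: never look beyond max_len hops
        for neighbor in adj[node]:
            if neighbor not in visited:
                dfs(neighbor, visited | {neighbor}, length + 1)

    dfs(source, {source}, 0)
    return counts


# Hypothetical graph (an assumption, not Fig. 1): candidates x and y each
# share two friends (a, b) with target t, but x is also reachable via c and d.
edges = [
    ("t", "a"), ("t", "b"), ("t", "c"),
    ("a", "x"), ("b", "x"),
    ("a", "y"), ("b", "y"),
    ("c", "d"), ("d", "x"),
]
adj = build_adjacency(edges)
paths = count_paths_by_length(adj, "t", max_len=3)
print(paths["x"])  # {2: 2, 3: 1}
print(paths["y"])  # {2: 2}
```

With a bound of 2 the two candidates are indistinguishable; raising the bound to 3 reveals the extra path to `x`, while the traversal still never explores the whole graph.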

The contributions of our approach are summarized as follows: (i) We define a new node similarity measure that exploits local and global characteristics of a network. (ii) We provide more accurate friend recommendations by traversing paths of different lengths that connect a person to all other persons in an OSN. (iii) We achieve higher efficiency than the global-based approaches by limiting our traversal to ℓ-length paths in a network. (iv) We derive variants of our method that apply to different types of networks (directed/undirected and signed/unsigned), and show that a significant accuracy improvement can be gained by using information about both positive and negative edges. (v) To run our algorithm on very large networks, we discuss a possible MapReduce (Dean and Ghemawat, 2008) implementation. Note that this paper is an extension of our previously published work in Papadimitriou et al. (2011).

The rest of this paper is organized as follows. Section 2 summarizes the related work, whereas Section 3 briefly reviews the graph preliminaries employed in our approach. Section 4 defines a new node similarity measure in OSNs. A motivating example, the proposed approach, its complexity analysis, and the extension of FriendLink to different types of networks, e.g. signed networks, are described in Section 5. Experimental results are given in Section 6. In Section 7 we discuss the scalability of our method by proposing a possible MapReduce implementation. Finally, Section 8 discusses basic research questions, whereas Section 9 concludes this paper.

Section snippets

Related work

Based on his provocative “small world” experiments, Stanley Milgram claimed that everyone in the world could be connected to everyone else via a path of small average length (Milgram, 1967). This finding is also known as the “six degrees of separation”, although Milgram did not use this term himself. Recently, Goel et al. (2009) reported experiments for the “algorithmic small-world hypothesis”, where half of all chains can be completed in 6–7 steps, supporting the “six degrees of separation”

Preliminaries in graphs

A graph G=(V,E) consists of a set V of vertices and a set E of edges, where each edge joins a pair of vertices. In this paper, G will always be a general undirected and unvalued graph, as shown in Fig. 1. G expresses friendships among users of an OSN and will be used as our running example throughout the paper.

The adjacency matrix A of graph G is a matrix with rows and columns labeled by graph vertices, with a 1 or 0 in position (vi,vj) according to whether vi and vj are friends or not. For an
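As a small illustration of the adjacency-matrix definition, the sketch below builds A for an undirected, unvalued graph and squares it. The four-vertex graph and the helper names are hypothetical. The only fact relied on is the standard one that entry (vi,vj) of A² counts the walks of length 2 between vi and vj, which for distinct vertices equals their number of common friends.

```python
def adjacency_matrix(vertices, edges):
    """Return the 0/1 adjacency matrix of an undirected, unvalued graph."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # friendship is symmetric
    return matrix


def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical graph: v1 and v4 are not friends but share friends v2 and v3.
vertices = ["v1", "v2", "v3", "v4"]
edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")]

A = adjacency_matrix(vertices, edges)
A2 = matmul(A, A)
print(A[0][3])   # 0: v1 and v4 are not directly connected
print(A2[0][3])  # 2: two walks of length 2, via v2 and via v3
```

Higher powers of A count longer walks in the same way, although walks, unlike simple paths, may revisit vertices.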

Defining a node similarity measure

In this section, we define a new similarity measure that expresses the proximity between graph nodes. Let vi and vj be two graph nodes and sim(vi,vj) a function that expresses their similarity. The higher the similarity score between two nodes, the more likely they are to be friends.

Suppose that two persons in an OSN want to have a relationship, but the shortest path between them is blocked by a reluctant broker. If there exists another pathway, the two persons are

The proposed approach

In this section, through a motivating example we first provide the outline of our approach, named FriendLink. Next, we analyze the steps of the proposed algorithm.

Experimental evaluation

In this section, we experimentally compare our FriendLink algorithm with 8 other link prediction algorithms. In particular, we use in the comparison the Markov diffusion kernel (Fouss et al., 2006), the Regularized commute-time kernel (Fouss et al., 2012), the Random Walk with Restart (Pan et al., 2004) algorithm, the Katz (1953) status index, the Adamic and Adar (2005) index, the Preferential Attachment (Newman, 2001), the Friend of a Friend (Chen et al., 2009) and the Shortest Path (Fredman and

Scalability

There are many difficulties in the study of the link prediction problem. One of them is the huge size of real systems. For instance, Facebook has over 500 million users with an average of roughly 100 friends each. To run on networks of this size, our algorithm should be adjusted to support a MapReduce (Dean and Ghemawat, 2008) implementation. MapReduce is a distributed computing model for processing large volumes of data. MapReduce is implemented in three steps: (i) splitting up the computing

Discussion

Real networks have many complex structural properties (Costa et al., 2007), such as degree heterogeneity, the rich-club phenomenon, the mixing pattern, etc. These network properties are not considered by our synthetic network model, since they are out of the scope of this paper. However, our synthetic network model can be easily extended to better resemble real networks. For example, by applying the degree heterogeneity index (Costa et al., 2007) with a probability p, a synthetic network with

Conclusions

Online social networking systems have become popular because they allow users to share content, such as videos and photos, and expand their social circle, by making new friendships. In this paper, we introduced a framework to provide friend recommendations in OSNs. Our framework's advantages are summarized as follows:

  • We define a new node similarity measure that exploits local and global characteristics of a network. Our FriendLink algorithm takes into account all ℓ-length paths that connect a

References (30)

  • L. Adamic et al. How to search a social network. Social Networks (2005)
  • A.L. Barabasi et al. Evolution of the social network of scientific collaborations. Physica A (2002)
  • J. Chen et al. Make new friends, but keep the old: recommending people on social networking sites
  • L. da F. Costa et al. Characterization of complex networks: a survey of measurements. Advances in Physics (2007)
  • J. Dean et al. MapReduce: simplified data processing on large clusters. Communications of the ACM (2008)
  • K.C. Foster et al. A faster Katz status score algorithm. Computational & Mathematical Organization Theory (2001)
  • F. Fouss et al. An experimental investigation of graph kernels on a collaborative recommendation task
  • F. Fouss et al. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering (2007)
  • Fouss, F., Francoisse, K., Yen, L., Pirotte, A., Saerens, M., 2012. An experimental investigation of graph kernels on...
  • M. Fredman et al. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM (1987)
  • S. Goel et al. Social search in ‘small-world’ experiments
  • Hage, P., Harary, F., 1983. Structural models in...
  • G. Jeh et al. SimRank: a measure of structural-context similarity
  • L. Katz. A new status index derived from sociometric analysis. Psychometrika (1953)
  • Lü, L., Jin, C.-H., Zhou, T., 2009. Similarity index based on local paths for link prediction of complex networks....

    Alexis Papadimitriou received a Bachelor (BSc) in Computer Science from Sussex University of the UK in 2004. He also received a Master diploma (MSc) in Distributed Systems from Brighton University in 2005. He has just received his PhD at Aristotle University of Thessaloniki, Greece. His research interests include sensor networks, data mining, and social networks.

    Panagiotis Symeonidis received a Bachelor (BA) in Applied Informatics from Macedonia University of Greece in 1996. He also received a Master diploma (MSc) in Information Systems from the same University in 2004. He received his PhD in Web Mining and Information Retrieval for Personalization from the Department of Informatics in Aristotle University of Thessaloniki, Greece in 2008. Currently, he is working as a post-doc researcher at the Department of Informatics, Aristotle University of Thessaloniki, Greece. He has published more than 25 papers in refereed scientific journals and conference proceedings. His work has received over 120 citations. His research interests include web mining (usage mining, content mining and graph mining), information retrieval and filtering, recommender systems, social media in Web 2.0 and online social networks.

    Yannis Manolopoulos received his B.Eng (1981) in Electrical Eng. and his Ph.D. (1986) in Computer Eng., both from the Aristotle Univ. of Thessaloniki. Currently, he is Professor at the Department of Informatics of the latter university. He has been with the Department of Computer Science of the Univ. of Toronto, the Department of Computer Science of the Univ. of Maryland at College Park and the Department of Computer Science of the Univ. of Cyprus. He has published more than 200 papers in refereed scientific journals and conference proceedings. His work has received over 2000 citations from over 450 institutional groups. His research interests include databases, data mining, web and geographical information systems, bibliometrics/webometrics.

    A preliminary version of this paper, entitled “Predicting Links in Social Networks of Trust via Bounded Local Path Traversal”, was presented at the 3rd Conference on Computational Aspects of Social Networks (CASON’2011).
