Elsevier

Social Networks

Volume 65, May 2021, Pages 1-7
Link prediction in growing networks with aging

https://doi.org/10.1016/j.socnet.2020.11.001

Highlights

  • We find existing link prediction methods perform poorly in growing networks with aging.

  • We develop time-sliced metrics that take into account time information when predicting links in growing networks.

  • We propose methods for estimating the aging speed of growing networks.

  • The new methods are validated in both modeled and real growing networks.

Abstract

The link prediction problem aims to predict future links, or missing or unobserved links, in complex networks. Traditional link prediction methods mostly focus on static networks. In this paper, we mainly explore link prediction in growing networks. Building on traditional link prediction indices, we propose a series of time-sliced metrics to estimate the likelihood that a missing link exists between two nodes in an evolving network. We found that the proposed metrics outperform existing metrics for growing networks with a time decay factor, especially when the decay factor is small. In addition, to improve prediction efficiency and practicality, we propose functional expressions for the optimal slice number and decay factor for real-world networks. These formulas enable us to estimate the aging speed of real growing networks, resulting in accurate and fast prediction of missing links in growing networks.

Introduction

Many real-world systems can be described as complex networks, where nodes represent individuals or organizations and edges represent interactions among them (Barabási et al., 2000). Complex networks play an increasingly important role in understanding and analyzing a wide range of complex systems (Newman, 2009, Redner, 2005). Examples include communication networks, human social networks, infrastructure networks and metabolic networks. Studies of networks within the physics community mainly fall into two categories: one concerns the structure and function of networks, and the other concerns the evolution of the networks themselves. The structures of most real-world networks evolve over time, with new nodes and links appearing while old nodes and links disappear. Such networks are commonly referred to as growing networks (Tadić, 2001, Strogatz, 2001). It is widely believed that growth gives networks many features that differ from those of static networks, such as the acyclic structure of citation networks and the power-law degree distribution of the Internet (Bianconi and Barabási, 2001, Albert et al., 1999).

One question of great importance concerns link prediction in evolving networks. The link prediction problem aims to predict future links, or missing or unobserved links, in complex networks. Link prediction has many important applications. For example, it can be applied to recommender systems that recommend new friends (Aiello et al., 2012), potential collaborators (Mori et al., 2012, Wu et al., 2013) or online products (Akcora et al., 2011) to users. It can also be used to infer unknown information from the partial information observed in networks (Marchette and Priebe, 2008, Kim and Leskovec, 2011). Scientists from different fields have devoted much effort to designing accurate link prediction algorithms.

In some social networks, new links are more likely to form between nodes that are topologically closer to each other. This has motivated scientists to design many neighbor-based metrics for link prediction. In a pioneering work, Newman verified that the common neighbor (CN) index (Lorrain and White, 1977) is a good index for link prediction in collaboration networks (Newman, 2001). Building on the common neighbor index, a large number of metrics have emerged that remove the size bias of common neighbors in different ways. These metrics include the Jaccard Coefficient index (Jaccard, 1901), the Hub Promoted index, the Hub Depressed index (Redner, 2008), the Salton index (Salton and McGill, 1983), the Preferential Attachment index (Barabási et al., 2002), etc. Other studies proposed incorporating path information into the similarity scores of node pairs, leading to many path-based metrics such as Local Path (Zhou et al., 2009, Lü et al., 2009), the Katz index (Katz, 1953), Relation Strength Similarity (Chen et al., 2012), and FriendLink (Papadimitriou et al., 2012). Apart from these neighbor-based and path-based metrics, random walk models are also used for link prediction. The transition probability in a random walk is the likelihood of moving from one node to another, so to some extent it can represent the probability that a new link is generated. Hitting Time (Fouss et al., 2007), SimRank (Jeh and Widom, 2004) and Rooted PageRank (Liben-Nowell and Kleinberg, 2007) are link prediction metrics that calculate similarities between nodes based on random walks.
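To make the neighbor-based indices named above concrete, the following minimal Python sketch computes them for a single node pair. It uses the standard textbook definitions, which this text lists but does not spell out; the function name `neighbor_scores`, the adjacency-dictionary representation and the toy graph are illustrative assumptions, not part of the paper.

```python
from math import sqrt

def neighbor_scores(adj, x, y):
    """Return neighbor-based similarity scores for the node pair (x, y).

    `adj` maps each node to the set of its neighbors (undirected graph).
    Standard definitions are assumed; the source only names the indices.
    """
    nbrs_x, nbrs_y = adj[x], adj[y]
    kx, ky = len(nbrs_x), len(nbrs_y)      # node degrees
    cn = len(nbrs_x & nbrs_y)              # number of common neighbors
    union = len(nbrs_x | nbrs_y)
    both = kx > 0 and ky > 0               # guard against division by zero
    return {
        "CN": cn,
        "Jaccard": cn / union if union else 0.0,
        "Salton": cn / sqrt(kx * ky) if both else 0.0,
        "HPI": cn / min(kx, ky) if both else 0.0,   # Hub Promoted
        "HDI": cn / max(kx, ky) if both else 0.0,   # Hub Depressed
        "PA": kx * ky,                               # Preferential Attachment
    }

# Hypothetical toy graph, used only to exercise the function.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = neighbor_scores(adj, "a", "d")
print(scores)
```

In this toy graph, nodes `a` and `d` share the two neighbors `b` and `c`, so CN is 2, while PA depends only on the two degrees and is 6. None of these scores uses any information about when the links were created.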

However, these three kinds of methods do not consider the temporal information of nodes, so they may perform poorly in evolving networks. In the real world, scientists are more likely to cite relatively recent articles than old papers, and readers are more likely to read timely news than outdated news. This implies that two nodes tend to connect with each other if they are active in the same period, which motivates us to create more practical and useful link prediction metrics for growing networks. In this paper, we propose several new metrics based on a time-slicing technique to predict missing links in aging networks. We find that our proposed metrics outperform the existing methods in both synthetic networks and real-world networks.

Section snippets

Model

As mentioned above, in the real world scientists are more likely to cite relatively recent articles than old papers, and readers are more likely to read real-time news than outdated news. To model this aging effect in growing networks, researchers have proposed a preferential attachment model with an exponential time decay factor (Zhu et al., 2003, Medo et al., 2011). Here, we consider the link prediction problem in this network model. The model starts with two connected nodes at
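Because the model description is cut off here, the following Python sketch shows only one common way to realize preferential attachment with exponential aging. The attachment rule (probability proportional to degree times an exponential of the node's age), the single link per new node, and the names `grow_aging_network` and `decay_rate` are all assumptions made for illustration and may differ from the paper's exact definition.

```python
import math
import random

def grow_aging_network(n_nodes, decay_rate, seed=0):
    """Grow a network by preferential attachment with exponential aging.

    Assumed rule (not taken from the source): at step t, one new node
    attaches to one existing node i chosen with probability proportional
    to k_i * exp(-decay_rate * (t - t_i)), where k_i is the degree and
    t_i the arrival time of node i.
    """
    rng = random.Random(seed)
    birth = {0: 0, 1: 0}            # node -> arrival time
    degree = {0: 1, 1: 1}           # start from two connected nodes
    edges = [(1, 0, 0)]             # (newer node, older node, timestamp)
    for t in range(1, n_nodes - 1):
        new = t + 1
        nodes = list(degree)
        weights = [degree[i] * math.exp(-decay_rate * (t - birth[i]))
                   for i in nodes]
        # Very large decay rates can underflow every weight to zero,
        # in which case rng.choices raises ValueError.
        target = rng.choices(nodes, weights=weights, k=1)[0]
        edges.append((new, target, t))
        birth[new] = t
        degree[new] = 1
        degree[target] += 1
    return edges, birth

edges, birth = grow_aging_network(n_nodes=50, decay_rate=0.1)
print(len(edges), "links among", len(birth), "nodes")
```

Under this assumed rule, a larger `decay_rate` makes old nodes lose attractiveness faster, so new links concentrate on recently arrived nodes; setting `decay_rate` to 0 recovers plain preferential attachment.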

Methods based on time-sliced network

The aforementioned methods do not consider the temporal information of nodes. Yet people of similar age, for example, are more likely to be friends with each other. This implies that two nodes tend to connect with each other if they are active in the same period, which motivates us to create more practical and useful methods. Here, we propose link prediction indicators that consider the temporal information of the networks. In our methods, each network is divided into multiple time-sliced networks
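The snippet ends before the time-sliced indicators are defined, so the following Python sketch illustrates only the general idea of slicing, using the PA index as the example. The equal-width slicing rule, the aggregation by summing per-slice degree products, and the names `sliced_degrees` and `sliced_pa` are hypothetical choices and should not be read as the paper's definitions.

```python
from collections import defaultdict

def sliced_degrees(edges, n_slices):
    """Count each node's degree separately within each time slice.

    `edges` is a list of (u, v, timestamp). The time range is cut into
    `n_slices` equal-width intervals (an assumed slicing rule).
    """
    times = [t for _, _, t in edges]
    t_min, t_max = min(times), max(times)
    width = (t_max - t_min) / n_slices or 1.0   # avoid zero width
    deg = [defaultdict(int) for _ in range(n_slices)]
    for u, v, t in edges:
        # The latest timestamp falls on the upper boundary; clamp it
        # into the last slice.
        s = min(int((t - t_min) / width), n_slices - 1)
        deg[s][u] += 1
        deg[s][v] += 1
    return deg

def sliced_pa(deg, x, y):
    """Sum the per-slice products k_x * k_y (assumed aggregation)."""
    return sum(d.get(x, 0) * d.get(y, 0) for d in deg)

# Hypothetical toy data: two triangles formed in different periods.
toy_edges = [("a", "b", 0), ("a", "c", 1), ("b", "c", 2),
             ("d", "e", 8), ("d", "f", 9), ("e", "f", 10)]
deg = sliced_degrees(toy_edges, n_slices=2)
print(sliced_pa(deg, "a", "b"), sliced_pa(deg, "a", "d"))
```

In this toy example every node has total degree 2, so the static PA index gives all pairs the same score of 4. The sliced variant keeps the score of 4 for `a` and `b`, which were active in the same period, and gives 0 to `a` and `d`, which were not.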

Data

In this section, we will consider twelve representative real-world networks drawn from disparate fields: eight news networks and four citation networks. The news networks are sub-networks extracted from six major German news sources between June 2014 and March 2015, including: the Asia network, the Australia network, the China network, the Europe network, the India network, the Latin-America network, the Middle-East network and the US–Canada network.

  • 1.

    The news networks (the Asia network, the

Conclusion

In this paper, we first proposed a series of slice-based metrics to estimate the likelihood that missing links exist in evolving networks. We used the PA metric as an example to illustrate the implementation steps and the advantages of the time-sliced methods. For the PA metric, we found that the proposed metrics outperform existing metrics for growing networks with aging sites, especially when the aging is fast.

To further improve prediction accuracy, we obtained a simple linear

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 71731002).

References (37)

  • Barabási, A.L., et al., 1999. Emergence of scaling in random networks. Science.
  • Barabási, A.L., et al., 2000. Power-law distribution of the world wide web. Science.
  • Bianconi, G., et al., 2001. Competition and multiscaling in evolving networks. Europhys. Lett.
  • Chen, H.H., Gou, L., Zhang, X., Giles, C.L., 2012. Discovering missing links in networks using vertex similarity...
  • Dorogovtsev, S.N., et al., 2000. Evolution of networks with aging of sites. Phys. Rev. E.
  • Fouss, F., et al., 2007. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng.
  • ...
  • Jaccard, P., 1901. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat.

1 Both Li Zou and Chao Wang are first authors. They have contributed equally to this work.
