Link prediction in growing networks with aging
Introduction
Many real-world systems can be described as complex networks, in which nodes represent individuals or organizations and edges represent interactions among them (Barabási et al., 2000). Complex networks play an increasingly important role in understanding and analyzing a wide range of complex systems (Newman, 2009, Redner, 2005), including communication networks, human social networks, infrastructure networks and metabolic networks. Studies of networks within the physics community mainly fall into two categories: the structure and function of networks, and the evolution of networks themselves. Most real-world networks evolve over time, with new nodes and links appearing while old nodes and links disappear. Such networks are commonly referred to as growing networks (Tadić, 2001, Strogatz, 2001). It is widely believed that growth gives these networks many features that distinguish them from static networks, such as the acyclic structure of citation networks and the power-law degree distribution of the Internet (Bianconi and Barabási, 2001, Albert et al., 1999).
One question of particular importance is link prediction in evolving networks. The link prediction problem aims to predict future links, as well as missing or unobserved links, in complex networks. Link prediction has many important applications. For example, it can be applied in recommender systems that suggest new friends (Aiello et al., 2012), potential collaborators (Mori et al., 2012, Wu et al., 2013) or online products (Akcora et al., 2011) to users. It can also be used to infer unknown information from the partial information observed in networks (Marchette and Priebe, 2008, Kim and Leskovec, 2011). Researchers from many fields have devoted considerable effort to designing accurate link prediction algorithms.
In some social networks, new links are more likely to form between nodes that are topologically close to each other. This has motivated many neighbor-based metrics for link prediction. In pioneering work, Newman verified that the number of common neighbors (CN) (Lorrain and White, 1977) is a good link prediction index for collaboration networks (Newman, 2001). Building on the common neighbor index, a large number of metrics have emerged that remove the size bias of common neighbors in different ways. These include the Jaccard coefficient (Jaccard, 1901), the Hub Promoted index, the Hub Depressed index (Redner, 2008), the Salton index (Salton and McGill, 1983) and the Preferential Attachment index (Barabási et al., 2002). Other studies incorporate path information into the similarity score of a node pair, leading to path-based metrics such as Local Path (Zhou et al., 2009, Lü et al., 2009), the Katz index (Katz, 1953), Relation Strength Similarity (Chen et al., 2012) and FriendLink (Papadimitriou et al., 2012). Apart from these neighbor-based and path-based metrics, random walk models are also used for link prediction. In a random walk, the transition probability is the likelihood of moving from one node to another, so it can, to some extent, represent the probability that a new link forms between them. Hitting Time (Fouss et al., 2007), SimRank (Jeh and Widom, 2004) and Rooted PageRank (Liben-Nowell and Kleinberg, 2007) are link prediction metrics that compute node similarities from random walks.
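To make the neighbor-based and path-based definitions concrete, here is a small pure-Python sketch using the formulations common in the link prediction literature: CN = |Γ(x) ∩ Γ(y)|, Jaccard = CN / |Γ(x) ∪ Γ(y)|, Salton = CN / √(k_x k_y), HPI = CN / min(k_x, k_y), HDI = CN / max(k_x, k_y), PA = k_x k_y, and Local Path = (A²)_xy + ε (A³)_xy, where Γ(x) is the neighbor set of x, k_x its degree and A the adjacency matrix. The helper names (`build_adj`, `neighbor_scores`, `local_path`) and the default ε = 0.01 are illustrative assumptions, not the paper's code.

```python
from math import sqrt

def build_adj(edges):
    """Build undirected adjacency sets (node -> set of neighbours) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def neighbor_scores(adj, x, y):
    """Common-neighbour-based similarity indices for the node pair (x, y)."""
    gx, gy = adj.get(x, set()), adj.get(y, set())
    kx, ky = len(gx), len(gy)
    cn = len(gx & gy)               # number of common neighbours
    union = len(gx | gy)
    return {
        "CN": cn,
        "Jaccard": cn / union if union else 0.0,
        "Salton": cn / sqrt(kx * ky) if kx and ky else 0.0,
        "HPI": cn / min(kx, ky) if kx and ky else 0.0,   # hub promoted
        "HDI": cn / max(kx, ky) if kx or ky else 0.0,    # hub depressed
        "PA": kx * ky,                                   # preferential attachment
    }

def local_path(adj, x, y, eps=0.01):
    """Local Path index: (A^2)_xy + eps * (A^3)_xy."""
    gx, gy = adj.get(x, set()), adj.get(y, set())
    walks2 = len(gx & gy)
    # (A^3)_xy counts walks x-u-v-y: for each neighbour u of x,
    # count the nodes v adjacent to both u and y.
    walks3 = sum(len(adj[u] & gy) for u in gx)
    return walks2 + eps * walks3

adj = build_adj([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
print(neighbor_scores(adj, 0, 3)["Jaccard"])   # 2 common neighbours out of 3 in the union
print(local_path(adj, 0, 4))                   # no common neighbours, but two length-3 paths
```

The pair (0, 4) shows why path-based indices were introduced: its CN score is zero, yet the walks 0-1-3-4 and 0-2-3-4 give it a small positive Local Path score.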
However, none of these three classes of methods (neighbor-based, path-based and random-walk-based) considers the temporal information of nodes, so they may perform poorly in evolving networks. In the real world, scientists tend to cite relatively recent articles rather than old papers, and readers prefer timely news to outdated news. This implies that two nodes tend to connect if they are active in the same period, which motivates us to design more practical metrics for link prediction in growing networks. In this paper, we propose several new metrics based on a time-slicing technique to predict missing links in aging networks. We find that the proposed metrics outperform existing methods on both synthetic and real-world networks.
Model
As mentioned above, scientists are more likely to cite relatively recent articles than old papers, and readers are more likely to read real-time news than outdated news. To model this aging effect in growing networks, researchers have proposed a preferential attachment model with an exponential time-decay factor (Zhu et al., 2003, Medo et al., 2011). Here, we consider the link prediction problem in this network model. The model starts with two connected nodes at
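The model description is cut off in this snippet, so the following is only a minimal Python sketch under stated assumptions: growth starts from two connected nodes, each new node attaches to `m` distinct existing nodes, and node `i` (which arrived at time `t_i = i`) is chosen with probability proportional to `k_i * exp(-lam * (t - t_i))`, one common way to write preferential attachment with exponential aging. The function name `grow_aging_network` and the parameters `m`, `lam` and `seed` are hypothetical, not taken from the paper.

```python
import math
import random

def grow_aging_network(n, m=2, lam=0.1, seed=None):
    """Preferential attachment with exponential aging (hypothetical sketch).

    Node i arrives at time t_i = i. A new node t links to min(m, t) distinct
    existing nodes, picking node i with probability proportional to
    k_i * exp(-lam * (t - t_i)). Returns the edge list.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]       # start from two connected nodes
    degree = [1, 1]
    for t in range(2, n):
        old = list(range(t))
        # Attachment weight: current degree damped by an exponential age factor.
        weights = [degree[i] * math.exp(-lam * (t - i)) for i in old]
        targets = set()
        # Draw until we have the required number of distinct targets.
        while len(targets) < min(m, t):
            targets.add(rng.choices(old, weights=weights)[0])
        degree.append(0)
        for i in targets:
            edges.append((t, i))
            degree[i] += 1
            degree[t] += 1
    return edges

edges = grow_aging_network(200, m=2, lam=0.5, seed=42)
print(len(edges))   # the initial edge plus 2 edges for each of the 198 later nodes
```

Larger values of `lam` make old nodes fade faster, so new links concentrate on recently added nodes; `lam = 0` recovers plain degree-proportional attachment.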
Methods based on time-sliced network
The aforementioned methods do not consider the temporal information of nodes, even though it can matter: for example, people of similar age are more likely to be friends with each other. This implies that two nodes tend to connect if they are active in the same period, which motivates us to design more practical methods. Here, we propose link prediction indicators that take the temporal information of the network into account. In our methods, each network is divided into multiple time-sliced networks
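The details of the time-sliced construction are truncated here. As an illustration only, the sketch below assumes that each edge carries a timestamp, splits the edges into `num_slices` equal-width time windows, and scores a pair by summing common-neighbor counts computed inside each slice, so that neighbors shared only across different periods do not count. The helpers `time_slices` and `sliced_common_neighbors` and the optional per-slice `weights` are hypothetical; the authors' actual slicing and aggregation rules may differ.

```python
from collections import defaultdict

def time_slices(timed_edges, num_slices):
    """Split timestamped edges (u, v, t) into num_slices equal-width windows.

    Returns one adjacency dict (node -> set of neighbours) per slice.
    Hypothetical helper for illustration only.
    """
    times = [t for _, _, t in timed_edges]
    t_min, t_max = min(times), max(times)
    # Guard against a zero-width range when all timestamps are equal.
    width = (t_max - t_min) / num_slices or 1.0
    slices = [defaultdict(set) for _ in range(num_slices)]
    for u, v, t in timed_edges:
        # The last window is closed on the right so t_max is not dropped.
        k = min(int((t - t_min) / width), num_slices - 1)
        slices[k][u].add(v)
        slices[k][v].add(u)
    return slices

def sliced_common_neighbors(slices, x, y, weights=None):
    """Weighted sum of per-slice common-neighbour counts (assumed aggregation)."""
    if weights is None:
        weights = [1.0] * len(slices)
    empty = set()
    # .get avoids inserting missing nodes into the defaultdicts.
    return sum(
        w * len(s.get(x, empty) & s.get(y, empty))
        for w, s in zip(weights, slices)
    )

# Node 2 links to node 0 early and to node 1 late; node 3 links to both early.
edges = [(0, 2, 0), (1, 2, 10), (0, 3, 1), (1, 3, 2)]
slices = time_slices(edges, num_slices=2)
print(sliced_common_neighbors(slices, 0, 1))
```

In this toy example, nodes 0 and 1 share two neighbors in the aggregated graph, but node 2 connects to them in different periods, so only node 3 contributes and the sliced score is 1.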
Data
In this section, we consider twelve representative real-world networks from two fields: eight news networks and four citation networks. The news networks are sub-networks extracted from six major German news sources between June 2014 and March 2015, namely the Asia, Australia, China, Europe, India, Latin-America, Middle-East and US–Canada networks.
1. The news networks (the Asia network, the
Conclusion
In this paper, we first proposed a series of slice-based metrics to estimate the likelihood that missing links exist in evolving networks. We use metric as an example to illustrate the implementation steps and the advantages of the time-sliced methods. In terms of metric, we found that the proposed metrics outperform existing metrics for growing networks with aging sites, especially when aging is fast.
To further improve prediction accuracy, we obtained a simple linear
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 71731002).
References (37)
- Barabási, A.-L., et al., 2002. Evolution of the social network of scientific collaborations. Physica A.
- et al., 1982. Maximum attainable discrimination and the utilization of radiologic examinations. J. Chronic Dis.
- Lorrain, F., White, H.C., 1977. Structural equivalence of individuals in social networks. Social Networks.
- Marchette, D.J., Priebe, C.E., 2008. Predicting unobserved links in incompletely observed networks. Comput. Statist. Data Anal.
- Mori, J., et al., 2012. Machine learning approach for finding business partners and building reciprocal relationships. Expert Syst. Appl.
- Papadimitriou, A., et al., 2012. Fast and accurate link prediction in social networking systems. J. Syst. Softw.
- Tadić, B., 2001. Dynamics of directed graphs: the world-wide web. Physica A.
- Aiello, L.M., et al., 2012. Friendship prediction and homophily in social media. ACM Trans. Web.
- Akcora, C.G., Carminati, B., Ferrari, E., 2011. Network and profile based measures for user similarities on social...
- Albert, R., Jeong, H., Barabási, A.-L., 1999. The diameter of the world wide web. Nature.
- Barabási, A.-L., Albert, R., 1999. Emergence of scaling in random networks. Science.
- Adamic, L.A., Huberman, B.A., 2000. Power-law distribution of the world wide web. Science.
- Bianconi, G., Barabási, A.-L., 2001. Competition and multiscaling in evolving networks. Europhys. Lett.
- Evolution of networks with decay factor of sites. Phys. Rev. E.
- Fouss, F., et al., 2007. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. Data Eng.
- Jaccard, P., 1901. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat.
1 Both Li Zou and Chao Wang are first authors; they contributed equally to this work.