NetSRE: Link predictability measuring and regulating
Introduction
Networks have proven to be an effective abstraction for representing real-world complex systems [1]. With network models, diverse complex systems, ranging from the Internet and the World Wide Web to biological and social networks, are all viewed as collections of discrete units that interact through a set of connections. Driven by the increasing availability of network data [2], [3], network science has seen a surge of interest over the last twenty years, and the research focus has shifted from empirical studies based on statistical analysis [4], [5], [6] to practical structure mining. More recently, many structure mining methods have been developed, including community detection [7], [8], [9], influential node ranking [10], [11], [12], graph classification [13], [14], and graph summarization [15].
In network science, link prediction [16], [17], [18] is a fundamental problem that attempts to uncover missing links or detect spurious links using features intrinsic to the network topology itself. Over the last decade, link prediction has received increasing attention, and a growing number of methods have been proposed. These methods can be roughly divided into three classes [19], [20]: similarity-based methods, maximum likelihood methods, and matrix decomposition-based methods. Link prediction benefits a wide range of real-world applications. For instance, in biological networks, where our knowledge of interactions is highly limited, using predicted results to guide the design of experiments, rather than blindly testing all possible interactions, can sharply reduce experimental costs [21]. In online social networks, potential commercial gains have led to the creation and proliferation of fake accounts, and link prediction can help find these accounts by detecting abnormal social relations [22]. On e-commerce websites, link prediction can be used to recommend products to target users [23]. In the security domain, given network data related to terrorist activities, link prediction can be used to reveal hidden relationships and discover potential terrorists [24].
As the saying goes, “every coin has two sides.” Recently, link prediction has raised privacy concerns when a predicted link connects users who would like to keep their relationship private. In the real world, many types of information, such as sexual contacts, purchase records, and financial relationships, are considered highly sensitive and are anonymized to preserve privacy. However, many privacy inference attacks based on link prediction have been proposed. For example, Zheleva and Getoor [25] conducted a preliminary study on inferring sensitive relationships from anonymized graphs. Ying and Wu [26] investigated how well a graph randomization approach can protect sensitive links and showed that attackers can exploit similarity measures to significantly improve the confidence and accuracy of their predictions of sensitive links. Yang et al. [27] identified a fundamental weakness of link-based graph anonymization mechanisms and exploited it to recover most of the original graph structure. Michael et al. [28] presented a “link reconstruction attack” method, which can infer connections that users want to hide to preserve their privacy. Moreover, link prediction-based de-anonymization methods have been developed to match accounts across networks for user identification [29], [30], [31].
Motivation. As discussed above, link prediction can be applied to predict potential relationships between individuals. Revealing the structure of diverse networks accurately requires more robust and sophisticated link prediction methods. From another perspective, link prediction may increase the risk of information leakage: even if data publishers remove sensitive information before releasing network datasets, that information may still be inferred by link prediction, thereby encroaching on user privacy. Naturally, considering the interests of all parties, we raise the problem of link predictability measuring and regulating (LPMR), which characterizes the inherent difficulty of link prediction and explores the potential influence of network links on the accuracy of link prediction methods.
Drawing on the considerable literature on link prediction, researchers have begun to recognize the significance of networks' structural features. Beyond the choice of a specific algorithm, the accuracy of link prediction depends on the network structure itself. In particular, no algorithm can achieve satisfactory performance on random networks, whereas high prediction accuracy is readily achieved on regular networks. In fact, real-world networks usually contain both regular and irregular components, and only the former can be modeled and explained. Consequently, the accuracy of link prediction depends on the regularity level of a network, i.e., the proportion of its regular components. Therefore, the intrinsic regularity of a network is the fundamental factor influencing the accuracy of link prediction.
Link predictability denotes the inherent difficulty of link prediction in a network, independent of specific algorithms, and can be calculated by estimating the network's regularity level. By measuring the predictability of a network, we can determine whether poor link prediction performance is caused by an inappropriate algorithm or by the irregularity of the network itself, and then estimate how much room remains for performance improvement. Furthermore, by regulating the link predictability of networks, the risks arising from link prediction, such as privacy disclosure, can be reduced directly. However, despite its practical importance, the LPMR problem has not yet been fully investigated.
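The two claims above, that predictability tracks regularity and that perturbing links can lower it, can be illustrated with a small numerical sketch. It assumes a ring lattice as the "regular" network, a simple random rewiring (each edge moved with probability p to a random unconnected pair) as the perturbation, the Common Neighbors (CN) index as the predictor, and AUC over a 10% probe set of hidden edges as the accuracy measure (the probability that a hidden link outscores a non-existent one, ties counting one half). The helper names `ring_lattice`, `rewire`, and `cn_auc` are hypothetical; this is not the NetSRE method, and a single index cannot show that no algorithm succeeds on random networks.

```python
import numpy as np


def ring_lattice(n: int, k: int) -> np.ndarray:
    """Regular network: each node links to its k nearest neighbours on each side of a ring."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            A[i, j] = A[j, i] = 1
    return A


def rewire(A: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical random perturbation: with probability p, move each original edge
    to a uniformly chosen unconnected node pair (the edge count is preserved)."""
    A = A.copy()
    n = A.shape[0]
    for i, j in zip(*np.nonzero(np.triu(A, 1))):
        if rng.random() < p:
            while True:
                u, v = rng.integers(0, n, size=2)
                if u != v and A[u, v] == 0:
                    break
            A[i, j] = A[j, i] = 0
            A[u, v] = A[v, u] = 1
    return A


def cn_auc(A: np.ndarray, probe_frac: float, rng: np.random.Generator) -> float:
    """Exact AUC of the Common Neighbours index on a random probe set of hidden edges."""
    n = A.shape[0]
    iu, ju = np.triu_indices(n, 1)
    is_edge = A[iu, ju] == 1
    edge_ids = np.flatnonzero(is_edge)
    probe = rng.choice(edge_ids, size=int(probe_frac * edge_ids.size), replace=False)
    train = A.copy()
    train[iu[probe], ju[probe]] = 0
    train[ju[probe], iu[probe]] = 0
    scores = (train @ train)[iu, ju]             # CN(x, y): number of shared neighbours
    missing = scores[probe]                       # hidden true links
    nonexistent = np.sort(scores[~is_edge])       # pairs that are not links at all
    below = np.searchsorted(nonexistent, missing, side="left")    # strictly lower scores
    tied = np.searchsorted(nonexistent, missing, side="right") - below
    return float(np.mean((below + 0.5 * tied) / nonexistent.size))


rng = np.random.default_rng(0)
lattice = ring_lattice(200, 4)
for p in (0.0, 0.3, 1.0):
    perturbed = rewire(lattice, p, rng)
    print(f"rewiring p={p:.1f}: CN AUC = {cn_auc(perturbed, 0.1, rng):.3f}")
```

With p = 0 the lattice is highly predictable for CN; as p grows the network approaches a random graph and the AUC should fall toward 0.5. NetSRE instead selects which links to perturb, so random rewiring is only the crudest baseline for regulation.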
Contributions. This paper proposes a network structural regularity exploring architecture, called NetSRE. By exploring the organization principles of networks, NetSRE measures their link predictability, which indicates the upper bound of link prediction accuracy and provides guidance for algorithm optimization. NetSRE assumes that links play different roles in network organization and that some of them have a disproportionate influence on network regularity; consequently, link predictability can be regulated by perturbing a limited number of links. By analyzing the organizational relationships in the network self-representation, potential links can be predicted from the learned structural patterns of the network. Along this line, the distribution of representative subgraphs in the network self-representation indicates the link predictability of the network, and links with different degrees of substitutability in the self-representation have different influences on link predictability regulation. The main contributions of this paper are summarized as follows:
- First, we model network structure from the perspective of self-representation and formulate it as an optimization problem. Using the self-representation model, the network structure can be decomposed into a set of representative subgraphs and the combination relationships between them. By applying the model to link prediction, yielding Low Frobenius norm-based Link Prediction (LFLP), we demonstrate the expressive power of the self-representation model.
- Second, based on the assumption that real-world networks have a certain degree of regularity and that their data matrices are approximately low rank, we define a low-rank pursuit-based self-representation model to uncover common representative subgraphs. Based on the learned representation matrix, we define a Structural Regularity Index (SRI) to measure the link predictability of networks.
- Third, according to how links are used in the network self-representation model, we define a novel importance metric for network links that indicates their regularity level. Based on this link selection mechanism, we propose the structure perturbation-based Link Predictability Regulation (LPR) algorithm to control a network's potential for link prediction.
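A Frobenius-norm self-representation of the kind LFLP is named after can be sketched as follows. We assume the objective min_Z ||A − AZ||²_F + α||Z||²_F on the adjacency matrix A; setting its gradient to zero gives (AᵀA + αI)Z = AᵀA, so Z* = (AᵀA + αI)⁻¹AᵀA, and node pairs are scored by the reconstruction AZ*. The exact objective, the parameter `alpha`, and the function names are assumptions for illustration; the paper's LFLP formulation may differ, and the low-rank model, SRI, and LPR are not reproduced here.

```python
import numpy as np


def self_representation(A: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Representation matrix Z for the assumed objective
        min_Z ||A - A Z||_F^2 + alpha * ||Z||_F^2.
    The gradient -2 A^T (A - A Z) + 2 alpha Z vanishes when
        (A^T A + alpha I) Z = A^T A,
    which is solved directly instead of forming an explicit inverse."""
    A = A.astype(float)
    gram = A.T @ A
    return np.linalg.solve(gram + alpha * np.eye(A.shape[0]), gram)


def lflp_scores(A: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Reconstructed adjacency A Z: each column of A re-expressed as a combination of columns of A."""
    return A.astype(float) @ self_representation(A, alpha)


def top_candidate_links(A: np.ndarray, alpha: float = 1.0, k: int = 3) -> list:
    """The k unlinked node pairs (i < j) with the highest reconstruction scores."""
    S = lflp_scores(A, alpha)
    iu, ju = np.triu_indices(A.shape[0], 1)
    unlinked = A[iu, ju] == 0
    order = np.argsort(-S[iu, ju][unlinked])[:k]
    return list(zip(iu[unlinked][order].tolist(), ju[unlinked][order].tolist()))


# Toy graph: two 4-cliques {0..3} and {4..7} joined by edge (3, 4), with edge (0, 1) removed.
A = np.zeros((8, 8), dtype=int)
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1
A[0, 1] = A[1, 0] = 0
print(top_candidate_links(A))
```

Solving the linear system rather than inverting AᵀA + αI is the usual numerically safer choice; α > 0 keeps the system nonsingular even when A is rank deficient.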
The remainder of this paper is organized as follows. Section 2 surveys the background and related work. Section 3 introduces the problem definition and evaluation mechanism. Section 4 presents our proposed method. Section 5 reports the experiments, and Section 6 concludes the paper.
Section snippets
Link prediction and network modeling
The most generic framework for link prediction is the class of similarity-based methods [32], including local indices such as Common Neighbors (CN), Adamic–Adar (AA) [33], and Resource Allocation (RA) [34]; global indices such as Katz [35] and SimRank [36]; and quasi-local indices such as the Local Path index (LP) [34] and Local Random Walk (LRW) [37]. Recently, novel similarity-based methods, such as a dynamical response-based method [38] and a neighborhood-based method [39], have also been developed. The methods…
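Several of the indices listed above reduce to a few matrix products on the adjacency matrix A. The sketch below uses the standard textbook definitions: CN = A², AA weights each common neighbor z by 1/log k_z, RA by 1/k_z, LP = A² + εA³, and Katz = (I − βA)⁻¹ − I, which requires β below the reciprocal of the largest eigenvalue of A. The function name and the values of `eps` and `beta` are illustrative assumptions; SimRank and LRW are omitted.

```python
import numpy as np


def similarity_indices(A: np.ndarray, eps: float = 0.01, beta: float = 0.05) -> dict:
    """Score all node pairs of an undirected, unweighted graph with classic indices.

    eps (LP) and beta (Katz) are illustrative values; Katz requires beta < 1 / lambda_max(A).
    Diagonal entries are not meaningful as link scores.
    """
    A = A.astype(float)
    n = A.shape[0]
    k = A.sum(axis=1)                        # node degrees
    # A common neighbour has degree >= 2, so zero weights for low-degree nodes never matter.
    aa_w = np.zeros(n)
    aa_w[k > 1] = 1.0 / np.log(k[k > 1])     # Adamic-Adar weight 1 / log k_z
    ra_w = np.zeros(n)
    ra_w[k > 0] = 1.0 / k[k > 0]             # Resource Allocation weight 1 / k_z
    A2 = A @ A
    lam_max = np.max(np.abs(np.linalg.eigvalsh(A)))
    if beta * lam_max >= 1:
        raise ValueError("Katz series diverges: choose beta < 1 / lambda_max(A)")
    return {
        "CN": A2,                                    # number of shared neighbours
        "AA": (A * aa_w) @ A,                        # sum_z A_xz * w_z * A_zy
        "RA": (A * ra_w) @ A,
        "LP": A2 + eps * (A2 @ A),                   # A^2 + eps * A^3
        "Katz": np.linalg.inv(np.eye(n) - beta * A) - np.eye(n),  # sum_l beta^l A^l
    }


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
scores = similarity_indices(A)
for name, S in scores.items():
    print(name, round(float(S[0, 3]), 4))
```

For the pair (0, 3), node 2 (degree 3) is the only common neighbor, so CN(0, 3) = 1, AA(0, 3) = 1/log 3, and RA(0, 3) = 1/3.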
Link prediction
To illustrate the LPMR problem clearly, Fig. 1 presents a working flowchart of network data analysis with structure perturbation. The flowchart consists of the following steps: first, datasets about real-world complex systems are collected; then, networks are constructed from these datasets to characterize the interactions among the objects in the complex systems; next, because networks always contain sensitive links or noisy links that can be identified by…
Network representation modeling
Empirical studies on complex networks have indicated that most real-world networks possess some common topological characteristics, such as small-world, scale-free, and core–periphery features, which can be modeled effectively based on the presupposed organization principles [1], [42]. Moreover, from the perspective of network summarization, Koutra et al. [15] found that network structures are composed of an enriched set of representative subgraphs, including cliques, stars, chains, and…
Experiments
We conduct an experimental study of the proposed algorithms on real-world networks. Three sets of experiments are conducted to evaluate the proposed methods: the link prediction algorithm, the link predictability measure, and the link predictability regulation algorithm.
Conclusions and discussion
This paper introduces the LPMR problem. Theoretically, exploring and controlling the link predictability of networks is significant for network analysis and graph mining. Link predictability can indicate the expected link prediction accuracy for a network. Moreover, exploring link predictability can help uncover the organization principles of networks and understand the structural roles of network links. From a practical viewpoint, via irregular link identification, the abnormal…
CRediT authorship contribution statement
Xingping Xian and Tao Wu contributed to the conceptualization, data curation, and methodology, and wrote the original draft. The remaining authors contributed to validating the ideas, carrying out additional analyses, and reviewing the paper. All authors read and approved the manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61802039, 61772098, 61772091, and 61802035; the National Key R&D Program of China under Grant Nos. 2018YFB0904900 and 2018YFB0904905; the Science and Technology Research Program of Chongqing Municipal Education Commission, China under Grant No. KJQN201800630; and the Innovative Talents Program, China under Grant No. BYJS201811.
References (86)
- Vital nodes identification in complex networks. Phys. Rep. (2016)
- Enhanced collective influence: A paradigm to optimize network disruption. Physica A (2017)
- Power iteration ranking via hybrid diffusion for vital nodes identification. Physica A (2018)
- Link prediction in complex networks: A survey. Physica A (2011)
- Predicting the evolution of complex networks via similarity dynamics. Physica A (2017)
- Friends and neighbors on the web. Soc. Netw. (2003)
- Link prediction based on linear dynamical response. Physica A (2019)
- CNDP: Link prediction based on common neighbors degree penalization. Physica A (2020)
- A fusion probability matrix factorization framework for link prediction. Knowl.-Based Syst. (2018)
- Kernel framework based on non-negative matrix factorization for networks reconstruction and link prediction. Knowl.-Based Syst. (2017)
- Optimizing complex networks for resilience against cascading failure. Physica A
- Assessment of spatial and temporal variability in ecosystem attributes of the St Marks National Wildlife Refuge, Apalachee Bay, Florida. Estuar. Coast. Shelf Sci.
- The structure and function of complex networks. SIAM Rev.
- Social network analysis and mining for business applications. ACM Trans. Intell. Syst. Technol.
- Network structure from rich but noisy data. Nat. Phys.
- Statistical mechanics of complex networks. Rev. Modern Phys.
- Scale-free networks: a decade and beyond. Science
- Empirical analysis of an evolving social network. Science
- Structure and inference in annotated networks. Nature Commun.
- Detecting community structure in networks. Eur. Phys. J. B
- The ground truth about metadata and community detection in networks. Sci. Adv.
- Incremental subgraph feature selection for graph classification. IEEE Trans. Knowl. Data Eng.
- Graph classification using signal-subgraphs: applications in statistical connectomics. IEEE Trans. Pattern Anal. Mach. Intell.
- Summarizing and understanding large graphs. Stat. Anal. Data Min.
- The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol.
- Link prediction via linear optimization. Physica A
- Link predication based on matrix factorization by fusion of multi class organizations of the network. Sci. Rep.
- Network link prediction by global silencing of indirect correlations. Nature Biotechnol.
- Recommender systems. Phys. Rep.
- CTRL+Z: Recovering anonymized social graphs
- Links reconstruction attack
- User identity linkage across online social networks: A review. ACM SIGKDD Explor. Newsl.
- Structure based user identification across social networks. IEEE Trans. Knowl. Data Eng.
- Review on graph feature learning and feature extraction techniques for link prediction
- Predicting missing links via local information. Eur. Phys. J. B
- A new status index derived from sociometric analysis. Psychometrika
- SimRank: a measure of structural-context similarity
- Link prediction based on local random walk. Europhys. Lett.