Nonnegative matrix factorization algorithms for link prediction in temporal networks using graph communicability

doi:10.1016/j.patcog.2017.06.025

Pattern Recognition

Volume 71, November 2017, Pages 361-374

https://doi.org/10.1016/j.patcog.2017.06.025

Highlights

• The equivalence between eigendecomposition and nonnegative matrix factorization is proven.
• This paper proposes two matrix factorization algorithms for temporal link prediction.
• The algorithm outperforms state-of-the-art approaches on various temporal networks.

Abstract

Networks derived from many disciplines, such as social relations, web contents, and cancer progression, are temporal and incomplete. Link prediction in temporal networks is of theoretical interest and practical significance because spurious links are critical for investigating evolving mechanisms. In this study, we address the temporal link prediction problem in networks, i.e., predicting links at time $T + 1$ based on a given temporal network from time $1$ to $T$. To address the relationships among matrix decomposition-based algorithms, we prove the equivalence between the eigendecomposition and nonnegative matrix factorization (NMF) algorithms, which serves as the theoretical foundation for designing NMF-based algorithms for temporal link prediction. A novel NMF-based algorithm is proposed based on this equivalence. The algorithm factorizes each network to obtain features using graph communicability, and then collapses the feature matrices to predict temporal links. Compared with state-of-the-art methods, the proposed algorithm exhibits significantly improved accuracy by avoiding the collapse of temporal networks. Experimental results on a number of artificial and real temporal networks illustrate that the proposed method is not only more accurate but also more robust than state-of-the-art approaches.

Introduction

A network (sometimes called a graph) effectively characterizes complex systems: each vertex represents an individual, such as a biological entity (e.g., a gene or a protein), a web user, or a terminal on the Internet, and each link denotes an interaction between a pair of vertices. Various real-world networks have been studied, such as social networks [1], [2], technological networks [3], and biological networks [4]. Network analysis has emerged as a key technique in modern science with the immediate purpose of discovering graph patterns by elucidating the structure-function relationship of overall systems. For example, communities in protein interaction networks correspond to the protein complexes that are critical for biological processes [5].

However, many networks are incomplete because of the limitations of our knowledge regarding complex systems, which significantly hinders the practical application of network analysis. For example, nearly 80% of interactions within yeast [6] and 99% within human [7] remain unknown. Accordingly, link prediction plays a critical role in network analysis [8], [9]: it not only helps us recover data but also improves our understanding of the mechanisms of networks.

Therefore, approaches for predicting links in networks are urgently required, and considerable efforts have been exerted to address this issue [9], [10], [11], [12]. Available methods for link prediction can be categorized into two classes: experimental and computational methods. Experimental methods use a physical strategy to validate the existence of links. They fail to provide satisfactory answers primarily due to the limitations of finance and technology. These methods are also costly and time-consuming, particularly for validating interactions among proteins through biological experiments [13]. Thus, computational methods for predicting links based on known interactions have become alternatives to experimental approaches [14], [15], [16], [17], [18], [19].

However, the vast majority of available algorithms focus on static networks, where the ultimate goal is to predict links to describe a complete picture of the whole network structure [3]. Many networks derived from the real world dynamically change over time (called temporal or dynamic networks) [20]. For example, in scientific collaboration networks, interactions evolve since scientists directly change their collaborators as they shift their research directions [21]. In disease networks, cancer metastasis is mainly due to cancer cell immigration [22]. Thus, the analysis of temporal networks has received considerable attention because the evolution patterns provide novel insights into the underlying mechanisms of complex networks [20], [23], [24], [25], [26].

Accordingly, the prediction of links in temporal networks is a promising and interesting subject because it is the foundation for network analysis. Unlike missing link prediction in static networks, the temporal link prediction problem is to predict the edges in a network at time $T + 1$ based on a given temporal network from time $1$ to $T$. This problem arises in a variety of contexts, such as collaborative filtering [27] and social network connections [28]. However, designing algorithms for the temporal link prediction problem is highly non-trivial [3] for two reasons. First, features in temporal networks are significantly more complicated than those in static ones, and thus, they are difficult to characterize and extract. Second, the complexity of temporal networks poses a considerable challenge to designing effective and efficient algorithms.

Despite these difficulties, many algorithms for predicting temporal links have been proposed [29], [30], [31]. Sharan et al. [30] collapsed dynamic networks to predict temporal links by summing the matrices associated with networks, thereby saving running time by sacrificing accuracy. To fully utilize topology structure, the Katz index predicts temporal links by counting the number of paths. Matrix decomposition-based algorithms [29], [31], such as singular value decomposition (SVD) and tensor decomposition (TD), have been developed to predict temporal links using low-rank approximation. These algorithms initially collapse temporal networks and then predict temporal links based on the collapsed network, which eliminates critical information hidden in dynamic networks, thereby affecting the performance of algorithms. To avoid the collapse of temporal networks, Acar et al. [29] provided a TD method for the temporal link prediction problem, and this method dramatically improved the accuracy of algorithms.
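As a rough illustration of the path-counting idea behind the Katz index mentioned above, the sketch below scores every vertex pair by a damped sum over walks of increasing length, $\sum_{l \ge 1} \beta^{l} A^{l}$, truncated at a finite length. It operates on a single static adjacency matrix; how the index is applied to a temporal network (for instance, to a collapsed matrix) is not specified here. The names `katz_scores`, `beta`, and `max_len`, and the default values, are illustrative assumptions rather than anything taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta=0.1, max_len=10):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * adj**l."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]   # adj**1
    weight = beta                     # beta**1
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)    # next power of adj
        weight *= beta                # next power of beta
    return scores


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = katz_scores(adj)
```

In this toy graph, vertices 0 and 2 are two steps apart while 0 and 3 are three steps apart, so the pair (0, 2) receives the larger score; the damping factor `beta` must be small enough for the full series to converge.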

Although considerable efforts have been devoted to temporal link prediction, some problems remain unsolved, including determining the theoretical relationship among matrix decomposition algorithms and improving the accuracy of algorithms. In this work, the first problem is addressed by proving the equivalence between the eigendecomposition (ED) and nonnegative matrix factorization (NMF) based algorithms. To address the second issue, an NMF-based algorithm is developed without collapsing dynamic networks, which significantly improves the accuracy of algorithms.

Overall, the main contributions of this study can be summarized as follows.

• We prove the equivalence between the eigendecomposition and nonnegative matrix factorization algorithms in temporal networks, which serves as the theoretical foundation for designing NMF-based algorithms for the temporal link prediction problem.
• Two NMF-based frameworks for the temporal link prediction problem are proposed based on the proven equivalence by using graph communicability. The two frameworks differ greatly in terms of objects to collapse. The first framework collapses temporal features, whereas the second framework collapses temporal networks.
• The proposed method outperforms state-of-the-art methods on both artificial and real-world dynamic networks.

The remainder of this paper is organized as follows. The preliminaries are presented in Section 2. Related works are reviewed in Section 3. The equivalence relationship is proven in Section 4. The proposed algorithm is described in Section 5. The experimental results are presented in Section 6. The extension of algorithms and conclusion are provided in Sections 7 and 8, respectively.

Section snippets

Preliminaries

Terminologies that are extensively used in the subsequent sections are first introduced prior to presenting the detailed description of the proposed algorithms. Let $\{1, 2, \dots, T\}$ be a finite set of time points. For a given variable, the attached subscript $t$ represents the value of the variable at time point $t$ (time $t$ for short). The temporal (dynamic) network $G$ is defined as a sequence of networks $G = \{G_{1}, G_{2}, \dots, G_{T}\}$, where $G_{t}$ is the network at time $t$ with a vertex set $V_{t}$ and an edge set $E_{t}$. Without loss

Related works

In this section, we briefly review the matrix-based algorithms for temporal link prediction problem, which are classified into three classes: network collapse, topology, and matrix decomposition-based approaches.

Typical network collapsing-based approaches include collapsing tensor (CT) [32] and weighted CT (WCT) [30]. The CT algorithm collapses $G$ by averaging link weights, i.e., $X = \sum_{t = 1}^{T} W_{t} / T$. Then, it predicts temporal links by setting $W_{T + 1} = X$, which is criticized for its assumption that all
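The CT rule described above can be sketched in a few lines: the snapshot weight matrices $W_{1}, \dots, W_{T}$ are averaged element-wise, and the average $X$ is used directly as the prediction for $W_{T + 1}$. The function name `collapse_average` and the toy snapshots are illustrative assumptions; the weighted variant (WCT) is not shown because its weights are not given here.

```python
def collapse_average(snapshots):
    """Element-wise average of T weight matrices: X = (1 / T) * sum_t W_t."""
    T = len(snapshots)
    n = len(snapshots[0])
    return [[sum(w[i][j] for w in snapshots) / T for j in range(n)]
            for i in range(n)]


# Three hypothetical snapshots of a 3-vertex network.
W1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
W2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
W3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# CT prediction for time T + 1 is the collapsed matrix itself.
predicted = collapse_average([W1, W2, W3])
```

Here the edge between vertices 0 and 1 appears in every snapshot and receives score 1, whereas edges seen in only one snapshot receive score 1/3 regardless of how recently they appeared.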

Equivalence between NMF and ED

We first introduce communicability in networks and then prove the equivalence between NMF and ED.
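For orientation, communicability between two vertices is commonly defined, following Estrada and co-authors (whose work appears in the reference list), as the corresponding entry of the matrix exponential of the adjacency matrix, $\sum_{k \ge 0} A^{k} / k!$, so that walks of every length contribute with factorially decaying weight. Whether this section uses exactly this form is an assumption, since its full text is not reproduced here. The sketch below approximates the series with a fixed number of terms; the name `communicability` and the parameter `terms` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(adj, terms=30):
    """Approximate exp(adj) with the first `terms` terms of its power series."""
    n = len(adj)
    # Start from adj**0 / 0!, i.e. the identity matrix.
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        # adj**k / k! follows from the previous term by one product and a division by k.
        term = [[x / k for x in row] for row in matmul(term, adj)]
        total = [[t + x for t, x in zip(trow, xrow)]
                 for trow, xrow in zip(total, term)]
    return total


# Two vertices joined by one edge: exp(A) = [[cosh 1, sinh 1], [sinh 1, cosh 1]].
adj = [[0, 1], [1, 0]]
comm = communicability(adj)
```

For a symmetric adjacency matrix the same quantity can also be obtained from its eigendecomposition, by exponentiating the eigenvalues, which is one reason communicability pairs naturally with ED-based methods.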

Algorithm

In this section, we propose the SNMF based on Feature Collapsing (SNMF-FC) algorithm to predict temporal links, which consists of three components: feature discovery, feature collapsing, and link prediction (Fig. 1). The algorithm procedure, parameter selection, and complexity analysis are presented in the subsequent subsections.
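The three components named above can be pictured with the following minimal sketch. It is not the authors' procedure: the details of each stage are not reproduced in this section, so every choice here is an assumption. Feature discovery uses a generic symmetric NMF ($S \approx H H^{\top}$, $H \ge 0$) with a damped multiplicative update; feature collapsing is a plain average of the per-snapshot feature matrices; link prediction scores pairs by the product of the collapsed features with their transpose. The shared starting matrix `h0`, the rank, the iteration count, and all function names are hypothetical, and each input matrix could be an adjacency or a communicability matrix.

```python
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of lists."""
    return [list(col) for col in zip(*a)]


def symmetric_nmf(s, h0, iters=200, eps=1e-12):
    """Factor a symmetric nonnegative matrix as s ~ h h^T with h >= 0.

    Damped multiplicative update, applied element-wise:
    h <- h * (1/2 + 1/2 * (s h) / (h h^T h)).
    """
    h = [row[:] for row in h0]
    for _ in range(iters):
        sh = matmul(s, h)
        hhth = matmul(h, matmul(transpose(h), h))
        h = [[h[i][k] * (0.5 + 0.5 * sh[i][k] / (hhth[i][k] + eps))
              for k in range(len(h[0]))] for i in range(len(h))]
    return h


def predict_links(snapshots, rank=2, seed=0):
    """Feature discovery, feature collapsing, link prediction (all assumed forms)."""
    rng = random.Random(seed)
    n = len(snapshots[0])
    # Shared positive start so per-snapshot features stay roughly aligned (assumption).
    h0 = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    # 1) Feature discovery: one feature matrix per snapshot.
    features = [symmetric_nmf(s, h0) for s in snapshots]
    # 2) Feature collapsing: unweighted average of the feature matrices (assumption).
    T = len(features)
    f = [[sum(h[i][k] for h in features) / T for k in range(rank)]
         for i in range(n)]
    # 3) Link prediction: score of pair (i, j) is the inner product of their features.
    return matmul(f, transpose(f))


# Two hypothetical snapshots of a 4-vertex network.
S1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
S2 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = predict_links([S1, S2])
```

The point of the sketch is the ordering of the stages: each snapshot is factorized on its own and only the resulting features are merged, in contrast to collapsing the networks first and factorizing once.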

Results

In order to evaluate the performance of SNMF-FC, several algorithms are selected for comparison: the Katz index, NMF for Katz score matrix (NMF-KZ), ED, SVD, PCA (principal component analysis [45]), RW, and TD. These algorithms are selected because they are matrix-based algorithms that have been widely used for the temporal link prediction problem. Five datasets, including two artificial and three real temporal networks, are employed to test the performance of the algorithms.

Extension of SNMF-FC

In this section, we investigate the possibility of accelerating the SNMF-FC algorithm and discuss situations wherein one of the frameworks is preferred.

Conclusion

Link prediction has been extensively studied because it provides a complete and reliable picture of overall systems. Although many algorithms have been developed for link prediction in static networks, only a few approaches have been devoted to the problem in temporal networks. In fact, link prediction in temporal networks has many critical applications, such as predicting the number of visits on web pages based on browsing history and extracting gene expression patterns based on historical

Acknowledgments

This work was supported by the NSFC (Grant No. 61502363), Natural Science Foundation of Shaanxi Province (Grant No. 2016JQ6044), Fundamental Research Funding of Central Universities (Grant Nos. JB160306, BDY181417, JB160303) and Natural Science Basic Research Plan in Ningbo City (Program No. 2016A610034). The authors would like to thank the reviewers for their valuable comments and suggestions.

References (50)

G. Kossinets, Effects of missing data in social networks, Social Netw. (2006)
L. Lü et al., Link prediction in complex networks: a survey, Physica A (2011)
P. Holme et al., Temporal networks, Phys. Rep. (2012)
Y. Liu et al., Predicting who rated what in large-scale datasets, ACM SIGKDD Explorations Newslett. (2007)
E. Estrada et al., Communicability graph and community structures in complex networks, Appl. Math. Comput. (2009)
M. Girvan et al., Community structure in social and biological networks, Proc. Natl. Acad. Sci. (2002)
G. Palla et al., Quantifying social group evolution, Nature (2007)
A. Clauset et al., Hierarchical structure and the prediction of missing links in networks, Nature (2008)
J. Menche et al., Uncovering disease-disease relationships through the incomplete interactome, Science (2015)
A. Tong et al., Combined experimental and computational strategy to define protein interaction networks for peptide recognition modules, Science (2002)
H. Yu et al., High-quality binary protein interaction map of the yeast interactome network, Science (2008)
M. Stumpf et al., Estimating the size of the human interactome, Proc. Natl. Acad. Sci. (2008)
B. Barzel et al., Network link prediction by global silencing of indirect correlations, Nat. Biotechnol. (2013)
C. Cannistraci et al., From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, Sci. Rep. (2013)
C. Lei et al., A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity, Bioinformatics (2013)
R. Guimera et al., Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. (2009)
T. K et al., An in vivo map of the yeast protein interactome, Science (2008)
L. Getoor et al., Link mining: a survey, ACM SIGKDD Explorations (2005)
L. Lü et al., Link prediction in weighted networks: the role of weak ties, Europhys. Lett. (2010)
T. Zhou et al., Predicting missing links via local information, Eur. Phys. J. B (2009)
L. Lü et al., Similarity index based on local paths for link prediction of complex networks, Phys. Rev. E (2009)
J. Zhao et al., Prediction of links and weights in networks by reliable routes, Sci. Rep. (2015)
M. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci. (2001)
B. Craene et al., Regulatory networks defining EMT during cancer initiation and progression, Nat. Rev. Cancer (2013)
J. Lee et al., A unifying framework of mining trajectory patterns of various temporal tightness, IEEE Trans. Knowl. Data Eng. (2015)


Xiaoke Ma received his Ph.D. degree in computer science from Xidian University in 2012. He was a postdoctoral researcher at the University of Iowa (USA) during 2012–2015. He is an associate professor at the School of Computer Science and Technology, Xidian University (P.R. China). His research interests include machine learning, data mining, and bioinformatics. He is an ad hoc reviewer for many international journals and has published more than 30 papers in peer-reviewed international journals and conferences, such as IEEE Trans. Knowledge and Data Engineering, Pattern Recognition, Information Sciences, New Journal of Physics, JSTAT, Physica A, Bioinformatics, PLoS Computational Biology, IEEE Trans. Computational Biology and Bioinformatics, and IEEE Trans. Nanobioscience.

¹: Equal contribution.
