Elsevier

Pattern Recognition

Volume 71, November 2017, Pages 361-374

Nonnegative matrix factorization algorithms for link prediction in temporal networks using graph communicability

https://doi.org/10.1016/j.patcog.2017.06.025

Abstract

Networks derived from many disciplines, such as social relations, web contents, and cancer progression, are temporal and incomplete. Link prediction in temporal networks is of theoretical interest and practical significance because spurious links are critical for investigating evolving mechanisms. In this study, we address the temporal link prediction problem in networks, i.e., predicting links at time T+1 based on a given temporal network from time 1 to T. To clarify the relationships among matrix decomposition-based algorithms, we prove the equivalence between the eigendecomposition and nonnegative matrix factorization (NMF) algorithms, which serves as the theoretical foundation for designing NMF-based algorithms for temporal link prediction. A novel NMF-based algorithm is proposed based on this equivalence. The algorithm factorizes each network to obtain features using graph communicability, and then collapses the feature matrices to predict temporal links. Compared with state-of-the-art methods, the proposed algorithm exhibits significantly improved accuracy by avoiding the collapse of temporal networks. Experimental results on a number of artificial and real temporal networks show that the proposed method is not only more accurate but also more robust than state-of-the-art approaches.

Introduction

Networks (sometimes called graphs) are an effective means of characterizing and analyzing complex systems. Each vertex represents an individual, such as a biological entity (e.g., a gene or a protein), a web user, or a terminal on the Internet, and each link denotes an interaction between a pair of vertices. Various networks have been derived from real-world systems, such as social networks [1], [2], technological networks [3], and biological networks [4]. Network analysis has emerged as a key technique in modern science with the immediate purpose of discovering graph patterns by elucidating the structure-function relationship of overall systems. For example, communities in protein interaction networks correspond to the protein complexes that are critical for biological processes [5].

However, many networks are incomplete because of the limitations of our knowledge regarding complex systems, which significantly hinders the practical application of network analysis. For example, nearly 80% of interactions in yeast [6] and 99% in humans [7] remain unknown. Accordingly, link prediction plays a critical role in network analysis [8], [9]: it not only helps us recover data but also improves our understanding of the mechanisms of networks.

Therefore, approaches for predicting links in networks are urgently required, and considerable efforts have been exerted to address this issue [9], [10], [11], [12]. Available methods for link prediction can be categorized into two classes: experimental and computational methods. Experimental methods use a physical strategy to validate the existence of links. They fail to provide satisfactory answers primarily because of financial and technological limitations. These methods are also costly and time-consuming, particularly for validating interactions among proteins through biological experiments [13]. Thus, computational methods for predicting links based on known interactions have become alternatives to experimental approaches [14], [15], [16], [17], [18], [19].

However, the vast majority of available algorithms focus on static networks, where the ultimate goal is to predict links to describe a complete picture of the whole network structure [3]. Many networks derived from the real world change dynamically over time (called temporal or dynamic networks) [20]. For example, in scientific collaboration networks, interactions evolve because scientists change their collaborators as they shift their research directions [21]. In disease networks, cancer metastasis is mainly due to cancer cell migration [22]. Thus, the analysis of temporal networks has received considerable attention because the evolution patterns provide novel insights into the underlying mechanisms of complex networks [20], [23], [24], [25], [26].

Accordingly, the prediction of links in temporal networks is a promising and interesting subject because it is the foundation for network analysis. Unlike missing link prediction in static networks, the temporal link prediction problem seeks the edges in a network at time T+1 based on a given temporal network from time 1 to T. This problem arises in a variety of contexts, such as collaborative filtering [27] and social network connections [28]. However, designing algorithms for the temporal link prediction problem is highly non-trivial [3] for two reasons. First, features in temporal networks are significantly more complicated than those in static ones, and thus, they are difficult to characterize and extract. Second, the complexity of temporal networks poses a considerable challenge to designing effective and efficient algorithms.

Although the process is difficult, many algorithms for predicting temporal links have been proposed [29], [30], [31]. Sharan et al. [30] collapsed dynamic networks to predict temporal links by summing the matrices associated with networks, thereby saving running time at the cost of accuracy. To fully utilize topological structure, the Katz index predicts temporal links by counting the number of paths. Matrix decomposition-based algorithms [29], [31], such as singular value decomposition (SVD) and tensor decomposition (TD), have been developed to predict temporal links using low-rank approximation. These algorithms initially collapse temporal networks and then predict temporal links based on the collapsed network, which eliminates critical information hidden in dynamic networks, thereby affecting the performance of algorithms. To avoid the collapse of temporal networks, Acar et al. [29] provided a TD method for the temporal link prediction problem, and this method dramatically improved the accuracy of algorithms.
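To make the path-counting idea behind the Katz index concrete, the following is a minimal sketch in plain Python. It assumes the usual form of the Katz score, a sum over path lengths l of beta^l A^l for an adjacency matrix A, truncated at a maximum length. The names `matmul` and `katz_scores`, the damping value, and the toy graph are illustrative choices and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta, max_len):
    """Truncated Katz score: sum over l = 1..max_len of beta**l * A**l.

    beta damps longer paths; max_len is an assumed truncation point.
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A**l, starting with l = 1
    weight = beta                    # holds beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adj)
        weight *= beta
    return scores


# Toy path graph 0 - 1 - 2: vertices 0 and 2 are not linked directly,
# but share one path of length 2 through vertex 1.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = katz_scores(adj, beta=0.5, max_len=2)
print(scores[0][2])  # 0.25
```

The unlinked pair (0, 2) receives a nonzero score purely from the length-2 path, which is the sense in which such an index ranks candidate links by topology.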

Although considerable efforts have been devoted to temporal link prediction, some problems remain unsolved, including determining the theoretical relationship among matrix decomposition algorithms and improving the accuracy of algorithms. In this work, the first problem is addressed by proving the equivalence between the eigendecomposition (ED) and nonnegative matrix factorization (NMF) based algorithms. To address the second issue, an NMF-based algorithm is developed without collapsing dynamic networks, which significantly improves the accuracy of algorithms.

Overall, the main contributions of this study can be summarized as follows.

  • We prove the equivalence between the eigendecomposition and nonnegative matrix factorization algorithms in temporal networks, which serves as the theoretical foundation for designing NMF-based algorithms for the temporal link prediction problem.

  • Two NMF-based frameworks for the temporal link prediction problem are proposed based on the proven equivalence by using graph communicability. The two frameworks differ in the object that is collapsed. The first framework collapses temporal features, whereas the second framework collapses temporal networks.

  • The proposed method outperforms state-of-the-art methods on both artificial and real-world dynamic networks.

The remainder of this paper is organized as follows. The preliminaries are presented in Section 2. Related works are reviewed in Section 3. The equivalence relationship is proven in Section 4. The proposed algorithm is described in Section 5. The experimental results are presented in Section 6. The extension of algorithms and conclusion are provided in Sections 7 and 8, respectively.

Preliminaries

Terminologies that are extensively used in the subsequent sections are first introduced prior to presenting the detailed description of the proposed algorithms. Let $\{1, 2, \ldots, T\}$ be a finite set of time points. For a given variable, the attached subscript $t$ represents the value of the variable at time point $t$ (time $t$ for short). The temporal (dynamic) network $G$ is defined as a sequence of networks $G = \{G_1, G_2, \ldots, G_T\}$, where $G_t$ is the network at time $t$ with a vertex set $V_t$ and an edge set $E_t$. Without loss

Related works

In this section, we briefly review the matrix-based algorithms for temporal link prediction problem, which are classified into three classes: network collapse, topology, and matrix decomposition-based approaches.

Typical network collapsing-based approaches include collapsing tensor (CT) [32] and weighted CT (WCT) [30]. The CT algorithm collapses $G$ by averaging link weights, i.e., $X = \sum_{t=1}^{T} W_t / T$. Then, it predicts temporal links by setting $W_{T+1} = X$, which is criticized for its assumption that all
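The averaging step of CT can be sketched as follows. This is a minimal illustration, assuming each snapshot is stored as a dense weight matrix (a list of lists) over the same vertex set. The name `collapse_average` and the toy data are hypothetical.

```python
def collapse_average(snapshots):
    """Average the weight matrices W_1..W_T entrywise: X = (1/T) * sum_t W_t."""
    T = len(snapshots)
    n = len(snapshots[0])
    return [[sum(w[i][j] for w in snapshots) / T for j in range(n)]
            for i in range(n)]


# Four snapshots of a two-vertex network; the link is present in three of them.
linked = [[0, 1], [1, 0]]
empty = [[0, 0], [0, 0]]
snapshots = [linked, empty, linked, linked]

# CT uses the collapsed matrix X directly as the prediction for time T+1.
predicted = collapse_average(snapshots)
print(predicted[0][1])  # 0.75
```

Every snapshot contributes with the same weight 1/T, so the order of the snapshots has no effect on the prediction.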

Equivalence between NMF and ED

We first introduce communicability in networks and then prove the equivalence between NMF and ED.
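For orientation, communicability in the sense of Estrada and Hatano is commonly defined as the matrix exponential of the adjacency matrix, exp(A) = sum over k of A^k / k!, which counts walks of every length and discounts longer ones. Whether the paper uses exactly this form is not visible in this excerpt, so the sketch below should be read as an assumption. The names `communicability` and `terms` are illustrative, and the series is truncated rather than evaluated in closed form.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(adj, terms=20):
    """Truncated Taylor series of exp(A): sum over k = 0..terms of A**k / k!."""
    n = len(adj)
    # k = 0 term is the identity matrix.
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        # Turn A**(k-1) / (k-1)! into A**k / k!.
        term = [[x / k for x in row] for row in matmul(term, adj)]
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return total


# For a single edge, A*A is the identity, so exp(A) = cosh(1)*I + sinh(1)*A.
comm = communicability([[0, 1], [1, 0]])
print(abs(comm[0][1] - math.sinh(1.0)) < 1e-9)  # True
```

For an undirected network with nonnegative weights the result is a symmetric nonnegative matrix, which is the kind of input a nonnegative factorization expects.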

Algorithm

In this section, we propose the SNMF based on Feature Collapsing (SNMF-FC) algorithm to predict temporal links, which consists of three components: feature discovery, feature collapsing and link prediction (Fig. 1). The algorithm procedure, parameter selection, and complexity analysis are presented in the subsequent subsections.
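The three components named above can be sketched in plain Python under several assumptions, since the excerpt does not show the actual procedure. Feature discovery is stood in for by a symmetric NMF, S ≈ F F^T, fitted with a damped multiplicative update. Feature collapsing is a weighted sum with an assumed decay parameter `theta`. Link prediction scores each pair by an inner product of collapsed features. All function names, the decay rule, the shared random seed, and the toy snapshots are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def symmetric_nmf(sim, k, iters=200, seed=0, eps=1e-12):
    """Feature discovery (assumed form): fit sim ~ F F^T with F >= 0."""
    rng = random.Random(seed)
    n = len(sim)
    f = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        num = matmul(sim, f)                      # S F
        den = matmul(f, matmul(transpose(f), f))  # F F^T F
        # Damped multiplicative update; every factor is >= 0.5, so F stays positive.
        f = [[f[i][j] * (0.5 + 0.5 * num[i][j] / (den[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return f


def collapse_features(features, theta=0.5):
    """Feature collapsing (assumed rule): weight (1 - theta)**(T - t) per snapshot."""
    T = len(features)
    n, k = len(features[0]), len(features[0][0])
    out = [[0.0] * k for _ in range(n)]
    for t, f in enumerate(features, start=1):
        w = (1.0 - theta) ** (T - t)
        for i in range(n):
            for j in range(k):
                out[i][j] += w * f[i][j]
    return out


def predict_scores(f):
    """Link prediction: score each vertex pair by the inner product of its features."""
    return matmul(f, transpose(f))


# Three toy snapshots on four vertices; adjacency matrices stand in for the
# similarity (e.g. communicability) matrices that would be factorized.
g1 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
g2 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
g3 = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

features = [symmetric_nmf(g, k=2) for g in (g1, g2, g3)]
collapsed = collapse_features(features)
scores = predict_scores(collapsed)
```

The point of the sketch is the ordering of the steps: each snapshot is factorized on its own and only the resulting feature matrices are combined, in contrast to approaches that first merge the snapshots into one matrix.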

Results

In order to evaluate the performance of SNMF-FC, several algorithms are selected for comparison, including the Katz index, NMF for Katz score matrix (NMF-KZ), ED, SVD, principal component analysis (PCA) [45], RW and TD. These algorithms are selected because they are matrix-based algorithms that have been widely used for the temporal link prediction problem. Five datasets, including two artificial and three real temporal networks, are employed to test the performance of the algorithms.

Extension of SNMF-FC

In this section, we investigate the possibility of accelerating the SNMF-FC algorithm and discuss situations wherein one of the frameworks is preferred.

Conclusion

Link prediction has been extensively studied because it provides a complete and reliable picture of overall systems. Although many algorithms have been developed for link prediction in static networks, only a few approaches have been devoted to the problem in temporal networks. In fact, link prediction in temporal networks has many critical applications, such as predicting the number of visits on web pages based on browsing history and extracting gene expression patterns based on historical

Acknowledgments

This work was supported by the NSFC (Grant No. 61502363), Natural Science Foundation of Shaanxi Province (Grant No. 2016JQ6044), Fundamental Research Funding of Central Universities (Grant Nos. JB160306, BDY181417, JB160303) and Natural Science Basic Research Plan in Ningbo City (Program No. 2016A610034). The authors would like to thank the reviewers for their valuable comments and suggestions.

References (50)

  • H. Yu et al., High-quality binary protein interaction map of the yeast interactome network, Science (2008)
  • M. Stumpf et al., Estimating the size of the human interactome, Proc. Natl. Acad. Sci. (2008)
  • B. Barzel et al., Network link prediction by global silencing of indirect correlations, Nat. Biotechnol. (2013)
  • C. Cannistraci et al., From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, Sci. Rep. (2013)
  • C. Lei et al., A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity, Bioinformatics (2013)
  • R. Guimera et al., Missing and spurious interactions and the reconstruction of complex networks, Proc. Natl. Acad. Sci. (2009)
  • K. Tarassov et al., An in vivo map of the yeast protein interactome, Science (2008)
  • L. Getoor et al., Link mining: a survey, ACM SIGKDD Explorations (2005)
  • L. Lü et al., Link prediction in weighted networks: the role of weak ties, Europhys. Lett. (2010)
  • T. Zhou et al., Predicting missing links via local information, Eur. Phys. J. B (2009)
  • L. Lü et al., Similarity index based on local paths for link prediction of complex networks, Phys. Rev. E (2009)
  • J. Zhao et al., Prediction of links and weights in networks by reliable routes, Sci. Rep. (2015)
  • M. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci. (2001)
  • B. Craene et al., Regulatory networks defining EMT during cancer initiation and progression, Nat. Rev. Cancer (2013)
  • J. Lee et al., A unifying framework of mining trajectory patterns of various temporal tightness, IEEE Trans. Knowl. Data Eng. (2015)

Xiaoke Ma received his Ph.D. degree in computer science from Xidian University in 2012. He was a postdoctoral researcher at the University of Iowa (USA) during 2012–2015. He is an associate professor at the School of Computer Science and Technology, Xidian University (P.R. China). His research interests include machine learning, data mining and bioinformatics. He is an ad hoc reviewer for many international journals and has published more than 30 papers in peer-reviewed international journals and conferences, such as IEEE Trans. Knowledge and Data Engineering, Pattern Recognition, Information Sciences, New Journal of Physics, JSTAT, Physica A, Bioinformatics, PLoS Computational Biology, IEEE Trans. Computational Biology and Bioinformatics, and IEEE Trans. Nanobioscience.

1 Equal contribution.
