Elsevier

Neurocomputing

Volume 433, 14 April 2021, Pages 108-118

NetKI: A Kirchhoff index based statistical graph embedding in nearly linear time

https://doi.org/10.1016/j.neucom.2020.12.075

Abstract

Recent advances in learning from graph-structured data have shown promising results on the graph classification task. However, because of their high time complexity, scaling these methods to large graphs with millions of nodes and edges remains a challenge. In this paper, we propose NetKI, an algorithm that extracts a sparse representation from a given graph with n nodes and m edges in $O(m\epsilon^{-2}\log^{4} n)$ time. Our approach follows the notion of the Kirchhoff index, which encodes the structure of the graph by estimating effective resistance. Relying on this approach yields a nearly linear time graph representation method that scales to sufficiently large graphs. Through extensive experiments, we show that NetKI improves running time on large networks while its classification accuracy stays within 2% of the state-of-the-art results.

Introduction

Many real-world systems involve interactions between pairs of entities. Examples of such systems include social networks, financial systems, molecular graph structures, resistor networks, recommender systems, and protein–protein interaction networks. All of these systems, and many more, can be represented as graphs that encode real-world objects (i.e., nodes) and the pairwise interactions (i.e., edges) between them. More recently, graphs have also been used to model grid-structured data, such as images, to make computations faster [1]. Encoding graphs into low-dimensional embeddings enables one to employ traditional machine learning and data mining algorithms on graph-structured data. This approach helps researchers solve hard problems on graphs, such as graph classification. The graph classification problem concerns understanding complex graph structures across different classes, and it has a variety of real-world applications ranging from text classification to molecular toxicity prediction and the classification of community structures in a social network [2]. However, graph classification poses several challenges, such as permutation invariance, scalability, and the runtime efficiency of the encoding. Generally, the same graph can be represented by many adjacency matrices because the nodes of a graph have no canonical order; therefore, the ideal encoding procedure for graph classification should be invariant under permutations of the nodes. Moreover, it should preserve the atomic structure of the graph based on the local and global positions of pairs of nodes. Unfortunately, existing graph classification approaches often require pairwise comparisons among the graphs or are based solely on statistical and spectral representations, which are hard to compute [3], [4], [5], [6]. Therefore, efficient representation methods are required that encode the atomic structure of the graph succinctly.
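The permutation-invariance requirement can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 4-node path graph (neither comes from the paper): relabeling the nodes changes the adjacency matrix, while a spectral summary such as the sorted Laplacian eigenvalues stays the same.

```python
import numpy as np

# Hypothetical toy graph: the path 0-1-2-3 as an adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Relabel the nodes with an arbitrary permutation.
perm = [2, 0, 3, 1]
P = np.eye(4)[perm]          # permutation matrix
A_perm = P @ A @ P.T         # adjacency matrix of the relabeled graph


def laplacian_spectrum(adj):
    """Sorted eigenvalues of the graph Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))


# The two matrices differ entry by entry ...
same_matrix = np.array_equal(A, A_perm)
# ... but the Laplacian spectrum is identical up to rounding.
same_spectrum = np.allclose(laplacian_spectrum(A), laplacian_spectrum(A_perm))
```

Any encoding built only from such relabeling-independent quantities inherits the invariance, which is why raw adjacency matrices are a poor direct input for graph classification.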
To address the aforementioned challenges, in this paper we propose using a well-known measure called the Kirchhoff index for extracting graph representations. The Kirchhoff index encapsulates the atomic structure of the graph because it incorporates information about both the number and the quality of paths. It features in many other contexts, such as Markov chains (the average commute time of a Markov chain on the graph), experiment design, and Euclidean distance embeddings [7], [8]. Recently, it has also been employed to quantify the resilience of networks based on noisy data in distributed networked control systems [9]. In fact, it has several equivalent descriptions, including the Laplacian spectrum and the effective resistance, which makes it suitable for extracting graph representations. Typically, the Kirchhoff index is defined in terms of effective resistance, using the analogy between graphs and electrical circuits. A graph can be transformed into an electrical circuit by replacing each edge with a unit resistance. The effective resistance between any two nodes in such a network is the equivalent electrical resistance between those two nodes. The sum of the effective resistances between all pairs of nodes is referred to as the Kirchhoff index of the graph. We note that a higher value of effective resistance between nodes indicates an inferior or less robust overall connection between the nodes (as measured by the number and quality of paths between them) [7], [8], [10], [11], [12]. Since there is always a path between two nodes in a connected network (that is, any two nodes are always connected, possibly via some other nodes), the effective resistance between any two nodes is well defined. Consequently, the effective resistance of a graph (its Kirchhoff index), which is the sum of all pairwise effective resistances between nodes, is also well defined. To illustrate this, consider the example in Fig. 1.

In (b), there are four paths between nodes 1 and 2 with lengths 1, 3, 4, and 5, and the effective resistance between the nodes is 0.72. In (c), there are three paths between nodes 1 and 2 with lengths 1, 3, and 5, and the effective resistance between them is 0.73. We see only a slight increase in effective resistance (indicating a slightly less robust connection) because the path of length four no longer exists. The effect is minimal because the shorter paths, which are more important, are still there. In (d), there are only two paths between nodes 1 and 2, with lengths 3 and 5. The edge between nodes 1 and 2, and hence the path of length 1, is removed. The effective resistance between the nodes increases significantly because the shortest path, which has the largest effect, is deleted. In (e), there is only one path of length 3 between nodes 1 and 2, which increases the effective resistance from 2.75 to 3.0. Thus, we observe that the effective resistance between two nodes (adjacent or non-adjacent) measures robustness in terms of the number and quality (length) of paths between the nodes. Since the Kirchhoff index measures the number as well as the quality of paths, we argue in this paper that it is a suitable measure for generating meaningful and expressive graph representations.
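The effect described above can be reproduced on a toy example. This is a minimal sketch, assuming NumPy; the 4-cycle below is a hypothetical graph, not the one in Fig. 1, and the helper names are illustrative. It relies on the standard identity R_uv = L+_uu + L+_vv - 2 L+_uv, where L+ is the pseudo-inverse of the graph Laplacian.

```python
import numpy as np


def laplacian(n, edges):
    """Laplacian of an undirected graph whose edges are unit resistors."""
    lap = np.zeros((n, n))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    return lap


def effective_resistance(n, edges, u, v):
    """Effective resistance between u and v via the Laplacian pseudo-inverse."""
    lap_pinv = np.linalg.pinv(laplacian(n, edges))
    return lap_pinv[u, u] + lap_pinv[v, v] - 2.0 * lap_pinv[u, v]


# Hypothetical graph: the cycle 0-1-2-3-0. Nodes 0 and 1 are joined by
# a path of length 1 (the direct edge) and a path of length 3.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Same graph with the direct edge (0, 1) removed: one path of length 3.
without_direct_edge = [(1, 2), (2, 3), (3, 0)]

r_with = effective_resistance(4, cycle, 0, 1)
r_without = effective_resistance(4, without_direct_edge, 0, 1)
print(round(r_with, 2), round(r_without, 2))  # 0.75 3.0
```

With the direct edge present, the two parallel paths of lengths 1 and 3 give 1·3/(1+3) = 0.75. Removing the edge leaves a single path of length 3, so the resistance rises to 3.0, in line with the single-path case discussed above.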

Building on recent advances in solving linear systems in graph Laplacians and in measuring edge centrality in graphs [13], [14], we propose an efficient method to compute the Kirchhoff index for the graph classification task. The major contributions of this study are as follows.

  • We propose an efficient Kirchhoff index based graph classification method that encapsulates graph structure in terms of the number and quality of paths between all pairs of nodes.

  • We empirically show that the proposed method provides improved running time on large networks, while its classification accuracy is close to that of state-of-the-art methods that are otherwise impractical for large graphs.

  • We are making our implementation publicly available for other researchers to use and improve upon.

Related work

Our related work section is divided into four categories: direct methods, graph kernel methods, statistical and spectral representations, and graph-theoretic approaches. The methods in the first three categories deal with graph representations, while the last category surveys the relevant graph-theoretic approaches used or referred to in this work. Table 1 summarizes the necessary properties and time complexity of the related work.

Graph Basics

Let $G=(V,E)$ denote an undirected connected graph, where $V=\{v_1, v_2, \ldots, v_n\}$ is the set of $n$ nodes and $E \subseteq V \times V$ is the set of $m$ edges. Throughout this study, we consider $G$ as a resistor network where nodes represent junctions while edges represent resistors between the junctions. Let $r_e$ denote a unit resistance over the edge $e$. We write $G(e)$ to represent the graph obtained from $G$ by deactivating edge $e$ by decreasing $r_e$ to $\theta r_e$ for some small $0<\theta<1/2$. We denote this process by $\theta$-deletion. Let $L$ denote

Graph embedding algorithm using exact Kirchhoff index

Here we present a Kirchhoff index based graph representation algorithm that encodes graph structure using the network Kirchhoff index. The algorithm describes the procedure for extracting a graph representation by computing the exact network Kirchhoff index.

Algorithm 1: Compute H using exact Kirchhoff index
Input: Graph $G=(V,E,w)$, Laplacian $L$, bin width $h$, number of bins $b$
Output: $H$
 1: Let $L^{+}$ indicate the pseudo-inverse of the Laplacian matrix
 2: $K \leftarrow n\,\mathrm{Tr}(L^{+})$
 3: initialize sparse matrix $F_{n \times n}$ with zeros
 4: for $e(u,v) \in E$
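Steps 1 and 2 of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming NumPy, unit edge weights, and a hypothetical 4-cycle as input; the function names are not from the paper, and the remaining steps of the algorithm are not reproduced here because the listing above is truncated. The sketch also cross-checks the trace formula against the definition of the Kirchhoff index as the sum of effective resistances over all node pairs.

```python
import itertools

import numpy as np


def laplacian(n, edges):
    """Laplacian of an undirected graph with unit edge weights."""
    lap = np.zeros((n, n))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    return lap


def kirchhoff_index(n, edges):
    """Exact Kirchhoff index K = n * Tr(L+) (steps 1 and 2)."""
    lap_pinv = np.linalg.pinv(laplacian(n, edges))  # step 1: pseudo-inverse
    return n * np.trace(lap_pinv)                   # step 2: K = n Tr(L+)


def kirchhoff_index_pairwise(n, edges):
    """Same quantity from the definition: sum of R_uv over all pairs."""
    lap_pinv = np.linalg.pinv(laplacian(n, edges))
    return sum(
        lap_pinv[u, u] + lap_pinv[v, v] - 2.0 * lap_pinv[u, v]
        for u, v in itertools.combinations(range(n), 2)
    )


# Hypothetical input graph: the cycle 0-1-2-3-0.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
k_trace = kirchhoff_index(4, cycle)
k_pairs = kirchhoff_index_pairwise(4, cycle)
print(round(k_trace, 6), round(k_pairs, 6))  # 5.0 5.0
```

For the 4-cycle, the four adjacent pairs each contribute 0.75 and the two opposite pairs each contribute 1.0, giving 5.0 either way. A dense pseudo-inverse takes cubic time in general, which is why an exact computation like this one does not scale to large graphs.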

Evaluation

We performed a number of experiments to evaluate NetKI in terms of scalability, feature sparsity, running time, classification accuracy, and feature comparison. The following sections describe these experiments in detail.

Conclusion

We introduced NetKI, a novel method for graph classification. NetKI is based purely on a statistical graph representation that can be computed efficiently in nearly linear time. We proposed using the network Kirchhoff index as a function of graph representation f(gh) and showed its approximation in nearly linear time. NetKI does not require any graph summary statistics or node attributes and relies only on graph structure. We performed extensive experiments on various real-world benchmark datasets

CRediT authorship contribution statement

Anwar Said: Data curation, Methodology, Investigation, Writing - original draft. Saeed-Ul Hassan: Supervision, Conceptualization, Methodology, Writing - original draft. Waseem Abbas: Writing - original draft, Methodology, Investigation. Mudassir Shabbir: Supervision, Investigation, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors (Saeed-Ul Hassan & Mudassir Shabbir) were funded by the CIPL National Center in Big Data and Cloud Computing (NCBC) grant, received from the Planning Commission of Pakistan, through Higher Education Commission (HEC) of Pakistan.

References (69)

  • W.L. Hamilton et al., Representation learning on graphs: methods and applications, IEEE Data Eng. Bull. (2017)
  • A. Sanfeliu et al., A distance measure between attributed relational graphs for pattern recognition, IEEE Trans. Syst. Man Cybern. (1983)
  • R. Kondor et al., The graphlet spectrum
  • S. Verma et al., Hunt for the unique, stable, sparse and fast feature learning on graphs, Adv. Neural Inform. Process. Syst. (2017)
  • A. Tsitsulin et al., NetLSD: hearing the shape of a graph
  • A. Ghosh et al., Minimizing effective resistance of a graph, SIAM Rev. (2008)
  • W. Abbas, M. Shabbir, A.Y. Yazicioglu, A. Akber, On the trade-off between controllability and robustness in networks of...
  • W. Ellens, R.E. Kooij, Graph measures and network robustness, arXiv preprint...
  • R. Kyng et al., Approximate Gaussian elimination for Laplacians - fast, sparse, and simple, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), IEEE (2016)
  • H. Li, Z. Zhang, Kirchhoff index as a measure of edge centrality in weighted networks: Nearly linear time algorithms,...
  • P. Yanardag et al., Deep graph kernels
  • R. Kondor et al., The multiscale Laplacian graph kernel, Adv. Neural Inform. Process. Syst. (2016)
  • D. Koutra et al., DeltaCon: a principled massive-graph similarity function
  • J. Bento et al., A family of tractable graph distances
  • K.M. Borgwardt, H.-P. Kriegel, Shortest-path kernels on graphs, in: Fifth IEEE international conference on data mining...
  • F. Li, Z. Zhu, X. Zhang, J. Cheng, Y. Zhao, Diffusion induced graph representation learning,...
  • N. Shervashidze et al., Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res. (2011)
  • F. Orsini, P. Frasconi, L. De Raedt, Graph invariant kernels, in: Twenty-Fourth International Joint Conference on...
  • M. Sugiyama, K. Borgwardt, Halting in random walk kernels, in: Advances in neural information processing systems, 2015,...
  • S. Hido et al., A linear-time graph kernel
  • N. Pržulj, Biological network comparison using graphlet degree distribution, Bioinformatics (2007)
  • M. Togninalli et al., Wasserstein Weisfeiler-Lehman graph kernels, Adv. Neural Inform. Process. Syst. (2019)
  • C. Bock et al., A Wasserstein subsequence kernel for time series
  • M. Berlingerio et al., Network similarity via multiple social theories

Anwar Said: Mr. Said is a PhD research scholar in AI AI Lab, Department of Computer Science at Information Technology University, Lahore, Pakistan. He received an MPhil degree (2016) in Computer Science from Quaid-i-Azam University, Islamabad, Pakistan. His research interests are in the area of graph representation, social network analysis, and data science.

Saeed-Ul Hassan: Dr. Hassan is the Director of AI AI Lab and the Chairperson in the Department of Computer Science at Information Technology University (ITU) in Pakistan, and a former Post-Doctorate Fellow at the United Nations University, with more than 15 years of hands-on experience of advanced statistical techniques, artificial intelligence, and software development client work. He earned his Ph.D. in the field of Information Management from the Asian Institute of Technology. He has also served as a Research Fellow at the National Institute of Informatics in Japan. Dr. Hassan's research interests lie within the areas of Data Science, Artificial Intelligence, Scientometrics, Information Retrieval and Text Mining. Dr. Hassan is also the recipient of the James A. Linen III Memorial Award in recognition of his outstanding academic performance. More recently, he has been awarded the Eugene Garfield Honorable Mention Award for Innovation in Citation Analysis by Clarivate Analytics, Thomson Reuters.

Waseem Abbas: Dr. Abbas is a Research Assistant Professor in the Electrical Engineering and Computer Science Department at Vanderbilt University, Nashville, TN, USA. Previously, he was an Assistant Professor at the Information Technology University, Lahore, Pakistan, and a postdoctoral research scholar at Vanderbilt University between 2014 and 2017. He received his Ph.D. (2013) and M.Sc. (2010) degrees, both in Electrical and Computer Engineering, from Georgia Institute of Technology, Atlanta, GA, and was a Fulbright scholar from 2009 to 2013. His research interests are in the areas of resilience and security of network control systems, cyber-physical systems, and graph-theoretic methods in complex networks.

Mudassir Shabbir: Dr. Shabbir is an Assistant Professor in the Department of Computer Science at the Information Technology University, Lahore, Pakistan. He received his Ph.D. from the Division of Computer Science, Rutgers University, NJ, USA in 2014. Previously, Mudassir has worked at Lahore University of Management Sciences, Pakistan; Los Alamos National Labs, NM; Bloomberg L.P., New York, NY; and Rutgers University. He was a Rutgers Honors Fellow for 2011-12. His main area of research is Algorithmic and Discrete Geometry, and he has developed new methods for the characterization and computation of succinct representations of large data sets with applications in nonparametric statistical analysis. He also works in Combinatorics and Graph Theory.
