NetKI: A Kirchhoff index based statistical graph embedding in nearly linear time
Introduction
Many real-world systems involve interactions between pairs of entities. Examples include social networks, financial systems, molecular graph structures, resistor networks, recommender systems, and protein–protein interaction networks. All of these systems, and many more, can be represented as graphs that encode real-world objects (nodes) and the pairwise interactions (edges) between them. More recently, graphs have also been used to model grid-structured data, such as images, to make computations faster [1]. Encoding graphs into low-dimensional embeddings makes it possible to apply traditional machine learning and data mining algorithms to graph-structured data, which in turn helps researchers solve hard problems on graphs such as graph classification. The graph classification problem refers to distinguishing complex graph structures that belong to different classes, and it has a variety of real-world applications ranging from text classification to molecular toxicity prediction and the classification of community structures in a social network [2]. However, graph classification poses several challenges, such as permutation invariance, scalability, and the runtime efficiency of the encoding. The same graph can generally be represented by many adjacency matrices because the nodes of a graph have no inherent order; therefore, the ideal encoding procedure for graph classification should be invariant under permutations of the nodes. Moreover, it should preserve the atomic structure of the graph based on the local and global positions of pairs of nodes. Unfortunately, existing graph classification approaches often require pairwise comparisons among the graphs or are based solely on statistical and spectral representations that are hard to compute [3], [4], [5], [6]. Therefore, representation methods are needed that encode the atomic structure of the graph succinctly and efficiently.
To address the aforementioned challenges, in this paper we propose using a well-known measure called the Kirchhoff index for extracting graph representations. The Kirchhoff index encapsulates the atomic structure of the graph because it incorporates information about both the number and the quality of paths. It features in many other contexts, such as Markov chains (the average commute time of a Markov chain on the graph), experiment design, and Euclidean distance embeddings [7], [8]. Recently, it has also been employed to quantify the resilience of networks based on noisy data in distributed networked control systems [9]. It also has several equivalent descriptions, including ones in terms of the Laplacian spectrum and the effective resistance, which make it suitable for extracting graph representations. Typically, the Kirchhoff index is defined in terms of effective resistance by viewing the graph as an electrical circuit. A graph can be transformed into an electrical circuit by replacing each edge with a unit resistance. The effective resistance between any two nodes in such a network is the equivalent electrical resistance between those two nodes. The sum of the effective resistances between all pairs of nodes is referred to as the Kirchhoff index of the graph. We note that a higher value of effective resistance between two nodes indicates an inferior or less robust overall connection between them (as measured by the number and quality of paths between them) [8], [10], [11], [7], [12]. Since there is always a path between two nodes in a connected network (that is, any two nodes are always connected, possibly via other nodes), the effective resistance between any two nodes is well defined. Consequently, the Kirchhoff index of a graph, which is the sum of all pairwise effective resistances between nodes, is also well defined. To illustrate this, consider the example in Fig. 1.
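The definitions above can be made concrete with a minimal sketch. It assumes unit resistances on every edge and uses the standard identity that the effective resistance between nodes u and v equals L⁺[u,u] + L⁺[v,v] − 2L⁺[u,v], where L⁺ is the pseudo-inverse of the graph Laplacian. The helper names (`laplacian`, `effective_resistance_matrix`, `kirchhoff_index`) are illustrative and are not taken from the NetKI implementation; the dense pseudo-inverse is used only for clarity and is not the nearly-linear-time approach of the paper.

```python
import numpy as np

def laplacian(n, edges):
    """Build the unweighted graph Laplacian L = D - A for nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

def effective_resistance_matrix(L):
    """R[u, v] = L+[u, u] + L+[v, v] - 2 * L+[u, v], with L+ the pseudo-inverse."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp

def kirchhoff_index(L):
    """Sum of effective resistances over all unordered node pairs."""
    R = effective_resistance_matrix(L)
    return R[np.triu_indices_from(R, k=1)].sum()

# Path graph 0 - 1 - 2: two unit resistors in series.
path = laplacian(3, [(0, 1), (1, 2)])
R = effective_resistance_matrix(path)   # R[0, 1] = R[1, 2] = 1, R[0, 2] = 2
kf = kirchhoff_index(path)              # 1 + 1 + 2 = 4

# Equivalent spectral description: n times the sum of the reciprocals
# of the non-zero Laplacian eigenvalues (here 1 and 3).
eigvals = np.linalg.eigvalsh(path)      # ascending; the first one is ~0
kf_spectral = 3 * np.sum(1.0 / eigvals[1:])
```

Both routes give the same value on this example, which illustrates why the effective-resistance and Laplacian-spectrum descriptions are interchangeable for connected graphs.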
In (b), there are four paths between nodes 1 and 2 with lengths 1, 3, 4, and 5, and the effective resistance between the nodes is . In (c), there are three paths between nodes 1 and 2 with lengths 1, 3, and 5, and the effective resistance between them is . We see only a slight increase in effective resistance (indicating a slightly less robust connection) because the path of length four no longer exists. The effect is minimal because the shorter paths, which are more important, are still there. In (d), there are only two paths between nodes 1 and 2, with lengths 3 and 5. The edge between nodes 1 and 2, and hence the path of length 1, is removed. The effective resistance between the nodes increases significantly because the shortest path, which has the largest effect, is deleted. In (e), there is only one path of length 3 between nodes 1 and 2, which increases the effective resistance from to . Thus, we observe that the effective resistance between two nodes (adjacent or non-adjacent) measures robustness in terms of the number and quality (length) of paths between the nodes. Since the Kirchhoff index measures the number as well as the quality of paths, we argue in this paper that it is a suitable measure for generating meaningful and expressive graph representations.
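The same qualitative behavior can be reproduced on a small hypothetical graph. The graph below is not the one in Fig. 1; it is an assumed example in which nodes 0 and 1 are joined by three edge-disjoint paths of lengths 1, 2, and 3, so the expected values follow from the series and parallel resistor rules. The function name `effective_resistance` is illustrative.

```python
import numpy as np

def effective_resistance(n, edges, u, v):
    """Effective resistance between u and v with unit resistance on every edge."""
    L = np.zeros((n, n))
    for a, b in edges:
        L[a, a] += 1.0
        L[b, b] += 1.0
        L[a, b] -= 1.0
        L[b, a] -= 1.0
    Lp = np.linalg.pinv(L)  # pseudo-inverse of the Laplacian
    return Lp[u, u] + Lp[v, v] - 2.0 * Lp[u, v]

direct = [(0, 1)]                  # path of length 1
short = [(0, 2), (2, 1)]           # path of length 2
long_ = [(0, 3), (3, 4), (4, 1)]   # path of length 3

# All three paths present: 1, 2 and 3 ohms in parallel = 6/11.
full = effective_resistance(5, direct + short + long_, 0, 1)

# Break the longest path by dropping edge (0, 3); nodes 3 and 4 stay
# attached to node 1, so the graph remains connected: 1 || 2 = 2/3.
without_long = effective_resistance(5, direct + short + long_[1:], 0, 1)

# Remove the direct edge instead: 2 || 3 = 6/5.
without_direct = effective_resistance(5, short + long_, 0, 1)
```

Losing the longest path raises the resistance only from about 0.545 to 0.667, whereas losing the direct edge raises it to 1.2, in line with the observation that shorter paths contribute most to the robustness of the connection.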
Building on recent advances in solving linear systems in graph Laplacians and in measuring edge centrality in graphs, we propose an efficient method to compute the Kirchhoff index for the graph classification task [13], [14]. The major contributions of this study are as follows.
- We propose an efficient Kirchhoff index based graph classification method that encapsulates graph structure in terms of the number and quality of paths between all pairs of nodes.
- We empirically show that the proposed method improves running time on large networks while achieving classification accuracy close to that of state-of-the-art methods that are otherwise impractical for large graphs.
- We are making our implementation publicly available for other researchers to use and improve upon.
Related work
Our related work section is divided into four categories: direct methods, graph kernel methods, statistical and spectral representations, and graph-theoretic approaches. The methods in the first three categories deal with graph representations, while the last category surveys the relevant graph-theoretic approaches used or referred to in this work. Table 1 summarizes the key properties and time complexity of the related work.
Graph Basics
Let G = (V, E) denote an undirected connected graph, where V is the set of n nodes and E is the set of m edges. Throughout this study, we consider G as a resistor network where nodes represent junctions while edges represent resistors between the junctions. Let denote a unit resistance over the edge e. We write to represent the graph obtained from G by deactivating edge e by decreasing to for some small . We denote this process by -deletion. Let L denote
Graph embedding algorithm using exact Kirchhoff index
Here we present the Kirchhoff index based graph representation algorithm, which encodes graph structures using the network Kirchhoff index. The algorithm describes the procedure for extracting the graph representation by computing the exact network Kirchhoff index.
Algorithm 1: Compute using exact Kirchhoff index
Input: Graph , Laplacian L, bin width h, number of bins b
Output:
1: Let indicate pseudo-inverse of the Laplacian matrix
2:
3: initialize sparse matrix with zeros
4: for
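The remaining steps of Algorithm 1 are not shown in this excerpt, so the following is only a rough sketch of the kind of computation its inputs suggest, not the authors' procedure. It assumes that resistance-based quantities obtained from the Laplacian pseudo-inverse are binned into a fixed-length histogram with bin width h and b bins; here the binned quantities are the pairwise effective resistances, which is an assumption made purely for illustration. The function name `resistance_histogram` is hypothetical.

```python
import numpy as np

def resistance_histogram(L, h, b):
    """Hypothetical descriptor: histogram of pairwise effective resistances.

    L is the graph Laplacian, h the bin width, b the number of bins.
    Values beyond the last bin edge are clipped into the final bin.
    """
    n = L.shape[0]
    Lp = np.linalg.pinv(L)                    # pseudo-inverse of the Laplacian
    d = np.diag(Lp)
    R = d[:, None] + d[None, :] - 2.0 * Lp    # effective resistance matrix
    values = R[np.triu_indices(n, k=1)]       # one value per unordered pair
    idx = np.minimum((values / h).astype(int), b - 1)
    return np.bincount(idx, minlength=b)

def laplacian(n, edges):
    """Unweighted Laplacian L = D - A for nodes 0..n-1."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L

# The same 3-node path under two different node labelings.
hist_a = resistance_histogram(laplacian(3, [(0, 1), (1, 2)]), h=0.75, b=4)
hist_b = resistance_histogram(laplacian(3, [(0, 1), (0, 2)]), h=0.75, b=4)
```

Because the histogram depends only on the multiset of pairwise resistances, relabeling the nodes leaves it unchanged, which is the permutation-invariance property that a graph-level representation needs.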
Evaluation
We performed a number of experiments to evaluate NetKI in terms of scalability, feature sparsity, running time, classification accuracy, and feature comparison. The following sections describe these experiments in detail.
Conclusion
We introduced NetKI, a novel method for graph classification. NetKI is based purely on a statistical graph representation that can be computed efficiently in nearly linear time. We proposed using the network Kirchhoff index as a function for graph representation and showed how to approximate it in nearly linear time. NetKI does not require any graph summary statistics or node attributes and relies only on graph structure. We performed extensive experiments on various real-world benchmark datasets
CRediT authorship contribution statement
Anwar Said: Data curation, Methodology, Investigation, Writing - original draft. Saeed-Ul Hassan: Supervision, Conceptualization, Methodology, Writing - original draft. Waseem Abbas: Writing - original draft, Methodology, Investigation. Mudassir Shabbir: Supervision, Investigation, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors (Saeed-Ul Hassan & Mudassir Shabbir) were funded by the CIPL National Center in Big Data and Cloud Computing (NCBC) grant, received from the Planning Commission of Pakistan through the Higher Education Commission (HEC) of Pakistan.
References (69)
- et al., Graph embedding techniques, applications, and performance: a survey, Knowl.-Based Syst. (2018)
- et al., Robust graph topologies for networked systems, IFAC Proceedings Volumes (2012)
- et al., Effective graph resistance, Linear Algebra Appl. (2011)
- et al., Improving robustness of complex networks via the effective graph resistance, Eur. Phys. J. B (2014)
- et al., Semi-supervised multi-graph classification using optimal feature selection and extreme learning machine, Neurocomputing (2018)
- et al., Approximation of graph edit distance based on Hausdorff matching, Pattern Recogn. (2015)
- et al., Graph classification based on graph set reconstruction and graph kernel feature reduction, Neurocomputing (2018)
- et al., On extending extreme learning machine to non-redundant synergy pattern based graph classification, Neurocomputing (2015)
- et al., Spectral distances of graphs, Linear Algebra Its Appl. (2012)
- et al., HpLapGCN: hypergraph p-Laplacian graph convolutional networks, Neurocomputing (2019)
- Representation learning on graphs: methods and applications, IEEE Data Eng. Bull.
- A distance measure between attributed relational graphs for pattern recognition, IEEE Trans. Syst. Man Cybern.
- The graphlet spectrum
- Hunt for the unique, stable, sparse and fast feature learning on graphs, Adv. Neural Inform. Process. Syst.
- NetLSD: hearing the shape of a graph
- Minimizing effective resistance of a graph, SIAM Rev.
- Approximate Gaussian elimination for Laplacians: fast, sparse, and simple, IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), IEEE
- Deep graph kernels
- The multiscale Laplacian graph kernel, Adv. Neural Inform. Process. Syst.
- DeltaCon: a principled massive-graph similarity function, in
- A family of tractable graph distances
- Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res.
- A linear-time graph kernel
- Biological network comparison using graphlet degree distribution, Bioinformatics
- Wasserstein Weisfeiler-Lehman graph kernels, Adv. Neural Inform. Process. Syst.
- A Wasserstein subsequence kernel for time series
- Network similarity via multiple social theories, in
Anwar Said: Mr. Said is a PhD research scholar in AI AI Lab, Department of Computer Science at Information Technology University, Lahore, Pakistan. He received his MPhil degree (2016) in Computer Science from Quaid-i-Azam University, Islamabad, Pakistan. His research interests are in the areas of graph representation, social network analysis, and data science.
Saeed-Ul Hassan: Dr. Hassan is the Director of AI AI Lab and the Chairperson of the Department of Computer Science at Information Technology University (ITU) in Pakistan, and a former Post-Doctorate Fellow at the United Nations University, with more than 15 years of hands-on experience in advanced statistical techniques, artificial intelligence, and software development client work. He earned his Ph.D. in Information Management from the Asian Institute of Technology. He has also served as a Research Fellow at the National Institute of Informatics in Japan. Dr. Hassan’s research interests lie within the areas of Data Science, Artificial Intelligence, Scientometrics, Information Retrieval, and Text Mining. Dr. Hassan is also the recipient of the James A. Linen III Memorial Award in recognition of his outstanding academic performance. More recently, he was awarded the Eugene Garfield Honorable Mention Award for Innovation in Citation Analysis by Clarivate Analytics, Thomson Reuters.
Waseem Abbas: Dr. Abbas is a Research Assistant Professor in the Electrical Engineering and Computer Science Department at Vanderbilt University, Nashville, TN, USA. Previously, he was an Assistant Professor at the Information Technology University, Lahore, Pakistan, and a postdoctoral research scholar at Vanderbilt University between 2014 and 2017. He received his Ph.D. (2013) and M.Sc. (2010) degrees, both in Electrical and Computer Engineering, from the Georgia Institute of Technology, Atlanta, GA, and was a Fulbright scholar from 2009 to 2013. His research interests are in the areas of resilience and security of network control systems, cyber-physical systems, and graph-theoretic methods in complex networks.
Mudassir Shabbir: Dr. Shabbir is an Assistant Professor in the Department of Computer Science at the Information Technology University, Lahore, Pakistan. He received his Ph.D. from the Division of Computer Science, Rutgers University, NJ, USA, in 2014. Previously, he worked at Lahore University of Management Sciences, Pakistan; Los Alamos National Labs, NM; Bloomberg L.P., New York, NY; and Rutgers University. He was a Rutgers Honors Fellow for 2011-12. His main area of research is Algorithmic and Discrete Geometry, and he has developed new methods for the characterization and computation of succinct representations of large data sets with applications in nonparametric statistical analysis. He also works in Combinatorics and Graph Theory.