Elsevier

Neurocomputing

Volume 506, 28 September 2022, Pages 84-95

GraMMy: Graph representation learning based on micro–macro analysis

https://doi.org/10.1016/j.neucom.2022.07.013

Abstract

Graph Neural Networks (GNNs) are robust variants of deep network models, typically designed to learn from graph-structured data. Despite recent advances in GNNs, the basic message-passing scheme often holds these models back from effectively capturing the influence of nodes in the higher-order neighbourhood. Further, state-of-the-art approaches mostly ignore the contextual significance of the paths through which a message propagates to a node. To deal with these two issues, we propose GraMMy, a novel framework for hierarchical semantics-driven graph representation learning based on micro–macro analysis. The key idea is to study the graph structure at different levels of abstraction, which not only allows a flexible flow of information from both local and higher-order neighbours but also captures more concretely how information travels within the various hierarchical structures of the graph. We incorporate the knowledge gained from micro- and macro-level semantics into the embedding of each node and use this embedding to perform graph classification. Experiments on four bioinformatics and two social datasets exhibit the superiority of GraMMy over state-of-the-art GNN-based graph classifiers.

Introduction

A graph is a pervasive structure used to represent complex systems in which both the entities and their interconnections are equally important [8]. Real-life systems, e.g., social networks, biological networks, and recommender systems, are better modeled as graphs, since information about the individual entities alone is not enough to understand the whole system [31], [34]; the rich information about their collective interactions must also be captured. Thus, acquiring Euclidean representations of nodes and graphs for solving machine learning tasks on graphs has become a fascinating area of research in recent years. Graph Neural Networks (GNNs) apply deep learning techniques to this end and have proven extremely beneficial in many applications, such as recognition, classification, clustering, and prediction [25], [19], [10].

Related works and limitations: In the GNN literature, most approaches follow broadly similar "message passing" schemes, where a GNN layer iteratively computes the Euclidean representation of a node by aggregating the neighbours' features and combining the result with the existing (randomly initialized) node embedding. Hence, the choice of the two functions, Aggregate and Combine, turns out to be crucial for this approach. We discuss some of the existing models from the GNN literature here. The GNN approach proposed by Scarselli et al. [24] is one of the earliest works in this domain. It recursively updates node latent representations by exchanging information with neighbouring nodes until equilibrium is reached; the recurrent function is chosen to be a contraction mapping to ensure convergence. The Gated Graph Neural Network (GGNN) [2] uses a gated recurrent unit as the recurrent function and back-propagation through time (BPTT) for parameter learning. This approach does not require any condition on the parameters to converge and thus reduces the number of steps. However, these GNN models often find it difficult to work on larger graphs and may suffer from stability issues. The recently introduced Stochastic Steady-State Embedding (SSE) approach [4] uses a recurrent function that takes a weighted average of the states from previous steps and a new state to ensure the stability of the algorithm. The GraphSage model [11] overcomes the scalability issue with a batch-training algorithm that samples a fixed-size neighbourhood of each node to aggregate information. Among the various GNN models, the Graph Isomorphism Network (GIN) [32] is found to have the maximal representational power among all message-passing-based models. GIN achieves this by imposing a constraint on the functions used in the model: Aggregate and Combine must be injective. As claimed in [32], GIN and the Weisfeiler-Lehman test of graph isomorphism are equally powerful in a graph classification task.
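The Aggregate/Combine pattern of a single message-passing layer can be sketched as follows. This is a generic illustration, not the exact functions of any model discussed above: the sum aggregator, the weight-matrix names, and the ReLU combine step are all illustrative choices.

```python
import numpy as np

def message_passing_layer(h, adj, w_self, w_neigh):
    """One generic message-passing layer.
    h: (n, d) node embeddings; adj: (n, n) adjacency matrix.
    Aggregate: sum of the neighbours' embeddings.
    Combine: linear maps of self and aggregated message, then ReLU."""
    agg = adj @ h  # row i sums the embeddings of node i's neighbours
    return np.maximum(0.0, h @ w_self + agg @ w_neigh)
```

Stacking k such layers lets information reach only k-hop neighbours, which is precisely the limitation on higher-order influence that this paper targets.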

Among recent works, [3] shows that in order to discriminate between multisets of size n, at least n aggregators are needed; it therefore proposes Principal Neighbourhood Aggregation (PNA), a novel architecture combining multiple aggregators to leverage their different discriminative abilities. Another state-of-the-art approach, Graph U-Nets (g-U-Nets) [9], proposes graph pooling (gPool) and unpooling (gUnpool) operations, analogous to the pooling and up-sampling operations on images; image data is a special case of a graph with nodes lying on a 2D lattice. However, the aggregation and combination strategies used by these models let a GNN layer primarily accommodate only local information from the surroundings of each node. To encode the features of the higher-order neighbourhood of a node in its embedding, two strategies can be applied [37]: increasing the number of iterations so that the learning process spreads every node's information over the entire graph, or stacking more GNN layers. Both strategies have practical drawbacks: increasing the iterations requires a large number of training examples [36], and stacking more GNN layers leads to the vanishing gradient problem during training [17]. Though the generalized k-dimensional GNNs (k-GNNs) [20] take higher-order structures into account by employing the k-dimensional WL algorithm (k-WL), they only look at the overall feature information received from the neighbours. Consequently, like the other GNN models, the k-GNNs ignore several interesting facts, such as the path through which information flows from a specific neighbour, that may be crucial for generating a better embedding of the target node [23].
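The multi-aggregator idea behind PNA can be illustrated with a small sketch. The particular set of aggregators and the plain concatenation are illustrative; PNA additionally applies degree-based scalers, which are not shown here.

```python
import numpy as np

def multi_aggregate(neigh_feats):
    """Concatenate several aggregators over a node's (k, d) neighbour
    features, in the spirit of PNA.  A single aggregator (e.g. mean)
    cannot distinguish all multisets; combining several helps."""
    return np.concatenate([
        neigh_feats.mean(axis=0),  # smooth summary
        neigh_feats.max(axis=0),   # upper extreme
        neigh_feats.min(axis=0),   # lower extreme
        neigh_feats.std(axis=0),   # spread, separates e.g. {2,2} from {1,3}
    ])
```

For example, the multisets {2, 2} and {1, 3} share the same mean but differ in max, min, and std, so the concatenated signature separates them.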

The existing random-walk based methods for encoding information flow within a graph, such as Node2vec [10] and Struc2vec [22], have their own limitations. Node2vec performs the graph classification task but fails to maintain structural equivalence within the graph: two nodes whose neighbourhoods are structurally similar but that are far apart will not have similar latent representations. Struc2vec, on the other hand, considers only structural similarity and ignores feature-related information. We illustrate this with an example (see Fig. 1). Suppose there are two triangles, i.e., complete graphs on three nodes, one whose three nodes are yellow and another whose three nodes are green. As Struc2vec considers only structural similarity, the model ignores the difference in node features, i.e., the node colour, and as a result the two triangles are treated as equal. In a nutshell, neither Node2vec nor Struc2vec can consider the feature and the structural information of the graph simultaneously.
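The two-triangle scenario of Fig. 1 can be made concrete with a toy structure-only signature. Here the sorted degree sequence stands in for the purely structural similarity that Struc2vec captures; it is not Struc2vec's actual algorithm, merely a minimal sketch of why node features become invisible.

```python
def degree_signature(adj_list):
    """Structure-only signature: the sorted degree sequence.
    Any node features (e.g. colours) play no role in it."""
    return tuple(sorted(len(nbrs) for nbrs in adj_list.values()))

# Two triangles whose nodes differ only in colour (a feature the
# structural signature cannot see).
yellow_triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
green_triangle = {3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

Both triangles yield the signature (2, 2, 2), so a purely structural method treats them as identical despite their different node features.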

In this paper, we demonstrate that hierarchically analyzing the semantics behind the graph structure can help a GNN better capture the information flow from both local and higher-order neighbours, while eliminating the need for more training samples and/or more layers in the GNN model.

Contributions: To address the above-discussed limitations of the existing models in learning global and local information together from a graph, we propose GraMMy, a novel framework for hierarchical semantics-driven graph representation learning based on micro–macro analysis. GraMMy allows a flexible flow of information from the higher-order neighbourhood and ensures a better capture of neighbouring information. The hierarchical study is conducted using Locality Sensitive Hashing (LSH) as a micro–macro scaler, while the semantics-driven analyses are accomplished by employing a recurrent autoencoder-based context modeling scheme. We also theoretically explain how the proposed micro–macro analysis maintains a trade-off between information loss and flexibility of information flow while dealing with macro and micro views of the graph structure. Our major contributions are as follows.

  • We propose a GNN-based framework, GraMMy, which learns from graph by focusing on its different hierarchical levels through micro–macro analysis of the structure.

  • We theoretically investigate the variations of several statistical and network properties with the change in abstraction levels of the graph structure.

  • We capture semantics through context generation, which magnifies the flow of information passing through various nodes of the graph at different abstraction levels.

  • We empirically evaluate our model concerning graph classification task on six benchmark datasets.

Experimental results show an average 2%–7% improvement in classification accuracy across the datasets when GraMMy is adopted as the graph representation learning framework.
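Although the paper's exact LSH construction is detailed in the full text, the general idea of using locality-sensitive hashing to group nodes with similar neighbourhoods into macro-level super-nodes can be sketched with a generic MinHash signature. The function name, signature length, and MD5-based hash family below are illustrative assumptions, not GraMMy's actual scheme.

```python
import hashlib

def minhash_signature(neighbourhood, num_hashes=4):
    """Generic MinHash signature of a node's neighbourhood set.
    Nodes whose signatures collide are likely to have similar
    neighbourhoods and could be merged into one macro-level
    super-node, yielding a coarser (macro) view of the graph."""
    sig = []
    for seed in range(num_hashes):
        # For each seeded hash function, keep the minimum hash value
        # over the set; equal sets always produce equal minima.
        sig.append(min(
            int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
            for item in neighbourhood
        ))
    return tuple(sig)
```

The probability that two signatures agree grows with the Jaccard similarity of the underlying sets, which is what makes such a hash usable as a tunable micro-to-macro scaler.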

Section snippets

Proposed Framework: GraMMy

An overview of the proposed framework (GraMMy) is shown in Fig. 2. As depicted in the figure, the framework comprises three key modules, engaged respectively in micro–macro analysis of the graph structure, semantic modeling of the graph nodes, and node embedding through flat message passing. Each module, along with the relevant theoretical background, is discussed in the subsequent subsections.

Experimental Evaluation

We empirically validate our theoretical findings and evaluate GraMMy in comparison with state-of-the-art GNN models.

Conclusion

This paper has introduced a novel approach of graph representation learning based on the notion of hierarchical information extraction from higher-order neighbourhood. The idea is inspired by the human vision mechanism which studies an object from different levels of abstraction. Our model also offers a unique way of aggregating neighbouring node information in a context-aware fashion. After developing sufficient theoretical motivation, we have shown that our approach outperforms

CRediT authorship contribution statement

Sucheta Dawn: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Visualization. Monidipa Das: Conceptualization, Methodology, Writing - review & editing, Visualization, Supervision, Project administration. Sanghamitra Bandyopadhyay: Writing - review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to acknowledge the SyMeC Project grant [BT/Med-II/NIBMG/SyMeC/2014/Vol. II] of the Department of Biotechnology (DBT), Govt. of India for the high performance computing system. We would also like to acknowledge support from J.C. Bose Fellowship [SB/S1/JCB- 033/2016 to S.B.] and INSPIRE Faculty Fellowship Research Grant [DST/INSPIRE/04/2019/001670 to M.D.] by the Department of Science and Technology, Govt. of India.

Sucheta Dawn is a Senior Research Fellow, currently working towards her Ph.D. degree from the Machine Intelligence Unit in Indian Statistical Institute, Kolkata, India. After completing B.Sc and M.Sc degrees in Mathematics, she received her M. Tech degree in Computer Science and Data Processing from Indian Institute of Technology, Kharagpur, India. Her research interests include Graph Neural Network, Recommender System, Machine Learning, and Deep Learning.

References (37)

  • Kurt Hornik et al. Multilayer feedforward networks are universal approximators. Neural Networks (1989)
  • Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks (1991)
  • Sebastian A. Rios et al. Semantically enhanced network analysis for influencer identification in online social networks. Neurocomputing (2019)
  • Karsten M. Borgwardt, Cheng Soon Ong, Stefan Schönauer, S.V.N. Vishwanathan, Alex J. Smola, and Hans-Peter Kriegel. Protein...
  • Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua...
  • Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. Principal neighbourhood aggregation...
  • Hanjun Dai et al. Learning steady-states of iterative algorithms over graphs
  • Asim Kumar Debnath et al. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry (1991)
  • Dong-Young Kim et al. Customer degree centrality and supplier performance: the moderating role of resource dependence. Operations Management Research (2020)
  • Otmar Ertl. ProbMinHash: a class of locality-sensitive hash algorithms for the (probability) Jaccard similarity. IEEE Transactions on Knowledge and Data Engineering (2020)
  • Sichao Fu et al. HpLapGCN: Hypergraph p-Laplacian graph convolutional networks. Neurocomputing (2019)
  • Hongyang Gao and Shuiwang Ji. Graph U-Nets. In International Conference on Machine Learning, pages 2083-2092. PMLR,...
  • Aditya Grover et al. node2vec: Scalable feature learning for networks
  • Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural...
  • Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,...
  • Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint...
  • Jure Leskovec et al. Graphs over time: densification laws, shrinking diameters and possible explanations
  • Guohao Li et al. DeepGCNs: Can GCNs go as deep as CNNs?


    Monidipa Das is currently a DST-INSPIRE Faculty at the Machine Intelligence Unit (MIU), in the Indian Statistical Institute (ISI) Kolkata, India. Previously she was a postdoctoral research fellow in the School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU), Singapore. She received her Ph.D. degree in computer science and engineering from the Indian Institute of Technology (IIT) Kharagpur, in 2018, and her M.E. degree in computer science and engineering from the Indian Institute of Engineering Science and Technology (IIEST), Shibpur, in 2013. Her research interests include spatial informatics, spatio-temporal data mining, soft computing, and machine learning. Dr. Das is member of the IEEE and the ACM.

    Sanghamitra Bandyopadhyay received the Ph.D. degree in computer science from the Indian Statistical Institute (ISI), Kolkata, India, where she has been a Professor since 2007. She is currently the Director of ISI. She has authored or co-authored over 250 technical articles and has published five authored and edited books. Her current research interests include computational biology and bioinformatics, soft and evolutionary computation, pattern recognition, and data mining. Dr. Bandyopadhyay is a fellow of the Indian National Science Academy, the National Academy of Sciences, India, and the Indian National Academy of Engineering, India. She is a recipient of several prestigious awards, including the Humboldt Fellowship from Germany, ICTP Senior Associate, Trieste, Italy, and the Shanti Swarup Bhatnagar Prize in engineering science.
