Heterogeneous graph neural networks with denoising for graph embeddings

doi:10.1016/j.knosys.2021.107899

Knowledge-Based Systems

Volume 238, 28 February 2022, 107899


Abstract

With the increasing popularity of graph-structured data, graph embedding, which aims to project nodes into a low-dimensional space while preserving both the topological structure of the graph and the information carried by the nodes themselves, has attracted increasing attention in recent years. Most embedding methods for heterogeneous graphs use meta-path-guided random walks to capture the semantic and structural correlations between different types of nodes. Despite the success of meta-path-guided heterogeneous graph embedding, the choice of meta-paths remains an open and challenging problem, because the design of a meta-path scheme depends largely on domain knowledge. In this paper, we propose a heterogeneous graph neural network with denoising (HGNND) to address this issue. Since the features of different node types in a heterogeneous graph are usually distributed in different spaces, HGNND first projects the features of each node type into a common vector space. The whole heterogeneous graph is then fed into a graph neural network, which aggregates neighbor information and captures the structure of the heterogeneous graph. Finally, a denoising operation filters out noise nodes that may degrade the performance of the model. Extensive experiments on three real-world datasets demonstrate that the proposed model achieves state-of-the-art performance, which further shows that it can effectively aggregate semantic information without using meta-paths.

Introduction

Hardware systems [1], [2], social networks, and computer systems are examples of real-life systems whose components interact with each other. In these systems, the interacting components can be abstracted as information networks [3]. Information networks are ubiquitous and have become an essential part of modern information infrastructure. Most research models information networks as homogeneous information networks (also known as homogeneous graphs, which contain a single type of object and link) for easier analysis and mining [4], [5]. However, homogeneous modeling often extracts only part of the information in the real interaction system or ignores the heterogeneity of objects and their relationships, resulting in irreversible information loss. In recent years, an increasing number of researchers have modeled multi-typed, interconnected network data as heterogeneous information networks [6] (also known as heterogeneous graphs, HGs) to obtain a more complete and natural abstraction of real-world data. Modeling rich data with complex interactions as heterogeneous graphs retains more comprehensive semantic and structural information. HG-based methods have been successfully applied to many real-world tasks, such as anomaly detection [7], community detection [8], [9], recommendation systems [10], transfer learning [11], and text analysis [12].

Due to the ubiquity of HG data, learning embeddings of HGs is a crucial research problem for various graph analysis applications [6], e.g., node/graph classification [13], [14] and node clustering [15].

The purpose of traditional graph neural networks, such as GCN [16] and GAT [17], is to obtain node embeddings by aggregating neighborhood information for downstream tasks such as node classification. However, these methods are designed for homogeneous graphs. They ignore the diversity of node types and the heterogeneity of the connections between nodes, so applying them directly to heterogeneous graphs inevitably results in information loss. To capture the rich semantic information and structural characteristics of HGs, many recent works have adopted meta-path-based semantic exploration methods, such as Metapath2vec [13], which uses meta-path-guided random walks to retain the semantic and structural correlations between different types of nodes. A meta-path is a predefined relation pattern between node types that captures a specific semantic relationship between nodes. For example, Fig. 1 shows an HG that contains four types of nodes, Author (A), Paper (P), Venue (V), and Term (T), and three types of relations, “publish”, “contain”, and “write”. Fig. 2 displays several meta-paths on this HG: APA, PTP, and APVPA. Meta-path APA denotes the co-author relationship between two authors, meta-path PTP describes two papers that share a term, and meta-path APVPA means that papers written by the two authors were published in the same conference (journal).
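To make the meta-path idea concrete, the following Python sketch performs a Metapath2vec-style meta-path-guided random walk on a tiny hypothetical graph modeled loosely on Fig. 1. The node names, edges, and the helper names `build_adjacency` and `metapath_walk` are illustrative assumptions. The walk only shows how a meta-path constrains which node type may be visited next; it is not the full Metapath2vec training pipeline, which additionally trains skip-gram embeddings on the generated walks.

```python
import random

# Toy academic HG in the spirit of Fig. 1 (node names are hypothetical).
# Each node is tagged with its type: A (author), P (paper), V (venue), T (term).
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",
    "p1": "P", "p2": "P", "p3": "P",
    "v1": "V", "t1": "T",
}
# Undirected edges for the "write", "publish", and "contain" relations.
EDGES = [
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3"),
    ("p1", "v1"), ("p3", "v1"), ("p1", "t1"), ("p2", "t1"),
]

def build_adjacency(edges):
    """Undirected adjacency lists: node -> list of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose next node must have the type prescribed by a
    symmetric meta-path such as "APA" or "APVPA", repeated cyclically."""
    assert node_type[start] == metapath[0], "start node has the wrong type"
    walk = [start]
    pattern = metapath[1:]  # first and last types coincide, so cycle the rest
    while len(walk) < walk_length:
        wanted = pattern[(len(walk) - 1) % len(pattern)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk

adj = build_adjacency(EDGES)
print(metapath_walk(adj, NODE_TYPE, "a1", "APA", 7, random.Random(0)))
print(metapath_walk(adj, NODE_TYPE, "a1", "APVPA", 5, random.Random(1)))
```

Whatever the random choices, the types along the first walk read A, P, A, P, ..., so every pair of consecutive authors in it is a co-author pair, exactly the semantics the APA meta-path encodes.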

At present, most graph embedding methods use meta-paths to extract features and substructures. These methods usually assume a given or enumerable set of meta-paths and use it to compute similarities or graph embeddings. However, they face the dilemma of meta-path selection: (1) The choice of meta-paths depends largely on domain knowledge. As shown in Fig. 2, for unfamiliar or complex heterogeneous graphs, it is not easy to choose an appropriate meta-path set based on domain knowledge. (2) As the meta-path length increases, the number of paths grows exponentially, making the path search very expensive. (3) Simply concatenating the information of all kinds of meta-paths introduces noise and degrades performance, while learning an appropriate weight for each meta-path often requires supervision.
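To give a feel for point (2), the sketch below enumerates candidate meta-path schemes between two authors on a network schema assumed from Fig. 1 (relations A-P, P-V, and P-T, with no paper-paper citation relation). It counts node-type sequences rather than path instances in an actual graph, so it only illustrates the combinatorial growth of the candidate set; `SCHEMA` and `metapath_schemes` are hypothetical names.

```python
# Network schema assumed from Fig. 1: authors write papers, papers are
# published in venues and contain terms. All types connect only through P.
SCHEMA = {
    "A": ["P"],
    "P": ["A", "V", "T"],
    "V": ["P"],
    "T": ["P"],
}

def metapath_schemes(schema, start, end, hops):
    """Enumerate every node-type sequence with `hops` relations that starts
    at type `start` and ends at type `end`, i.e. every candidate meta-path."""
    paths = [[start]]
    for _ in range(hops):
        # Extend each partial sequence by every type the schema allows next.
        paths = [p + [nxt] for p in paths for nxt in schema[p[-1]]]
    return ["".join(p) for p in paths if p[-1] == end]

for hops in (2, 4, 6, 8):
    schemes = metapath_schemes(SCHEMA, "A", "A", hops)
    print(hops, len(schemes), schemes[:3])
```

For author-to-author schemes with 2, 4, 6, and 8 relations this gives 1, 3, 9, and 27 candidates (APA; then APAPA, APVPA, APTPA; and so on), tripling with every two extra hops, and each candidate would still have to be instantiated and searched in the graph before it could be used.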

Two lines of work address these challenges: one generates meta-paths automatically [18], and the other mines data without using meta-paths at all [19]. Because meta-paths must be designed separately for each dataset to obtain specific semantic information, we aim at a general heterogeneous graph embedding framework and explore how a graph neural network can generate node embeddings in heterogeneous graphs without meta-paths.

However, generating node embeddings in heterogeneous graphs with graph neural networks and without meta-paths is not easy. It requires addressing the following fundamental problems:

(1) How to apply conventional graph neural networks, which are designed for homogeneous graphs, directly to heterogeneous graphs. Traditional graph neural networks such as GCN [16] and GAT [17] are mainly designed for homogeneous graphs; applying them directly to a heterogeneous graph easily causes information loss and introduces noise.

(2) How to filter out noisy node information aggregated by a conventional graph neural network. Unlike meta-path-based methods, conventional graph neural networks cannot mine specific semantic information in heterogeneous graphs, so the node embeddings they generate are susceptible to interference from noisy neighbor nodes. Therefore, we need to effectively identify and filter out these meaningless noisy nodes so that they do not interfere with the model.

In this paper, we investigate how graph neural networks can aggregate neighbor information to generate feature representations of nodes in heterogeneous graphs without using meta-paths, and we propose a general model named HGNND. Based on the node embeddings generated by a heterogeneous graph neural network, HGNND optimizes the embeddings via node pairs sampled from the HG. In particular, different types of nodes in the heterogeneous graph are mapped into a common vector space through a feature projection matrix. Meanwhile, the denoising module identifies and removes noisy node pairs so that the model is not disturbed by meaningless semantic information. Our main contributions are as follows:
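The projection into a common vector space can be illustrated with a minimal pure-Python sketch. It assumes a plain linear map per node type (no bias or nonlinearity) and hand-picked weights; in HGNND the projection matrices would be learned, and the names `RAW_FEATURES`, `PROJECTIONS`, `project_features`, and `matvec` are hypothetical.

```python
# Hypothetical raw features: each node type lives in its own feature space,
# with a different dimensionality per type.
NODE_TYPE = {"a1": "A", "p1": "P", "v1": "V"}
RAW_FEATURES = {
    "a1": [1.0, 0.0],       # author features (2-d)
    "p1": [0.5, 0.5, 1.0],  # paper features (3-d)
    "v1": [1.0],            # venue features (1-d)
}

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    assert all(len(row) == len(vector) for row in matrix), "shape mismatch"
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def project_features(raw, node_type, projections):
    """Compute h_v = W_{phi(v)} x_v for every node v.

    `projections` maps each node type to its own matrix; all matrices have
    the same number of rows d, so every projected vector lands in R^d."""
    return {v: matvec(projections[node_type[v]], x) for v, x in raw.items()}

# Hand-picked weights for a common space of dimension d = 2 (learned in practice).
PROJECTIONS = {
    "A": [[1.0, 0.0], [0.0, 1.0]],            # 2 x 2
    "P": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # 2 x 3
    "V": [[2.0], [3.0]],                      # 2 x 1
}
projected = project_features(RAW_FEATURES, NODE_TYPE, PROJECTIONS)
print(projected)
```

After projection every node, whatever its original dimensionality, is a 2-d vector, which is what allows a single graph neural network to aggregate features across node types.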

• To the best of our knowledge, we make the first attempt to use a graph neural network to aggregate the neighborhood information of HGs and obtain node embeddings without meta-paths, which not only retains the high-order structural information of heterogeneous graphs but also eliminates the problem of meta-path selection.

• We propose HGNND, a general model that does not need to be designed for a specific dataset. Dedicated components, namely feature mapping and denoising, are proposed to address the inability of conventional graph neural networks to effectively mine the hidden information in heterogeneous graphs.

• We conduct extensive experiments on three real-world datasets to validate the effectiveness of HGNND compared with state-of-the-art methods.

Section snippets

Related work

A heterogeneous graph, also known as a heterogeneous information network, is composed of different types of entity nodes and relationships and serves as an abstraction of real-world data. Our work is related to network embedding, which maps the nodes of a network to low-dimensional representations while effectively preserving the network structure [19].

Preliminaries

In this section, we introduce some basic concepts and definitions of heterogeneous graph embedding.

Definition 1

Heterogeneous Graph (HG) [31]

An HG is defined as a graph $G = (V, E, T, \phi, \varphi)$, in which $V$ and $E$ are the sets of nodes and edges, respectively. Each node $v$ and edge $e$ are associated with the type mapping functions $\phi: V \to T_V$ and $\varphi: E \to T_E$, respectively, where $T_V$ and $T_E$ denote the sets of node and edge types, $|T_V| + |T_E| > 2$, and $T = T_V \cup T_E$. If $|T_V| + |T_E| = 2$, there is only one type of node and one...
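Definition 1 maps directly onto a small data structure. The sketch below is an illustrative Python encoding, not code from the paper: the dictionaries `node_type` and `edge_type` stand in for the node and edge type mapping functions, and the class name `HeteroGraph` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class HeteroGraph:
    """G = (V, E, T, phi, varphi) in the sense of Definition 1.

    node_type plays the role of phi: V -> T_V and edge_type the role of
    varphi: E -> T_E; edges are stored as (u, v) pairs of typed nodes."""
    node_type: dict = field(default_factory=dict)  # phi
    edge_type: dict = field(default_factory=dict)  # varphi

    def add_node(self, v, t):
        self.node_type[v] = t

    def add_edge(self, u, v, t):
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("both endpoints must be typed nodes")
        self.edge_type[(u, v)] = t

    @property
    def node_types(self):  # T_V
        return set(self.node_type.values())

    @property
    def edge_types(self):  # T_E
        return set(self.edge_type.values())

    def is_heterogeneous(self):
        # Definition 1 requires |T_V| + |T_E| > 2; a sum of exactly 2 means
        # one node type and one edge type (the homogeneous case).
        return len(self.node_types) + len(self.edge_types) > 2

g = HeteroGraph()
for node, t in [("a1", "A"), ("p1", "P"), ("v1", "V")]:
    g.add_node(node, t)
g.add_edge("a1", "p1", "write")
g.add_edge("p1", "v1", "publish")
print(sorted(g.node_types), sorted(g.edge_types), g.is_heterogeneous())
```

A graph with a single node type and a single edge type, such as a plain follower network, fails the check, matching the boundary case $|T_V| + |T_E| = 2$ in the definition.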

Proposed method

This section presents a novel heterogeneous graph neural network with denoising (HGNND), a general framework for heterogeneous graph representation learning. Feature mapping and node-level aggregation capture the rich semantics implied in the heterogeneous graph, and a denoising operation then removes the noisy semantic information that has been captured, making the model more robust. The proposed HGNND framework is illustrated in Fig. 3, where the circles of different colors represent the...
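The excerpt does not spell out how the denoising operation decides which node pairs are noise. Purely to illustrate the idea of filtering sampled node pairs before they enter the training objective, the sketch below uses a stand-in rule assumed for exposition: a pair whose current embeddings have cosine similarity at or below a hypothetical `threshold` is treated as noise and dropped. `cosine`, `denoise_pairs`, and the threshold are hypothetical names; this is not the HGNND denoising module.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def denoise_pairs(pairs, embeddings, threshold=0.0):
    """Hypothetical stand-in heuristic: keep a sampled node pair only if the
    cosine similarity of its current embeddings exceeds `threshold`."""
    kept, dropped = [], []
    for u, v in pairs:
        if cosine(embeddings[u], embeddings[v]) > threshold:
            kept.append((u, v))
        else:
            dropped.append((u, v))  # treated as a noise pair
    return kept, dropped

# Toy embeddings after aggregation (values are illustrative only).
emb = {"a1": [1.0, 0.0], "p1": [0.9, 0.1], "t1": [-1.0, 0.2]}
kept, dropped = denoise_pairs([("a1", "p1"), ("a1", "t1")], emb)
print(kept, dropped)
```

A learned scorer or an attention weight could replace the fixed cosine threshold; the sketch only fixes the interface of taking sampled pairs in and returning the pairs that survive.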

Experiments

In this section, we conduct extensive experiments, including clustering and classification, to validate the effectiveness of HGNND.

Conclusion

Although meta-paths play an essential role in mining the hidden semantic information of heterogeneous graphs, the choice of meta-paths and the integration of meta-path information remain open problems. In this paper, we make the first attempt to capture the relevant information in heterogeneous graphs without meta-path-based methods and propose the HGNND model. In the HGNND model, the node feature mapping and node feature aggregation modules are proposed to learn the embedding of...

Future work

Our proposed HGNND model is a general self-supervised heterogeneous graph embedding framework. In the feature aggregation step, any graph neural network that aggregates node features from an adjacency matrix and a feature matrix can be used to generate node embeddings. In this work, we used GCN [16] and GAT [17] in the feature aggregation module. In future work, suitable heterogeneous graph neural networks, such as HAN [26] and HeCo...
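Because the aggregation module only needs an adjacency matrix and a feature matrix, any layer with that interface can be plugged in. As an assumed example, the sketch below implements one standard GCN propagation step [16], $H' = \mathrm{ReLU}(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H W)$ with $\tilde{A} = A + I$, over the adjacency matrix of the whole heterogeneous graph, ignoring node and edge types. The function name `gcn_layer` and the toy inputs are hypothetical; a GAT layer [17] exposing the same `(adj, features, weight)` signature could replace it.

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency over all nodes of the HG (types ignored),
    features: n x d matrix of projected node features,
    weight: d x d_out matrix (learned in practice)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    d = len(features[0])
    # Neighbourhood aggregation: row i mixes node i with its neighbours.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(d)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d)))
             for o in range(len(weight[0]))] for i in range(n)]

# Toy chain a1 - p1 - v1 with already projected 2-d features.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.5, 1.0], [2.0, 3.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(ADJ, H, W))
```

Stacking such layers lets information flow over several hops of the heterogeneous graph without choosing any meta-path, at the price of also mixing in neighbors that may be noise, which is the motivation for the denoising step.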

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by a grant from the Natural Science Foundation of China (No. 61976124 and 62072070) and Social and Science Foundation of Liaoning Province (No. L20BTQ008).

References (38)

J. Chen et al., Highly parallelized memristive binary neural network, Neural Netw. (2021)

Y. Sun et al., Mining heterogeneous information networks: A structural analysis approach, ACM SIGKDD Explor. Newsl. (2013)

S. Wen et al., Ckfo: Convolution kernel first operated algorithm with applications in memristor-based convolutional neural network, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2020)

R.N. Lichtenwalter, J.T. Lussier, N.V. Chawla, New perspectives and methods in link prediction, in: Proceedings of the...

V. Leroy, B.B. Cambazoglu, F. Bonchi, Cold start link prediction, in: Proceedings of the 16th ACM SIGKDD international...

X. Wang et al., A survey on heterogeneous graph embedding: Methods, techniques, applications and sources (2020)

X. Ma et al., A comprehensive survey on graph anomaly detection with deep learning, IEEE Trans. Knowl. Data Eng. (2021)

X. Su et al., A comprehensive survey on community detection with deep learning (2021)

F. Liu et al., Deep learning for community detection: progress, challenges and opportunities (2020)

J. Xu et al., Gemini: A novel and universal heterogeneous graph information fusing framework for online recommendations

J. Lu et al., Fuzzy multiple-source transfer learning, IEEE Trans. Fuzzy Syst. (2019)

H. Linmei, T. Yang, C. Shi, H. Ji, X. Li, Heterogeneous graph attention networks for semi-supervised short text...

Y. Dong et al., Metapath2vec: Scalable Representation Learning for Heterogeneous Networks (2017)

T.Y. Fu et al., Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning

X. Li et al., Spectral clustering in heterogeneous information networks, Proc. AAAI Conf. Artif. Intell. (2019)

T.N. Kipf et al., Semi-supervised classification with graph convolutional networks (2017)

P. Veličković et al., Graph attention networks (2017)

C. Wang et al., Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks, Data Min. Knowl. Discov. (2018)

J. Zhao, X. Wang, C. Shi, Z. Liu, Y. Ye, Network schema preserving heterogeneous information network embedding, in: C....

