Elsevier

Knowledge-Based Systems

Volume 238, 28 February 2022, 107899

Heterogeneous graph neural networks with denoising for graph embeddings

https://doi.org/10.1016/j.knosys.2021.107899

Abstract

With the increasing popularity of graph structures, graph embedding, which aims to project nodes into a low-dimensional space while preserving the topological structure of graphs and the information of the nodes themselves, has attracted increasing attention in recent years. Most embedding methods for heterogeneous graphs use meta-path-guided random walks to capture the semantic and structural correlation between different types of nodes in the graph. Despite the success of meta-path-guided heterogeneous graph embedding methods, the choice of meta-path is still an open and challenging problem: the design of a meta-path scheme largely depends on domain knowledge. In this paper, we propose a heterogeneous graph neural network with denoising (HGNND) to handle this issue. Considering that heterogeneous graphs contain different types of nodes whose features are usually distributed in different spaces, the HGNND projects the features of different types of nodes into a common vector space. Then, the whole heterogeneous graph is input into the graph neural network to aggregate neighbor node information and capture the structure information of the heterogeneous graph. Finally, the noise nodes that may affect the performance of the whole model are filtered out by the denoising operation. Extensive experiments on three real-world datasets demonstrate that our proposed model achieves state-of-the-art performance and further show that the model can still effectively aggregate semantic information without using meta-paths.

Introduction

Hardware systems [1], [2], social networks, and computer systems are examples of real-life systems whose components interact with each other. In these systems, the interacting components can be abstracted as information networks [3]. Information networks are ubiquitous and have become an essential part of modern information infrastructure. Most research models information networks as homogeneous information networks (also known as homogeneous graphs, containing a single type of object and link) for better analysis and mining [4], [5]. However, homogeneous graph modeling often extracts only part of the information in the actual interaction system or does not distinguish the heterogeneity of objects and their relationships, resulting in irreversible information loss. In recent years, an increasing number of researchers have modeled multitype and interconnected network data as heterogeneous information networks [6] (also known as heterogeneous graphs, HGs) to achieve a more complete and natural abstraction of real-world data. Using heterogeneous graphs to model rich data with complex interactions retains more comprehensive semantic and structural information. Studies on HG data have been successfully applied in many real-world systems, such as anomaly detection [7], community detection [8], [9], recommendation systems [10], transfer learning [11], and text analysis [12].

Due to the ubiquity of HG data, how to learn embeddings of HG is a crucial research problem in various graph analysis applications [6], e.g., node/graph classification [13], [14] and node clustering [15].

The purpose of traditional graph neural networks, such as GCN [16] and GAT [17], is to obtain the embedding of nodes by aggregating the neighborhood information to adapt to the downstream tasks (such as node classification). However, these methods are designed based on homogeneous graphs. They do not consider the diversity of node types and the heterogeneity of connections between nodes, and direct application to heterogeneous graphs will inevitably result in information loss. To capture the rich semantic information and structural characteristics in the HG, many related works in recent years have adopted meta-path-related semantic exploration methods, such as Metapath2vec [13], which uses meta-path guided random walks to retain the semantic and structural correlation between different types of nodes. A meta-path is a predefined relationship pattern between nodes used to capture the specific semantic relationship between nodes. For example, Fig. 1 shows an HG, which contains four types of nodes: Author (A), Paper (P), Venue (V), and Term (T); and three types of relations: “publish”, “contain”, and “write”. Fig. 2 displays several meta-paths on this HG: APA, PTP and APVPA. Meta-path APA means there is co-authorship between two authors, meta-path PTP describes the co-term structure between two papers, and meta-path APVPA means the papers written by two authors were published in the same conference (journal).
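The meta-paths APA and APVPA can be made concrete with a small traversal. The following is a minimal sketch on hypothetical toy data (the author, paper, and venue identifiers and the helper names `neighbors_via`, `apa`, and `apvpa` are illustrative assumptions, not part of the original work); it simply follows typed edges in the order the meta-path prescribes.

```python
from collections import defaultdict

# Hypothetical toy data: who wrote which paper, and where each paper appeared.
writes = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3")]
published_in = [("p1", "v1"), ("p2", "v2"), ("p3", "v1")]

def neighbors_via(edges):
    """Build forward and backward adjacency sets from (source, target) pairs."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for source, target in edges:
        fwd[source].add(target)
        bwd[target].add(source)
    return fwd, bwd

author_papers, paper_authors = neighbors_via(writes)
paper_venues, venue_papers = neighbors_via(published_in)

def apa(author):
    """Authors reachable via Author-Paper-Author, i.e. co-authors."""
    found = {b for p in author_papers[author] for b in paper_authors[p]}
    return found - {author}

def apvpa(author):
    """Authors reachable via Author-Paper-Venue-Paper-Author (same venue)."""
    found = set()
    for p in author_papers[author]:
        for v in paper_venues[p]:
            for q in venue_papers[v]:
                found |= paper_authors[q]
    return found - {author}

print(apa("a1"))    # co-authors of a1
print(apvpa("a1"))  # authors who published in a venue where a1 also published
```

In this toy graph, a1 and a3 never co-authored a paper, so APA does not connect them, but both published in v1, so APVPA does. Each meta-path therefore encodes a different semantic relation over the same graph.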

At present, most graph embedding methods use meta-paths to extract features and substructures. These methods often assume a set of given or enumerable meta-paths and then use them to calculate similarity or graph embeddings. However, they still face the dilemma of meta-path selection: (1) The choice of meta-path largely depends on domain knowledge. As shown in Fig. 2, for unfamiliar or complex heterogeneous graphs, it is not easy to choose an appropriate meta-path set based on domain knowledge. (2) As the meta-path length increases, the number of paths grows exponentially, making the path search process very expensive. (3) Simply splicing together the information from all kinds of meta-paths introduces noise and degrades performance, while learning an appropriate weight for each meta-path often requires supervision information.
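The growth mentioned in point (2) can be illustrated at the level of meta-path schemes. The sketch below assumes a hypothetical network schema for the bibliographic HG of Fig. 1 (Paper linked to Author, Venue, and Term) and counts the directed type sequences of a given number of hops; the function name `count_meta_path_schemes` is an illustrative assumption. It counts candidate schemes only, not path instances in a concrete graph, which grow faster still.

```python
# Hypothetical network schema: each node type maps to the types it can link to.
SCHEMA = {
    "A": ["P"],            # Author - Paper
    "P": ["A", "V", "T"],  # Paper - Author / Venue / Term
    "V": ["P"],            # Venue - Paper
    "T": ["P"],            # Term - Paper
}

def count_meta_path_schemes(schema, num_hops):
    """Count directed type sequences with `num_hops` edges allowed by the schema."""
    # walks[t] = number of valid type sequences of the current length starting at t
    walks = {t: 1 for t in schema}
    for _ in range(num_hops):
        walks = {t: sum(walks[n] for n in schema[t]) for t in schema}
    return sum(walks.values())

counts = [count_meta_path_schemes(SCHEMA, k) for k in range(1, 7)]
print(counts)
```

Even on this four-type schema, the number of candidate schemes roughly triples every two hops, which is why exhaustive enumeration quickly becomes impractical.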

There are two ways to address these challenges: one is to generate meta-paths automatically [18], and the other is to avoid meta-paths altogether [19]. Because meta-paths must be designed for each dataset to obtain specific semantic information, and we aim for a general heterogeneous graph embedding framework, we explore how to generate node embeddings in heterogeneous graphs with a graph neural network without using meta-paths.

However, it is not easy to generate node embeddings in heterogeneous graphs by graph neural networks without using meta-paths. This requires us to address the following fundamental problems:

(1) How to apply conventional graph neural networks, which are designed for homogeneous graphs, directly to heterogeneous graphs. Traditional graph neural networks such as GCN [16] and GAT [17] are mainly designed for homogeneous graphs. If they are directly applied to a heterogeneous graph, they easily cause information loss and noise.

(2) How to filter out noise node information aggregated by a conventional graph neural network. Unlike meta-path-based methods, conventional graph neural networks cannot mine specific semantic information in heterogeneous graphs, so the node embeddings they generate are susceptible to interference from noisy neighbor nodes. Therefore, we need to consider how to effectively identify and filter out these meaningless noisy nodes so that they do not interfere with the model.
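Problem (1) is easier to see with a conventional layer written out. The following is a minimal NumPy sketch of one GCN-style propagation step in the form popularized by GCN [16], ReLU(D^-1/2 (A + I) D^-1/2 X W); it is an illustration under stated assumptions (a dense toy adjacency matrix and fixed weights), not the HGNND implementation. The function name `gcn_layer` and the toy inputs are hypothetical.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # degree of each node, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale entry (i, j) by 1 / sqrt(deg_i * deg_j).
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

# Toy path graph 0 - 1 - 2 with one-hot features and a fixed weight matrix.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)
weight = np.ones((3, 2))

out = gcn_layer(adj, features, weight)
print(out.shape)
```

The layer takes a single feature matrix and a single weight matrix shared by all nodes, and it treats every neighbor the same way regardless of node or edge type. That is why it cannot be applied as-is when node types have features of different dimensions, and why every neighbor, informative or not, contributes to the aggregated embedding.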

In this paper, we investigate using graph neural networks to aggregate neighbor node information and generate feature representations of nodes in heterogeneous graphs without using meta-paths, and we propose a general model named HGNND. Based on node embeddings generated by a heterogeneous graph neural network, the HGNND optimizes the embeddings via node pairs sampled from the HG. In particular, different types of nodes in the heterogeneous graph are mapped into a common vector space through a feature projection matrix. At the same time, the denoising module identifies and removes noise node pairs so that meaningless semantic information does not interfere with the model. Our main contributions are as follows:

To the best of our knowledge, we make the first attempt to use a graph neural network to aggregate the neighborhood information of HGs to obtain node embeddings without using meta-paths, which not only retains the high-order structure information of heterogeneous graphs but also eliminates the problem of meta-path selection.

We propose a novel HGNND model, a general model that does not need to be designed for a specific dataset. Some subtle designs, such as feature mapping and denoising, are proposed to address the disadvantage that graph neural networks cannot effectively mine the hidden information in heterogeneous graphs.

We conduct extensive experiments on three real-world datasets to validate the effectiveness of the HGNND model compared with the state-of-the-art methods.
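The feature mapping step mentioned above, which projects features of different node types into a common vector space, can be sketched as one projection matrix per node type. This is a minimal NumPy illustration under stated assumptions: the node types, feature dimensions, and the `project` helper are hypothetical, and the projection matrices are random here, whereas in a trained model they would be learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw features: each node type lives in its own feature space.
raw_features = {
    "author": rng.normal(size=(3, 5)),  # 3 authors, 5-dimensional features
    "paper": rng.normal(size=(4, 8)),   # 4 papers, 8-dimensional features
    "venue": rng.normal(size=(2, 3)),   # 2 venues, 3-dimensional features
}
common_dim = 4

# One projection matrix per node type (random stand-ins for learned weights).
projection = {
    node_type: rng.normal(scale=0.1, size=(x.shape[1], common_dim))
    for node_type, x in raw_features.items()
}

def project(raw_features, projection):
    """Map every node type into the common space and stack the results."""
    blocks = [raw_features[t] @ projection[t] for t in raw_features]
    return np.vstack(blocks)

X = project(raw_features, projection)
print(X.shape)
```

After projection, all nodes share one feature matrix, so a graph neural network that expects a single adjacency matrix and a single feature matrix can operate on the whole heterogeneous graph.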

Section snippets

Related work

A heterogeneous graph, also known as a heterogeneous information network, composed of different types of entity nodes and relationships, is an abstraction of real world data. Our work is related to network embedding, which assigns nodes in a network to low-dimensional representations and effectively preserves the network structure [19].

Preliminaries

In this section, we introduce some basic concepts and definitions of heterogeneous graph embedding.

Definition 1

Heterogeneous Graph (HG) [31]

An HG is defined as a graph G = (V, E, T, ϕ, φ), in which V and E are the sets of nodes and edges, respectively. Each node v and edge e are associated with their type mapping functions ϕ: V → T_V and φ: E → T_E, respectively, where T_V and T_E denote the sets of node and edge types, |T_V| + |T_E| > 2, and T = T_V ∪ T_E. If |T_V| + |T_E| = 2, there is only one type of node and one
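Definition 1 maps naturally onto a small data structure. The sketch below is one possible encoding, with the class name `HeteroGraph` and the toy node and edge labels chosen for illustration only; the two dictionaries play the roles of the type mapping functions ϕ and φ, and the heterogeneity test is the condition |T_V| + |T_E| > 2.

```python
from dataclasses import dataclass

@dataclass
class HeteroGraph:
    node_type: dict  # ϕ: node id -> node type
    edge_type: dict  # φ: (source, target) -> edge type

    @property
    def node_types(self):
        """T_V, the set of node types."""
        return set(self.node_type.values())

    @property
    def edge_types(self):
        """T_E, the set of edge types."""
        return set(self.edge_type.values())

    def is_heterogeneous(self):
        """True when |T_V| + |T_E| > 2."""
        return len(self.node_types) + len(self.edge_types) > 2

# Toy bibliographic graph with three node types and two edge types.
hg = HeteroGraph(
    node_type={"a1": "Author", "p1": "Paper", "v1": "Venue"},
    edge_type={("a1", "p1"): "write", ("p1", "v1"): "publish"},
)

# A graph with a single node type and a single edge type.
homogeneous = HeteroGraph(
    node_type={"u1": "User", "u2": "User"},
    edge_type={("u1", "u2"): "follow"},
)

print(hg.is_heterogeneous(), homogeneous.is_heterogeneous())
```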

Proposed method

This section presents a novel heterogeneous graph neural network with denoising (HGNND), a general heterogeneous graph representation learning framework. Feature mapping and node-level aggregation operations capture the rich semantics implied in the heterogeneous graph, and the captured noise semantic information is then removed by a denoising operation to make the model more robust. The proposed HGNND framework is illustrated in Fig. 3, where the circles of different colors represent the

Experiments

In this section, we conduct extensive experiments, including clustering and classification, to validate the effectiveness of the HGNND.

Conclusion

Although meta-paths play an essential role in mining the hidden semantic information of heterogeneous graphs, the choice of meta-paths and the integration of meta-path information are still open problems. In this paper, we make the first attempt to capture relevant information in heterogeneous graphs without using the meta-path correlation method and propose the HGNND model. In the HGNND model, the node feature mapping and node feature aggregation module are proposed to learn the embedding of

Future work

Our proposed HGNND model is a general self-supervised heterogeneous graph embedding model framework. In the feature aggregation step, we can use any graph neural network method that uses an adjacency matrix and feature matrix to aggregate node features to generate node embeddings. We have used GCN [16] and GAT [17] in the feature aggregation module in our work. In future work, we can use suitable heterogeneous graph neural networks in the feature aggregation module, such as HAN [26] and HeCo 

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by a grant from the Natural Science Foundation of China (No. 61976124 and 62072070) and Social and Science Foundation of Liaoning Province (No. L20BTQ008).

References (38)

  • Chen J. et al., Highly parallelized memristive binary neural network, Neural Netw. (2021)
  • Sun Y. et al., Mining heterogeneous information networks: A structural analysis approach, ACM SIGKDD Explor. Newsl. (2013)
  • Wen S. et al., Ckfo: Convolution kernel first operated algorithm with applications in memristor-based convolutional neural network, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2020)
  • R.N. Lichtenwalter, J.T. Lussier, N.V. Chawla, New perspectives and methods in link prediction, in: Proceedings of the...
  • V. Leroy, B.B. Cambazoglu, F. Bonchi, Cold start link prediction, in: Proceedings of the 16th ACM SIGKDD international...
  • Wang X. et al., A survey on heterogeneous graph embedding: Methods, techniques, applications and sources (2020)
  • Ma X. et al., A comprehensive survey on graph anomaly detection with deep learning, IEEE Trans. Knowl. Data Eng. (2021)
  • Su X. et al., A comprehensive survey on community detection with deep learning (2021)
  • Liu F. et al., Deep learning for community detection: progress, challenges and opportunities (2020)
  • Xu J. et al., Gemini: A novel and universal heterogeneous graph information fusing framework for online recommendations
  • Lu J. et al., Fuzzy multiple-source transfer learning, IEEE Trans. Fuzzy Syst. (2019)
  • H. Linmei, T. Yang, C. Shi, H. Ji, X. Li, Heterogeneous graph attention networks for semi-supervised short text...
  • Dong Y. et al., Metapath2vec: Scalable Representation Learning for Heterogeneous Networks (2017)
  • Fu T.Y. et al., Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning
  • Li X. et al., Spectral clustering in heterogeneous information networks, Proc. AAAI Conf. Artif. Intell. (2019)
  • Kipf T.N. et al., Semi-supervised classification with graph convolutional networks (2017)
  • Veličković P. et al., Graph attention networks (2017)
  • Wang C. et al., Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks, Data Min. Knowl. Discov. (2018)
  • J. Zhao, X. Wang, C. Shi, Z. Liu, Y. Ye, Network schema preserving heterogeneous information network embedding, in: C....