Elsevier

Knowledge-Based Systems

Volume 264, 15 March 2023, 110343

HINChip: Heterogeneous Information Network Representation with Community Hierarchy Preserving

https://doi.org/10.1016/j.knosys.2023.110343

Abstract

Heterogeneous information network (HIN) representation aims to learn low-dimensional representations for multiple types of nodes and edges while preserving rich structural and semantic information. Existing HIN embedding methods mainly use meta-paths, meta-graphs, and network schemas to guide the learned representations. However, these methods may ignore the community hierarchy information in the network, which is commonly available in many applications. In this paper, we propose HINChip, a new approach for learning representations of Heterogeneous Information Networks with Community Hierarchy Preserving, including intra-community structures and inter-community relationships. Specifically, multiple homogeneous sub-networks are first constructed based on multiple semantics. Second, hierarchical networks are built from fine to coarse on each homogeneous sub-network, preserving intra-community structures and inter-community relationships under the corresponding semantics. Nodes learn the corresponding vectors from each level of the hierarchical networks using an existing representation learning method. Third, a multi-level-integrated semantic attention mechanism is proposed to combine the node vectors of each level and weigh the importance of the different homogeneous sub-networks under given semantics. HINChip considers only the topology of the HIN, yet in experiments on four real-world datasets it outperforms methods trained with node attributes and labels.

Introduction

A heterogeneous information network (HIN) [1] is a network that contains multiple types of nodes and edges. Because of this heterogeneity, HINs are harder to analyze than homogeneous networks. HIN representation [2], [3], an essential cornerstone of heterogeneous network mining, has attracted increasing attention.

Several recent works [4], [5], [6], [7], [8], [9] embed HINs into a latent low-dimensional space. These works mainly handle the heterogeneity of HINs through three structures: meta-paths [1], meta-graphs (meta-structures) [10], [11], [12], and network schemas [13]. A meta-path is a composite relation between two given nodes, represented as a sequence of relations. A meta-graph can express more complex semantics than a meta-path. Unlike meta-paths and meta-graphs, which preserve only the semantic relationships between two nodes, a network schema focuses on the local structure among different node types. Sampling methods based on meta-paths, meta-graphs, and network schemas follow a fixed combination of node types; this handles heterogeneity and preserves the uni-level structural information of nodes.
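To make the notion of a meta-path as a composite relation concrete, the following Python sketch counts meta-path instances between papers under the "PAP" and "PVP" semantics by chaining relation adjacencies (a sparse matrix product per step). The toy graph, the relation names `P-A` and `P-V`, and all helper names are hypothetical illustrations, not data or code from the paper.

```python
from collections import defaultdict

# Hypothetical toy academic HIN (not the data of Fig. 1).
# Each relation maps to a list of (paper, other) edges.
RELATIONS = {
    "P-A": [("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a2"), ("p4", "a3")],
    "P-V": [("p1", "v1"), ("p2", "v1"), ("p3", "v2"), ("p4", "v2")],
}


def adjacency(relation, reverse=False):
    """Sparse adjacency {source: {target: count}} for one relation."""
    adj = defaultdict(lambda: defaultdict(int))
    for s, t in RELATIONS[relation]:
        if reverse:
            s, t = t, s
        adj[s][t] += 1
    return adj


def compose(left, right):
    """Sparse matrix product: counts paths x -> m -> y through shared middle nodes m."""
    out = defaultdict(lambda: defaultdict(int))
    for x, mids in left.items():
        for m, c1 in mids.items():
            for y, c2 in right.get(m, {}).items():
                out[x][y] += c1 * c2
    return out


def meta_path_counts(steps):
    """Chain relations along a meta-path; steps are (relation, reverse) pairs."""
    result = adjacency(*steps[0])
    for relation, reverse in steps[1:]:
        result = compose(result, adjacency(relation, reverse))
    return {x: dict(row) for x, row in result.items()}


pap = meta_path_counts([("P-A", False), ("P-A", True)])  # Paper-Author-Paper
pvp = meta_path_counts([("P-V", False), ("P-V", True)])  # Paper-Venue-Paper
print(pap["p2"])  # {'p1': 1, 'p2': 2, 'p3': 1}
print(pvp["p3"])  # {'p3': 1, 'p4': 1}
```

Under "PAP", papers p3 and p4 share no author and are unconnected, whereas under "PVP" they share venue v2; different meta-paths therefore induce different homogeneous views of the same HIN.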

The multi-level structure of hierarchical communities describes the relationships between nodes more comprehensively, yet current approaches ignore it. Moreover, hierarchical community structure is prevalent in HINs. Fig. 1(a) shows a toy example of an academic HIN with three types of nodes (Author (A), Paper (P), Venue (V)) and two types of relations ("write/written" and "publish/published"). Under the semantics "PVP" and "PAP", there exist homogeneous sub-networks with corresponding hierarchical communities. Specifically, because of heterogeneity, papers tend to form different communities under different semantics (Fig. 1(b1), Fig. 1(b2)). In addition to intra-community structures, inter-community interactions also carry vital information. After the communities are coarsened into super-nodes, the inter-community interactions appear as edges between super-nodes. At the next level, the super-nodes likewise gather into new communities. This level-by-level aggregation produces the hierarchical community structure among nodes shown in Fig. 1(c1) and Fig. 1(c2). The problem we address is therefore how to preserve hierarchical communities during HIN representation learning.
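The coarsening step described above, where each community collapses into a super-node and inter-community interactions become super-edges, can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: the community assignment at each level is taken as given (in practice it would come from a community detection method), intra-community weight is kept as a self-loop in the manner of Louvain-style aggregation, and the toy edges and partitions are hypothetical.

```python
from collections import defaultdict


def coarsen(edges, community):
    """Collapse each community into a super-node (one level of the hierarchy).

    edges: iterable of (u, v, weight) for an undirected weighted graph.
    community: dict mapping every node to its community id.
    Returns {(c1, c2): weight} with c1 <= c2. Weights of inter-community edges
    are summed onto super-edges; intra-community weight becomes a self-loop (c, c).
    """
    super_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        key = (cu, cv) if cu <= cv else (cv, cu)
        super_edges[key] += w
    return dict(super_edges)


# Hypothetical paper-paper sub-network and partitions, fine to coarse.
EDGES = [("p1", "p2", 1), ("p1", "p3", 1), ("p2", "p3", 1),
         ("p3", "p4", 2), ("p4", "p5", 1), ("p5", "p6", 1)]
LEVEL1 = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1}

level1 = coarsen(EDGES, LEVEL1)
print(level1)  # {(0, 0): 3.0, (0, 1): 2.0, (1, 1): 2.0}

# Super-nodes 0 and 1 merge into a single community at the next level.
level2 = coarsen([(a, b, w) for (a, b), w in level1.items()], {0: 0, 1: 0})
print(level2)  # {(0, 0): 7.0}
```

Repeating `coarsen` on each level's output, with a new partition each time, yields a fine-to-coarse hierarchy; the (0, 1) super-edge records the inter-community interaction between the two first-level communities.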

We face two challenges in solving this problem. (1) How can hierarchical communities be captured? An HIN contains more complex interactions between different types of nodes, so the influence of different semantics must be considered when discovering the latent communities of nodes. (2) How can node information from multiple levels under various semantics be integrated? The different semantics and hierarchical communities preserve information about the HIN from multiple aspects and levels, and integrating such information effectively is difficult.

In this paper, we propose HINChip, a heterogeneous information network representation model with community hierarchy preservation, which makes the first attempt to preserve hierarchical communities under multiple semantics. First, meta-path-based similarities are used to construct multiple homogeneous sub-networks whose edge weights are the computed similarities. Second, on each homogeneous sub-network, we recursively apply a community detection method [14], [15] to construct a hierarchical network, which addresses the first challenge. Nodes at each level then learn a set of embeddings with an existing unsupervised graph representation method. Third, we propose a multi-level-integrated semantic attention mechanism that integrates node information from various semantics and multiple levels, which addresses the second challenge.
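The first step, building a weighted homogeneous sub-network from meta-path-based similarities, can be sketched as below. This section does not specify which similarity is used; the sketch assumes PathSim (Sun et al., listed in the references), which for a symmetric meta-path scores two nodes as s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), where M counts meta-path instances. The function name and the toy counts are hypothetical.

```python
def pathsim_subnetwork(counts):
    """Weighted homogeneous sub-network from symmetric meta-path instance counts.

    counts: dict of dicts, counts[x][y] = number of meta-path instances between
    x and y (e.g. under "PAP"); assumed symmetric with counts[x][x] > 0.
    Returns {(x, y): PathSim(x, y)} for x < y, keeping only nonzero similarities.
    """
    edges = {}
    for x, row in counts.items():
        for y, m_xy in row.items():
            if x < y and m_xy > 0:
                # PathSim: shared paths, normalized by each node's self-paths.
                edges[(x, y)] = 2 * m_xy / (counts[x][x] + counts[y][y])
    return edges


# Hypothetical "PAP" instance counts for four papers.
PAP = {
    "p1": {"p1": 1, "p2": 1},
    "p2": {"p1": 1, "p2": 2, "p3": 1},
    "p3": {"p2": 1, "p3": 1},
    "p4": {"p4": 1},
}
sub_network = pathsim_subnetwork(PAP)
print(sub_network)  # both edges get weight 2/3; p4 stays isolated
```

Paper p4 has no co-authored neighbor under this meta-path, so it stays isolated in the "PAP" sub-network; the weighted edges of each such sub-network would then be the input to community detection.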

The contributions of our work are as follows:

  • To the best of our knowledge, we are the first to study hierarchical communities when embedding HINs. We construct hierarchical networks to capture intra-community structures and inter-community relationships in HINs.

  • To handle heterogeneity while preserving hierarchical communities, we propose a multi-level-integrated semantic attention mechanism that combines node representations from multiple levels under different semantics.

  • Our method uses only the network topology for unsupervised training, yet on four datasets it outperforms both methods that use node attributes and supervised methods trained with labels.


Related work

HIN representation based on meta-paths. These approaches capture the structural information of an HIN primarily through meta-path-based sampling. Metapath2vec [16] uses meta-path-guided random walks and a heterogeneous skip-gram model to learn node embeddings. HHNE [17] introduces hyperbolic space to learn the similarity between nodes in meta-path-based random walk sequences. HERec [18] extracts nodes of the same type from meta-path-guided random walk sequences to learn the heterogeneity in
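For readers unfamiliar with the sampling these methods rely on, here is a minimal Python sketch of a meta-path-guided random walk in the style of metapath2vec [16]: at each step the walker moves uniformly to a neighbor whose type matches the next position of a symmetric scheme such as "APVPA". It covers only the walk (the heterogeneous skip-gram training is omitted), and the function name and toy graph are hypothetical.

```python
import random


def metapath_walk(neighbors, node_type, start, scheme, length, rng=random):
    """One meta-path-guided random walk (metapath2vec-style sketch).

    neighbors: dict node -> list of adjacent nodes of any type.
    node_type: dict node -> type label such as "A", "P", "V".
    scheme: symmetric meta-path, e.g. ["A", "P", "V", "P", "A"]; the walk
    repeats it cyclically, so scheme[0] must equal scheme[-1].
    """
    if scheme[0] != scheme[-1] or node_type[start] != scheme[0]:
        raise ValueError("scheme must be symmetric and start at the walker's type")
    cycle = scheme[:-1]                 # A, P, V, P, then back to A
    walk = [start]
    while len(walk) < length:
        next_type = cycle[len(walk) % len(cycle)]
        candidates = [n for n in neighbors[walk[-1]] if node_type[n] == next_type]
        if not candidates:              # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy academic HIN (undirected adjacency lists).
NEIGHBORS = {
    "a1": ["p1", "p2"], "a2": ["p2", "p3"],
    "p1": ["a1", "v1"], "p2": ["a1", "a2", "v1"], "p3": ["a2", "v2"],
    "v1": ["p1", "p2"], "v2": ["p3"],
}
TYPES = {n: n[0].upper() for n in NEIGHBORS}   # "a1" -> "A", etc.

walk = metapath_walk(NEIGHBORS, TYPES, "a1", ["A", "P", "V", "P", "A"], 9,
                     random.Random(0))
print([TYPES[n] for n in walk])   # ['A', 'P', 'V', 'P', 'A', 'P', 'V', 'P', 'A']
```

The node sequences produced this way would then serve as contexts for skip-gram training; passing a seeded `random.Random` keeps the sampling reproducible.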

Problem statement

This section introduces some definitions used in this paper and presents the task to be addressed.

Definition 1 Heterogeneous Information Network

A heterogeneous information network (HIN) is defined as a network $G=(V,E,T,\phi,\varphi)$, in which $V$ and $E$ are the sets of nodes and edges, respectively. It is also associated with a node type mapping function $\phi: V \to T_V$ and an edge type mapping function $\varphi: E \to T_E$, where $T_V$ and $T_E$ denote the sets of node and edge types, and $|T_V| \geq 1$ and $|T_E| \geq 1$.
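Definition 1 can be mirrored directly as a small container type. This Python sketch is illustrative only: the class and field names are hypothetical, edges are stored as (u, v) pairs, and the only check is that the node type mapping ϕ and the edge type mapping φ are defined on every node and edge.

```python
from dataclasses import dataclass


@dataclass
class HIN:
    """Container for G = (V, E, T, phi, varphi) from Definition 1 (illustrative)."""
    nodes: set     # V
    edges: set     # E, stored as (u, v) pairs
    phi: dict      # node type mapping  phi: V -> T_V
    varphi: dict   # edge type mapping  varphi: E -> T_E

    def __post_init__(self):
        # Both type mappings must be defined on every node and every edge.
        if set(self.phi) != self.nodes or set(self.varphi) != self.edges:
            raise ValueError("type mappings must cover exactly V and E")

    @property
    def node_types(self):
        """T_V, the set of node types."""
        return set(self.phi.values())

    @property
    def edge_types(self):
        """T_E, the set of edge types."""
        return set(self.varphi.values())


g = HIN(
    nodes={"a1", "p1", "v1"},
    edges={("a1", "p1"), ("p1", "v1")},
    phi={"a1": "Author", "p1": "Paper", "v1": "Venue"},
    varphi={("a1", "p1"): "write", ("p1", "v1"): "publish"},
)
print(sorted(g.node_types), sorted(g.edge_types))
# ['Author', 'Paper', 'Venue'] ['publish', 'write']
```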

After that, different homogeneous sub-networks are constructed according

Hierarchical community preserving based HIN representation

We present our hierarchical-community-preserving HIN representation method in this section. Fig. 2 shows the framework of the proposed HINChip. HINChip preserves both uni-level information between directly linked nodes and multi-level information such as hierarchical communities. It consists of three main components: (1) constructing homogeneous sub-networks, (2) preserving hierarchical communities, and (3) multi-level-integrated semantic attention.
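The equations of the multi-level-integrated semantic attention are not given in this section. As a rough illustration only, the NumPy sketch below assumes (a) each semantic's level-wise embeddings are averaged into one matrix and (b) semantics are then fused with HAN-style semantic attention: the importance of semantic p is w_p, the mean over nodes of qᵀ tanh(W z + b), and the weights β = softmax(w) combine the semantic-specific embeddings. Both choices, and the function names, are assumptions rather than HINChip's published formulation.

```python
import numpy as np


def fuse_levels(level_embeddings):
    """Assumed level integration: average an (L, N, d) stack of level embeddings."""
    return np.asarray(level_embeddings).mean(axis=0)


def semantic_attention(Z, W, b, q):
    """HAN-style semantic attention (assumed, not HINChip's exact formulation).

    Z: (P, N, d) embeddings, one matrix per semantic (meta-path).
    W: (d_a, d), b: (d_a,), q: (d_a,); learned in practice, fixed here.
    Returns fused embeddings (N, d) and semantic weights beta (P,).
    """
    scores = np.tanh(Z @ W.T + b) @ q   # (P, N): per-node importance of each semantic
    w = scores.mean(axis=1)             # (P,): average over nodes
    beta = np.exp(w - w.max())
    beta /= beta.sum()                  # softmax over semantics
    return np.tensordot(beta, Z, axes=1), beta


rng = np.random.default_rng(0)
P, L, N, d, d_a = 2, 3, 5, 4, 8          # semantics, levels, nodes, dims
per_semantic = [fuse_levels(rng.normal(size=(L, N, d))) for _ in range(P)]
Z = np.stack(per_semantic)               # (P, N, d)
fused, beta = semantic_attention(
    Z, rng.normal(size=(d_a, d)), np.zeros(d_a), rng.normal(size=d_a)
)
print(fused.shape, beta.round(3))
```

Averaging levels is the simplest possible integration; a learned level-wise weighting would be a natural alternative, but neither is confirmed by the text above.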

Experiments

This section covers the datasets, baselines, node classification, node clustering, visualization, and ablation analysis of HINChip.

Conclusion

In this paper, we make the first attempt to study hierarchical communities in the process of HIN representation. HINChip captures both rich semantic information and high-order structural information, including intra-community structures and even inter-community relationships. HINChip first constructs hierarchical networks under each semantic to preserve the hierarchical community structure. Then, it integrates the different semantics and the node information at each level through a

CRediT authorship contribution statement

Huanjing Zhao: Conceptualization, Methodology, Writing – original draft. Pinde Rui: Conceptualization, Methodology, Software. Jie Chen: Resources. Yanping Zhang: Project administration. Yi Wang: Supervision. Shu Zhao: Writing – review & editing, Project administration. Jie Tang: Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grants #61876001, #61602003, and #61673020), the National High Technology Research and Development Program, China (Grant #2017YFB1401903), and the Provincial Natural Science Foundation of Anhui Province, China (Grant #1708085QF156). We also acknowledge the High-performance Computing Platform of Anhui University for providing computing resources.

References (38)

  • Z. Zhao et al., An incremental method to detect communities in dynamic evolving social networks, Knowl.-Based Syst. (2019).
  • Y. Sun et al., PathSim: Meta path-based top-k similarity search in heterogeneous information networks, Proc. VLDB Endow. (2011).
  • Y. Dong, Z. Hu, K. Wang, Y. Sun, J. Tang, Heterogeneous Network Representation Learning, in: 29th International Joint...
  • C. Yang et al., Heterogeneous network representation learning: Survey, benchmark, evaluation, and beyond (2020).
  • L. Wang, C. Gao, C. Huang, R. Liu, W. Ma, S. Vosoughi, Embedding heterogeneous networks into hyperbolic space without...
  • P. Wang, K. Agarwal, C. Ham, S. Choudhury, C.K. Reddy, Self-Supervised Learning of Contextual Embeddings for Link...
  • Y. Lu, C. Shi, L. Hu, Z. Liu, Relation structure-aware heterogeneous information network embedding, in: Proceedings of...
  • Y. Shi et al., AspEm: Embedding learning by aspects in heterogeneous information networks, in: Proceedings of the 2018 SIAM International Conference on Data Mining (2018).
  • Y. Shi, Q. Zhu, F. Guo, C. Zhang, J. Han, Easing embedding learning by comprehensive transcription of heterogeneous...
  • Z. Li et al., TransN: Heterogeneous network representation learning by translating node embeddings.
  • Z. Huang, Y. Zheng, R. Cheng, Y. Sun, N. Mamoulis, X. Li, Meta structure: Computing relevance in large heterogeneous...
  • H. Zhao, Q. Yao, J. Li, Y. Song, D.L. Lee, Meta-graph based recommendation fusion over heterogeneous information...
  • Y. Fang et al., Semantic proximity search on graphs with metagraph-based learning.
  • J. Zhao, X. Wang, C. Shi, Z. Liu, Y. Ye, Network Schema Preserved Heterogeneous Information Network Embedding, in: 29th...
  • A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: Analysis and an algorithm, in: Advances in Neural Information...
  • V.D. Blondel et al., Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp. (2008).
  • Y. Dong, N.V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in:...
  • X. Wang, Y. Zhang, C. Shi, Hyperbolic heterogeneous information network embedding, in: Proceedings of the AAAI...
  • C. Shi et al., Heterogeneous information network embedding for recommendation, IEEE Trans. Knowl. Data Eng. (2018).

1 These authors contributed equally to this work.
