HINChip: Heterogeneous Information Network Representation with Community Hierarchy Preserving
Introduction
A heterogeneous information network (HIN) [1] is a network that contains multiple types of nodes and edges. Because of this heterogeneity, HINs are harder to analyze than homogeneous networks. HIN representation [2], [3], an essential cornerstone of heterogeneous network mining, has attracted increasing attention.
To embed HIN into a low-dimensional latent space, several methods have been proposed in recent research [4], [5], [6], [7], [8], [9]. These works mainly deal with the heterogeneity of HIN through three structures: meta-path [1], meta-graph (meta-structure) [10], [11], [12], and network schema [13]. A meta-path is a composite relationship between two given nodes, represented as a sequence of relations. A meta-graph can express more complex semantics than a meta-path. Unlike meta-paths and meta-graphs, which preserve only the semantic relationship between two nodes, network schema focuses on the local structure among different node types. Sampling methods based on meta-paths, meta-graphs, and network schema adhere to a fixed combination of node types. This handles heterogeneity, but it preserves only the uni-level structural information of nodes.
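As an illustration of how a meta-path constrains sampling, the following minimal sketch performs a meta-path-guided random walk on a toy academic network. The node names, the edge list, and the helper functions `build_adjacency` and `metapath_walk` are hypothetical and are not taken from any of the cited methods.

```python
import random

# Toy academic HIN (hypothetical data): authors (A), papers (P), venues (V).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "p3": "P", "v1": "V"}
EDGES = [("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("a2", "p3"),
         ("p1", "v1"), ("p2", "v1"), ("p3", "v1")]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk whose node types cycle through a symmetric meta-path.

    `metapath` is a type sequence such as "PAP"; its last type equals its
    first, so the type pattern repeats as P-A-P-A-...
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the meta-path head")
    pattern = metapath[:-1]  # "PAP" -> "PA", the repeating unit
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
walk = metapath_walk(adj, NODE_TYPE, "p1", "PAP", 5, random.Random(0))
print(walk, [NODE_TYPE[n] for n in walk])
```

Every step is restricted to neighbours of the type that the meta-path prescribes, which is why such walks see only one fixed combination of node types at a time.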
The multi-level structure of hierarchical communities portrays the relationships between nodes more comprehensively, yet current approaches ignore it. Moreover, hierarchical communities are prevalent in HIN. Fig. 1(a) shows a toy example of an academic HIN with three types of nodes, Author (A), Paper (P), and Venue (V), and two types of relations, “write/written” and “publish/published”. Under the semantics “PVP” and “PAP”, there exist homogeneous sub-networks and corresponding hierarchical communities. Specifically, papers tend to form different communities due to heterogeneity (Fig. 1(b1), Fig. 1(b2)). In addition to intra-community structures, inter-community interactions also hold vital information. After the communities are coarsened into super-nodes, the inter-community interactions are reflected as edges between super-nodes. At the next level, super-nodes likewise tend to gather into new communities. This level-by-level aggregation constitutes the hierarchical community structure among nodes, shown in Fig. 1(c1) and Fig. 1(c2). Therefore, the problem we address is how to preserve hierarchical communities in the process of HIN representation.
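The coarsening step described above can be sketched as follows, assuming that a community assignment is already available from some community detection method. The function `coarsen` and the toy weights are hypothetical. For simplicity this sketch drops intra-community edges, whereas some methods keep them as self-loops on the super-node.

```python
from collections import defaultdict


def coarsen(weighted_edges, community_of):
    """Collapse each community into a single super-node.

    weighted_edges: {(u, v): weight} for an undirected graph.
    community_of:   {node: community id} from any detection method.
    Returns {(cu, cv): weight} with cu < cv, where the weights of all
    edges between two communities are summed.
    """
    super_edges = defaultdict(float)
    for (u, v), w in weighted_edges.items():
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            continue  # intra-community edge: absorbed into the super-node
        super_edges[(min(cu, cv), max(cu, cv))] += w
    return dict(super_edges)


# Four papers in two communities (hypothetical weights).
edges = {("p1", "p2"): 1.0, ("p3", "p4"): 1.0,
         ("p2", "p3"): 0.5, ("p1", "p3"): 0.25}
communities = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
print(coarsen(edges, communities))  # {(0, 1): 0.75}
```

Applying a detection method to the coarsened graph and coarsening again would yield the next level of the hierarchy.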
Solving this problem poses two challenges. (1) How to capture hierarchical communities? HIN contains more complex interactions between different types of nodes, so it is essential to consider the influence of varying semantics when mining the potential communities of nodes. (2) How to integrate multi-level node information under various semantics? The different semantics and hierarchical communities preserve information about the HIN from multiple aspects and levels, and integrating such information effectively is a difficult task.
In this paper, we propose a heterogeneous information network representation model with community hierarchy preservation, named HINChip, which makes the first attempt to preserve hierarchical communities under multiple semantics. First, meta-path-based similarities are used to construct multiple homogeneous sub-networks whose edge weights are the computed similarities. Second, on each homogeneous sub-network, we recursively apply a community detection method [14], [15] to construct a hierarchical network, which addresses the first challenge. The nodes at each level then learn a set of embeddings through an existing unsupervised graph representation method. Third, to address the second challenge, we propose a multi-level-integrated semantic attention mechanism that integrates node information from various semantics and multiple levels.
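The first step can be illustrated with a minimal sketch that builds a weighted paper-paper sub-network under the meta-path “PAP”. This excerpt does not state which similarity measure HINChip uses. The sketch assumes PathSim, which appears in the reference list, and the function `pathsim_subnetwork` and the toy data are hypothetical.

```python
from itertools import combinations


def pathsim_subnetwork(paper_authors):
    """Weighted paper-paper edges under the meta-path PAP.

    paper_authors: {paper: set of its authors}.
    The number of PAP path instances between papers p and q is the number
    of authors they share, so PathSim is
        s(p, q) = 2 * |shared(p, q)| / (|authors(p)| + |authors(q)|).
    Pairs with no shared author get no edge.
    """
    edges = {}
    for p, q in combinations(sorted(paper_authors), 2):
        shared = len(paper_authors[p] & paper_authors[q])
        if shared == 0:
            continue
        total = len(paper_authors[p]) + len(paper_authors[q])
        edges[(p, q)] = 2 * shared / total
    return edges


# Hypothetical toy data: p2 shares one author with each of p1 and p3.
paper_authors = {"p1": {"a1"}, "p2": {"a1", "a2"}, "p3": {"a2"}}
for pair, weight in pathsim_subnetwork(paper_authors).items():
    print(pair, round(weight, 3))
```

Repeating the construction with a different meta-path, such as “PVP”, would give a second homogeneous sub-network over the same papers with different edge weights.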
The contributions of our work are as follows:
- To the best of our knowledge, we are the first to study hierarchical communities when embedding HIN. We construct hierarchical networks to capture intra-community information and inter-community relationships in HIN.
- To deal with heterogeneity while preserving hierarchical communities, we propose a multi-level-integrated semantic attention mechanism that incorporates the representations of nodes at multiple levels under different semantics.
- Our method uses only topology for unsupervised training, yet its experimental results on four datasets are better even than those of methods that consider attributes and of supervised methods trained with labels.
Section snippets
Related work
HIN representation based on meta-path. These approaches capture the structural information of HIN primarily through sampling based on meta-paths. Metapath2vec [16] utilizes meta-path guided random walks and heterogeneous skip-gram model to learn node embeddings. HHNE [17] introduces hyperbolic space to learn the similarity between nodes in random walk sequences based on meta-path. HERec [18] extracts nodes of the same type in meta-path guided random walk sequence to learn the heterogeneity in
Problem statement
This section introduces some definitions used in this paper and presents the task to be addressed.
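Definition 1 (Heterogeneous Information Network). A heterogeneous information network (HIN) is defined as a network $G = (V, E)$, in which $V$ and $E$ are the sets of nodes and edges, respectively. It is also associated with a node type mapping function $\phi: V \rightarrow \mathcal{A}$ and an edge type mapping function $\psi: E \rightarrow \mathcal{R}$, where $\mathcal{A}$ and $\mathcal{R}$ denote the sets of node types and edge types, and the network contains multiple types of nodes and edges.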
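As a rough illustration of Definition 1, a HIN can be stored as node and edge sets together with the two type mapping functions. The class `HIN` and the toy data below are hypothetical and serve only to make the definition concrete.

```python
from dataclasses import dataclass


@dataclass
class HIN:
    """Nodes and edges plus the two type mappings of Definition 1."""
    nodes: set
    edges: set          # undirected edges stored as (u, v) pairs
    node_type: dict     # node type mapping: node -> node type
    edge_type: dict     # edge type mapping: edge -> edge type

    def node_types(self):
        """Set of node types that actually occur in the network."""
        return {self.node_type[n] for n in self.nodes}

    def edge_types(self):
        """Set of edge types that actually occur in the network."""
        return {self.edge_type[e] for e in self.edges}


# Toy academic network: Author (A), Paper (P), Venue (V).
hin = HIN(
    nodes={"a1", "p1", "v1"},
    edges={("a1", "p1"), ("p1", "v1")},
    node_type={"a1": "A", "p1": "P", "v1": "V"},
    edge_type={("a1", "p1"): "write", ("p1", "v1"): "publish"},
)
print(sorted(hin.node_types()), sorted(hin.edge_types()))
# ['A', 'P', 'V'] ['publish', 'write']
```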
After that, different homogeneous sub-networks are constructed according
Hierarchical community preserving based HIN representation
We present our hierarchical community preserving based HIN representation method in this section. Fig. 2 shows the framework of the proposed HINChip. HINChip preserves both uni-level information between directly linked nodes and multi-level information such as hierarchical communities. It consists of three main components: (1) constructing homogeneous sub-networks, (2) preserving hierarchical communities, and (3) multi-level-integrated semantic attention.
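The details of the attention mechanism are not included in this excerpt. As a generic illustration of how embeddings from several semantics and levels might be fused, the sketch below scores each view, normalises the scores with a softmax, and takes the weighted sum. The function `semantic_attention`, the view names, and the scoring rule (an attention vector applied to `tanh` of the embedding, with no learned projection) are simplifying assumptions and should not be read as HINChip's actual formulation.

```python
import math


def semantic_attention(views, query):
    """Fuse several embedding views of the same nodes.

    views: {view name: list of node embeddings}; all views share one shape.
    query: attention vector (a learned parameter in practice; fixed here).
    Returns (weights, fused): one softmax weight per view, and the
    weighted sum of the views for every node.
    """
    names = list(views)
    scores = []
    for name in names:
        # Importance of a view: mean over nodes of query . tanh(embedding).
        per_node = [sum(q * math.tanh(x) for q, x in zip(query, z))
                    for z in views[name]]
        scores.append(sum(per_node) / len(per_node))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}
    num_nodes, dim = len(views[names[0]]), len(views[names[0]][0])
    fused = [[sum(weights[n] * views[n][i][d] for n in names)
              for d in range(dim)]
             for i in range(num_nodes)]
    return weights, fused


# Two hypothetical views of two nodes with 2-dimensional embeddings.
views = {"PAP/level-0": [[0.2, 0.1], [0.4, 0.3]],
         "PVP/level-1": [[0.9, 0.8], [0.7, 0.6]]}
weights, fused = semantic_attention(views, query=[1.0, 1.0])
print(weights)
```

With this scoring rule, the view whose embeddings align more strongly with the attention vector receives the larger weight.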
Experiments
This section covers the datasets, baselines, node classification, node clustering, visualization, and ablation analysis of HINChip.
Conclusion
In this paper, we make the first attempt to study hierarchical communities in the process of HIN representation. HINChip captures both rich semantic information and high-order structural information, covering intra-community structure as well as inter-community relationships. HINChip first constructs hierarchical networks under each semantics to preserve the hierarchical community structure. Then, integrating the different semantics and node information at each level through a
CRediT authorship contribution statement
Huanjing Zhao: Conceptualization, Methodology, Writing – original draft. Pinde Rui: Conceptualization, Methodology, Software. Jie Chen: Resources. Yanping Zhang: Project administration. Yi Wang: Supervision. Shu Zhao: Writing – review & editing, Project administration. Jie Tang: Writing – review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grants #61876001, #61602003, and #61673020), the National High Technology Research and Development Program, China (Grant #2017YFB1401903), and the Provincial Natural Science Foundation of Anhui Province, China (Grant #1708085QF156). We also acknowledge the High-performance Computing Platform of Anhui University for providing computing resources.
References (38)
- et al. An incremental method to detect communities in dynamic evolving social networks. Knowl.-Based Syst. (2019)
- et al. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. (2011)
- Y. Dong, Z. Hu, K. Wang, Y. Sun, J. Tang, Heterogeneous Network Representation Learning, in: 29th International Joint...
- et al. Heterogeneous network representation learning: Survey, benchmark, evaluation, and beyond. (2020)
- L. Wang, C. Gao, C. Huang, R. Liu, W. Ma, S. Vosoughi, Embedding heterogeneous networks into hyperbolic space without...
- P. Wang, K. Agarwal, C. Ham, S. Choudhury, C.K. Reddy, Self-Supervised Learning of Contextual Embeddings for Link...
- Y. Lu, C. Shi, L. Hu, Z. Liu, Relation structure-aware heterogeneous information network embedding, in: Proceedings of...
- et al. Aspem: Embedding learning by aspects in heterogeneous information networks. Proceedings of the 2018 SIAM International Conference on Data Mining (2018)
- Y. Shi, Q. Zhu, F. Guo, C. Zhang, J. Han, Easing embedding learning by comprehensive transcription of heterogeneous...
- et al. TransN: Heterogeneous network representation learning by translating node embeddings
- Semantic proximity search on graphs with metagraph-based learning
- Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp.
- Heterogeneous information network embedding for recommendation. IEEE Trans. Knowl. Data Eng.
1 These authors contributed equally to this work.