W-MetaPath2Vec: The topic-driven meta-path-based model for large-scaled content-based heterogeneous information network representation learning

doi:10.1016/j.eswa.2019.01.015

Expert Systems with Applications

Volume 123, 1 June 2019, Pages 328-344

https://doi.org/10.1016/j.eswa.2019.01.015

Highlights

• The efficiency improvement in network representation learning.
• Combination of network structure and topic similarity evaluation.
• High performance in network random walk mechanism.
• Capabilities in large-scaled information network handling.

Abstract

Recently, heterogeneous network representation learning has attracted considerable attention because of its potential applications. This paper concentrates on how to improve the output of network representation learning by combining it with the topic similarity between nodes in a content-based heterogeneous information network (C-HIN). The main challenges come from the lack of topic similarity evaluation between text-based nodes, which limits the accuracy of similarity search as well as other network mining tasks. Moreover, the massive size of current real-world networks also raises challenges for traditional standalone heterogeneous network analysis models. Unlike previous network representation learning models such as Node2Vec or Metapath2Vec, our proposed W-MetaPath2Vec model uses a topic-driven meta-path-based random walk mechanism to generate the heterogeneous neighborhoods of nodes as learning features. These node features are then used to train the learning model, which can be applied to various heterogeneous network mining tasks such as node similarity search, clustering, classification, and link prediction. The W-MetaPath2Vec model enables simultaneous modeling of structural and topic correlations between nodes in heterogeneous networks. Moreover, the W-MetaPath2Vec model is implemented on an Apache Spark-based distributed framework, which enables it to handle large-scale networks. We compared our W-MetaPath2Vec model with previous state-of-the-art approaches on real-world datasets to demonstrate the effectiveness of the proposed model.

Introduction

Most real-world information networks are heterogeneous, with nodes and relations of different types. In recent years, heterogeneous information network (HIN) analysis and mining have been thoroughly studied and applied in multiple disciplines. Common HINs, such as the World Wide Web (WWW) and social networks (Facebook, Twitter, etc.), are naturally complex and very large (billions of nodes and links) (Sun & Han, 2012; Sun & Han, 2013; Shi et al., 2017). Similarity search is one of the most important tasks in information network mining. It helps to find the set of relevant nodes in a network. Measuring the similarity between nodes is also the basis of many other data mining tasks, such as clustering, classification, and recommendation. The meta-path is an important concept in most HIN mining techniques (Sun & Han, 2012). It is defined as a sequence of relations between node types and helps to distinguish the semantics of the different paths connecting two nodes in a network. Several meta-path-based approaches address primitive HIN mining tasks such as similarity search (PathSim (Sun et al., 2011), HeteSim (Shi et al., 2014), etc.) and ranking and clustering (RankClus (Sun et al., 2009a, Sun, Yu and Han, 2009b), NetClus (Sun, Yu & Han, 2009), etc.). These approaches have gained notable attention in HIN mining. Recently, researchers have focused intensively on representation learning for the nodes and relationships of information networks. Many algorithms have been proposed, such as Node2Vec (Grover & Leskovec, 2016) and Metapath2Vec (Dong, Chawla, & Swami, 2017). Network embedding can be applied to many HIN mining tasks, such as node similarity search (Sun et al., 2011, Zhang et al., 2015), clustering, classification (Gupta, Kumar, & Bhasker, 2017), and link prediction.
In short, network embedding techniques transform network nodes and edges into feature vectors in a low-dimensional space. From these feature vectors, similarity-related tasks can be handled easily with off-the-shelf distance measures (Euclidean distance, cosine similarity, etc.). Moreover, the network embedding approach works effectively on large-scale heterogeneous networks with millions of nodes, because the learning model needs to be constructed only once. The embedding model can also be updated through reinforcement learning when the data change later, which takes less time than re-learning the whole network.
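As a minimal sketch of how embedded nodes support similarity search, the snippet below ranks nodes by cosine similarity. It assumes the embeddings are already available as plain Python lists; the vectors and the helper names `cosine_similarity` and `top_k_similar` are illustrative and do not come from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def top_k_similar(embeddings, query, k=2):
    """Rank every node except `query` by cosine similarity to it."""
    scores = [
        (node, cosine_similarity(embeddings[query], vec))
        for node, vec in embeddings.items()
        if node != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]

# Hypothetical 3-dimensional embeddings for four author nodes.
embeddings = {
    "author_1": [0.9, 0.1, 0.0],
    "author_2": [0.8, 0.2, 0.1],
    "author_3": [0.1, 0.9, 0.2],
    "author_4": [0.0, 0.8, 0.3],
}
result = top_k_similar(embeddings, "author_1", k=2)
```

Because the ranking only needs the stored vectors, the same embeddings can be reused for clustering or classification without revisiting the network.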

However, both embedding-based and non-embedding-based network similarity measures face several challenges, such as thoroughly evaluating the topic similarity of text-based nodes like “paper” in bibliographic networks (DBLP, DBIS, etc.) or “comments” and “posts” in social networks (Facebook, Twitter, etc.). Discovering the topics of nodes in a content-based heterogeneous information network (content-based HIN) is considered an important task. Topic evaluation over network nodes is widely applied in many systems, for example to build news or friend recommendation systems based on users’ interactions on social networks. Typical works discover users’ topics of interest by analyzing their associated nodes, such as “comments” or “viewed tweets” (Michelson and Macskassy, 2010, Xu et al., 2011). Thoroughly evaluating the topics of nodes in the network can improve the accuracy of the similarity search task. For example, in the DBLP network, it is much more accurate to cluster “Jiawei Han” with other authors who work on “data mining”. Likewise, recommending possible new co-authorships for “Christopher Manning” with authors who are interested in “natural language processing” and “information retrieval” is much more meaningful than recommending authors who do not mainly focus on these two fields. As illustrated in Fig. 1-A, a common case study in DBLP is finding the top-k similar authors with the meta-path A-P-V-P-A. The meta-path A-P-V-P-A captures the relationship between two authors who usually submit their work to the same set of venues/conferences. However, a specific venue usually has multiple tracks, and each track covers a different topic or sub-topic. It is therefore unfair to give “Author 1”, who mainly works in the “data mining” field, the same similarity score with “Author 3” and “Author 4”, who mainly work in the “text processing/NLP” field.
“Author 1” should be more similar to “Author 2”, who shares the interest of “Author 1” in “data mining”, than to the other two authors.
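The limitation described above can be reproduced with a small sketch of the PathSim measure (Sun et al., 2011) on the meta-path A-P-V-P-A. The toy network, venue name, and helper names below are hypothetical; only the PathSim formula, 2 × paths(a, b) / (paths(a, a) + paths(b, b)), follows the cited definition.

```python
from collections import Counter

# Hypothetical toy bibliographic network: author -> papers, paper -> venue.
author_papers = {
    "Author 1": ["p1", "p2"],   # imagine both papers are on data mining
    "Author 2": ["p3"],         # imagine a data mining paper
    "Author 3": ["p4"],         # imagine an NLP paper in another track
}
paper_venue = {"p1": "KDD", "p2": "KDD", "p3": "KDD", "p4": "KDD"}

def venue_counts(author):
    """Number of papers the author has at each venue (the A-P-V half path)."""
    return Counter(paper_venue[p] for p in author_papers[author])

def path_count(a, b):
    """Number of A-P-V-P-A path instances between authors a and b."""
    ca, cb = venue_counts(a), venue_counts(b)
    return sum(ca[v] * cb[v] for v in ca)

def pathsim(a, b):
    """PathSim score of two authors under the meta-path A-P-V-P-A."""
    denom = path_count(a, a) + path_count(b, b)
    return 2 * path_count(a, b) / denom if denom else 0.0

score_12 = pathsim("Author 1", "Author 2")
score_13 = pathsim("Author 1", "Author 3")
```

Both scores come out identical because the measure only counts links through the shared venue; nothing in it reflects what the papers are about, which is the gap a topic-driven weight is meant to close.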

Combining topic attributes with the relationships between nodes in the similarity measure can help to improve the quality of the output. Additionally, combining topic similarity with networked data mining also addresses the problem of less-linked nodes. Most homogeneous and heterogeneous similarity measure models are link-based approaches. These approaches rely mostly on the links between nodes to analyze similarity. In other words, the more two nodes are connected, the more relevant they are to each other. Depending mostly on the relationships between nodes has a drawback: it fails on less-linked nodes that are in fact very similar in other aspects which are not clearly present as network relationships. For example, two scientists may both be interested in “database/data mining” yet rarely submit their work to the same venues/journals, as illustrated in Fig. 1-B.

Most previous network representation learning models, such as Node2Vec or Metapath2Vec, work on unweighted networks, which means that all existing paths between two nodes are binary relations (1 for an existing relation and 0 otherwise). The walker needs to travel all the paths between two nodes in order to calculate the transitional probability (π). These computed transitional probabilities are used to rank the similarity of destination nodes to the given source node. In some cases, traveling through all existing paths between two nodes is not efficient, especially in very large networks (Vahedian, Burke, & Mobasher, 2017). Between two nodes, some relations are important whereas others are not. For example, in DBLP (as illustrated in Fig. 2), “Author 1” is known as a very active researcher who mostly contributes to the “data mining” field (3 papers), but sometimes also works in the “big data” field (1 paper). It is obvious that the paths which connect “Author 1” to “Author 2” and “Author 3” are more important than the paths which connect to “Author 4” and “Author 5”. Even though these paths are all weighted as 1 (binary relations), in the semantic aspect the relationships between “Author 1”, “Author 2” and “Author 3” are stronger than the others.

Identifying which paths between two nodes are more important than the others is critical for reducing the effort of calculating transitional probabilities. To evaluate the importance of paths, we need a mechanism that assigns a weight to each path. Then, only paths that satisfy the weight threshold (σ) are selected for analysis. In network representation learning with a fixed walk length (l), for each node we only need to examine around |l| of the most important paths to generate its set of neighborhoods. Examining all possible paths connecting two nodes is not really necessary in this case. Only important paths should be taken into consideration. By limiting the number of paths that need to be evaluated, we can improve the performance of the overall random walk process.
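A minimal sketch of the threshold idea follows. It assumes each candidate path already carries a topic-similarity weight in [0, 1] and that the surviving weights are simply normalised into probabilities; that normalisation, the weight values, and the function name are assumptions for illustration, not the paper's stated formula for π.

```python
def transition_probabilities(path_weights, sigma):
    """Keep paths whose weight is >= sigma and normalise the survivors.

    path_weights: dict mapping a destination node to the (hypothetical)
    topic-similarity weight of the path leading to it.
    Returns a dict of transition probabilities over the kept destinations,
    or an empty dict when no path passes the threshold.
    """
    kept = {node: w for node, w in path_weights.items() if w >= sigma}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {node: w / total for node, w in kept.items()}

# Hypothetical weights of the paths leaving "Author 1".
weights = {"Author 2": 0.9, "Author 3": 0.6, "Author 4": 0.2, "Author 5": 0.1}
probs = transition_probabilities(weights, sigma=0.5)
```

With σ = 0.5 only two of the four destinations remain, so the walker evaluates half as many paths; raising σ prunes more aggressively at the risk of leaving a node with no admissible move.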

Last but not least, most existing real-world information networks, such as Facebook and the WWW, are very large, with up to billions of nodes. Most traditional network analysis approaches are designed to work in a standalone environment. It is hard or even impossible to handle big networked data sources such as Facebook or Twitter with a single computer. The massive size of these networks is beyond the capabilities of current heterogeneous network mining approaches. Therefore, we need a new solution for dealing with the challenge of large-scale networks. One of the most common approaches to big networked data processing is a distributed computing framework such as Apache Hadoop or Apache Spark. Apache Spark is considered a good choice for massive data handling because of its graph-parallel processing libraries, such as GraphX and GraphFrames. The GraphFrames framework effectively supports common graph analysis tasks such as path finding and node traversal (BFS, DFS) on large-scale networks.

Our work in this paper focuses on heterogeneous network representation learning problems and introduces the novel W-MetaPath2Vec model. W-MetaPath2Vec is a topic-driven model which aims to capture distinctive features of nodes in a heterogeneous network following predefined meta-path(s). The topic similarities are obtained by evaluating the text-based nodes which are associated with the investigated nodes along the defined meta-paths. Examples are the “paper” nodes between “author” nodes with the meta-paths A-P-A (author-paper-author) and A-P-V-P-A (author-paper-venue-paper-author), or the “comment” nodes between “user” nodes with the meta-path U-C-P-C-U (user-comment-post-comment-user). This topic-driven meta-path similarity measure was introduced in our previous work as the W-PathSim model (Pham et al., 2018). Fig. 3 shows the relationship between our previous study (the W-PathSim model) and the currently proposed model. Through experiments on real-world DBLP bibliographic networks, we showed that W-PathSim outperforms the traditional PathSim model. The W-PathSim model improves meta-path-based similarity measurement by combining it with the topic similarity between nodes in content-based HINs. Building on these results, in this paper we introduce the W-MetaPath2Vec model, which extends our previous work to topic-driven heterogeneous network representation learning.

This extended topic-driven skip-gram model guides the process of extracting node features. These extracted features are then used to train the learning model. Next, W-MetaPath2Vec is implemented in the Apache Spark-based GraphFrames distributed environment, which enables it to handle large-scale heterogeneous networks. The ultimate goal of W-MetaPath2Vec is to improve node embedding through both link-based and topic-based evaluation in content-based heterogeneous networks. Our contributions in this paper are fivefold:

• First, we apply the LDA topic model to discover the topic distributions of content-based nodes in a content-based network. These topic distributions are then used to evaluate the topic similarity between nodes following the defined meta-paths.
• Second, we propose a topic-driven meta-path-based random walk mechanism which is used to generate the neighborhoods of a specific node. These neighborhood nodes are used to train the network learning model. In our proposed random walk mechanism, the walker's move to a neighbor node is restricted not only by the defined meta-path(s) but also by the level of topic similarity of the associated nodes. Evaluating topic similarity while conducting the walk distinguishes W-MetaPath2Vec from previous approaches.
• Third, in our proposed W-MetaPath2Vec model the walker is guided to travel along the most important paths only. These paths are selected based on their topic similarity weights. Only paths whose weight scores satisfy the threshold (σ) are chosen for calculating the transitional probability (π). By limiting the number of paths that need to be examined with the defined threshold (σ), the W-MetaPath2Vec model reduces the running time of the node embedding process without affecting the output accuracy.
• Next, we implement W-MetaPath2Vec on the Apache Spark-based GraphFrames distributed graph computing framework in order to improve the performance of the proposed model on large-scale networks.
• Finally, we present experimental studies comparing our proposed W-MetaPath2Vec model with other state-of-the-art algorithms, including DeepWalk, Node2Vec, LINE and MetaPath2Vec, on the real-world DBLP/DBIS datasets. The experimental results show that the W-MetaPath2Vec model improves the quality of heterogeneous network representation learning and scales to large networks with millions of nodes.
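The random walk described in the second and third contributions can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: it assumes every node already has a topic distribution (for example from LDA, with author vectors aggregated from their papers), uses cosine similarity as the topic weight, and samples the next node in proportion to that weight after applying the threshold σ. The toy network, topic vectors, and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy network with typed nodes (A = author, P = paper).
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P", "p3": "P"}
neighbors = {
    "a1": ["p1", "p2"], "a2": ["p1", "p3"], "a3": ["p2"],
    "p1": ["a1", "a2"], "p2": ["a1", "a3"], "p3": ["a2"],
}
# Hypothetical 2-topic distributions, assumed to be precomputed.
topics = {
    "p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.8, 0.2],
    "a1": [0.7, 0.3], "a2": [0.85, 0.15], "a3": [0.2, 0.8],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_driven_walk(start, meta_path, length, sigma, rng):
    """Walk of at most `length` nodes that cycles through `meta_path` types.

    meta_path lists the repeating node types, e.g. ["A", "P"] for A-P-A.
    Candidates of the required type whose topic similarity to the current
    node is below sigma are skipped; the walk stops early if none remain.
    """
    walk, current = [start], start
    while len(walk) < length:
        wanted = meta_path[len(walk) % len(meta_path)]
        kept = []
        for cand in neighbors[current]:
            if node_type[cand] != wanted:
                continue
            weight = cosine(topics[current], topics[cand])
            if weight >= sigma and weight > 0:
                kept.append((cand, weight))
        if not kept:
            break
        nodes, weights = zip(*kept)
        current = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(current)
    return walk

walk = topic_driven_walk("a1", ["A", "P"], length=7, sigma=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the walk reproducible; for a longer meta-path such as A-P-V-P-A the repeating type list would be `["A", "P", "V", "P"]`.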

The overall process of our proposed W-MetaPath2Vec model is illustrated in Fig. 4. Given a content-based HIN, the LDA topic model is applied to extract topic distributions from the text-based nodes, such as papers in the DBLP network. The topic distributions of the text-based nodes are then used to calculate the transitional probability (π) between nodes following the defined meta-path. This is called the topic-driven meta-path random walk mechanism. Finally, we apply the heterogeneous skip-gram architecture of Metapath2Vec to train the model. The network nodes, embedded as n-dimensional vectors, can be used to solve multiple network analysis tasks, such as node similarity search, clustering, classification, and link prediction. The rest of the paper is organized in four main sections. In the second section, we discuss previous works and preliminaries. In the third section, we formally describe the background concepts, methodology and implementation of our proposed W-MetaPath2Vec model. In the fourth section, we present the experimental studies on the W-MetaPath2Vec model, including detailed information about the datasets, testing scenarios, methods and evaluation metrics, and we discuss the results. The final section contains our conclusions about the W-MetaPath2Vec approach and our future improvements. (Fig. 5, Fig. 6, Fig. 7)
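To illustrate the last step of this pipeline, the sketch below turns a node sequence produced by a random walk into the (center, context) pairs that a skip-gram model is trained on. The window size, the example walk, and the function name are assumptions; the heterogeneous skip-gram objective and negative sampling of Metapath2Vec are not reproduced here.

```python
def skipgram_pairs(walk, window):
    """Return (center, context) pairs for every node in a walk.

    Each node is paired with the nodes at most `window` positions
    before or after it in the sequence.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Hypothetical walk following the meta-path A-P-A.
walk = ["a1", "p1", "a2", "p3"]
pairs = skipgram_pairs(walk, window=1)
```

A larger window would let author nodes appear in each other's context directly (for instance `a1` and `a2` with `window=2`), which is how co-occurrence along the meta-path ends up shaping the learned vectors.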

Section snippets

Heterogeneous information network analysis

Data are naturally interconnected, forming what are called information networks. Interactions between data nodes are a critical paradigm of modern information infrastructure and mining (Sun & Han, 2012). Heterogeneous information networks are becoming prevalent and are widely applied in real-world applications.

In the past, most information network mining techniques took a homogeneous approach. In a homogeneous network, all nodes and links are considered to be of the same type.

Methodology and implementation

In this section, we introduce the three main components of our W-MetaPath2Vec model:

• The application of the LDA model to discover topic distributions from the given content-based heterogeneous information networks.
• Next, we present the topic-driven meta-path-based random walk mechanism, which is used to extract neighborhood nodes for a given source node. These extracted neighborhoods serve as the learning features that are fed to the network learning model.
• Finally, we

Experiment and discussions

In this section, we conduct thorough empirical studies in order to demonstrate the effectiveness of the W-MetaPath2Vec model. The section is divided into two main parts:

• In the first part, we compare the accuracy of the W-MetaPath2Vec model with previous network embedding models on network analysis tasks including node similarity search, clustering and classification.
• In the second part, we perform an experiment comparing the scalability of W-MetaPath2Vec with the Metapath2Vec model in the

Conclusion and future works

In this paper, we formally present studies related to the problems of heterogeneous information network representation learning. Challenges remain in thoroughly evaluating the topics of text-based nodes in content-based HINs. Moreover, in the era of big data, it is necessary to develop network analysis models that are capable of handling large-scale networks. To address these challenges, our work in this paper focuses on developing the W-MetaPath2Vec model.

Acknowledgement

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under the grant number B2017-26-02.

References (35)

M. Gupta et al.
HeteClass: A Meta-path based framework for transductive classification of objects in heterogeneous information networks
Expert Systems with Applications
(2017)
Y. Sun et al.
Mining heterogeneous information networks: A structural analysis approach
M. Zhang et al.
Top-k similarity search in heterogeneous information networks with x-star network schema
Expert Systems with Applications
(2015)
Y. Bengio et al.
Representation learning: A review and new perspectives
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2013)
D.M. Blei
Probabilistic topic models
Communications of the ACM
(2012)
D.M. Blei et al.
Latent Dirichlet allocation
Journal of Machine Learning Research
(2003)
H. Chen et al.
PME: Projected metric embedding on heterogeneous networks for link prediction
X. Chen et al.
InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets
Advances in Neural Information Processing Systems
(2016)
S. Deerwester et al.
Indexing by latent semantic analysis
Journal of the American Society for Information Science
(1990)
Y. Dong et al.
metapath2vec: Scalable representation learning for heterogeneous networks
A. Grover et al.
node2vec: Scalable feature learning for networks
T. Hofmann
Probabilistic latent semantic analysis
K. Järvelin et al.
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
(2002)
G. Jeh et al.
SimRank: A measure of structural-context similarity
