
Information Sciences

Volume 543, 8 January 2021, Pages 382-397

DeepEmLAN: Deep embedding learning for attributed networks

https://doi.org/10.1016/j.ins.2020.07.001

Abstract

Network embedding aims to learn low-dimensional representations for the components of a network while maximally preserving its structure and inherent properties. Its effectiveness has been demonstrated in various real-world applications. However, most existing studies on attributed networks cannot flexibly exploit both multi-typed attributes and the semantic relationships among them. To address this problem, we propose a deep-model-based embedding learning method for attributed networks, named DeepEmLAN. It smoothly projects different types of attributed information into the same semantic space through a deep attention model while maintaining the topological structure. Furthermore, we design a heuristic combining strategy to generate the final embeddings, which places nodes that share more neighbors or similar text or label attributes closer in the representation space. To demonstrate the potential of the proposed DeepEmLAN, we evaluate its performance on the challenging tasks of node classification and network reconstruction. Experimental results on several real datasets show that DeepEmLAN significantly outperforms competitive state-of-the-art methods.

Introduction

Network embedding, also known as network representation learning (NRL), aims to represent the components of a network (such as nodes, edges, and subgraphs) with low-dimensional vectors in which the topology and properties of the network are maximally preserved. Since low-dimensional vectors can be easily processed by various machine learning methods, network embedding has become a very active research field and attracted tremendous attention, with recent successes in a wide range of applications [36]. DeepWalk [21], LINE [25], and node2vec [7] are commonly considered powerful methods and have been applied in various settings. NetMF [23] is a recent study that proves theoretically that the existing models with negative sampling can be unified as the factorization of a closed-form matrix. However, directly constructing and factorizing such a dense matrix is prohibitively expensive in both time and space, making it unscalable for large networks. To address this problem, NetSMF [22] was proposed to efficiently sparsify the dense matrix, enabling significant improvements in embedding learning. Nevertheless, all the above methods are designed to handle homogeneous networks with single-typed nodes and edges.
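To make the random-walk family above (DeepWalk, node2vec) concrete, the sketch below generates truncated random walks over a toy adjacency list; the resulting walk corpus is what skip-gram then trains on to produce node vectors. This is a minimal illustration, not code from the paper: the graph, walk counts, and function names are invented for the example.

```python
import random

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Generate truncated random walks (DeepWalk-style) starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy undirected graph as an adjacency list (invented for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
# Each walk is a node sequence; (center, context) pairs taken from a sliding
# window over these walks feed skip-gram with negative sampling.
```

Node2vec differs only in how the next neighbor is sampled (biased by return and in-out parameters), while NetMF shows that the implicit objective of this pipeline is equivalent to factorizing a closed-form matrix.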

In fact, real-world networks are much more complicated, containing not only multi-typed nodes and edges but also a rich set of attributes. Depending on the network topology and attribute properties, we categorize homogeneous networks into two types, simple networks and attributed networks, as shown in Table 1. This categorization is largely consistent with the recent work [5], which is designed for attributed heterogeneous networks. The difference is that we consider both the multi-typed attributes and the semantic relationships hidden in those attributes, and design a deep embedding learning model for attributed homogeneous networks. It is worth mentioning that the attributed network is one of the most important network types [16].

As can be seen from Table 1, we focus on embedding learning for attributed networks. The attributes in different networks may refer to different content. For example, in social networks the attributes may be users' opinions or comments, while in citation networks they may be the topics or keywords of papers. Additionally, labels (such as groups or community categories) are discrete and can intrinsically be considered an important type of attribute. Taking this information into account often improves performance in many complex network analysis tasks, such as identifying highly influential users [24], community detection [11], [35], and link prediction [31], [6]. Due to its importance and challenges, there have been tremendous attempts in the literature to investigate embedding learning for attributed networks. TADW [33] incorporates the text features of nodes into the NRL process through matrix factorization, but it can only handle text attributes. AANE [9] models and incorporates the attribute proximities of nodes into the NRL process in a distributed way; however, it is trained in an unsupervised mode and cannot utilize the discriminative labels in the network. LANE [10] smoothly incorporates label information into attributed network embedding while preserving the correlations between diverse details, but it suffers from low learning efficiency due to massive matrix operations. Indeed, the above algorithms are essentially linear models, which are insufficient to capture the nonlinear relations in complex networks. Thus, deep learning has been introduced into some recent embedding algorithms to preserve the complicated relationships in attributed networks. SEANO [15] is an inductive deep learning framework that learns robust representations jointly preserving topology, attribute, and label information. CANE [27] uses an attention mechanism to learn more discriminative representations for the nodes in an attributed network.

To the best of our knowledge, no prior work explores both multi-typed attributes and their semantic relationships in an efficient way. In this paper, we develop a unified deep model to capture both the rich attributed information and the topological structure. The training process of DeepEmLAN consists of three parts, which preserve topology, attribute, and label information, respectively; the three components are closely connected and interact with each other. The topological information is captured by modeling the first-order and second-order proximities between nodes; the text attributes are processed with an attention mechanism to capture the different roles a node plays when interacting with different neighbors; and the labels of the nodes are predicted from the mutual attention vectors obtained in the second part through multi-level nonlinear mappings. Besides, the parameters are adjusted with respect to the different elements, which makes the embeddings more adaptive for subsequent machine learning tasks. Finally, we present a heuristic method that combines the temporary vectors obtained during training into the final representation vector for each node. To evaluate the proposed model, extensive experiments are conducted on the tasks of multi-label classification and network reconstruction. The experimental results indicate that DeepEmLAN achieves significantly better performance than state-of-the-art embedding methods. The main contributions of this work are summarized as follows.
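The first of the three components, first-order proximity over observed edges, can be sketched with a LINE-style logistic loss: embeddings of connected nodes should have a large dot product. This is a simplified stand-in, not the paper's exact objective; the embeddings and edge list below are random toy data.

```python
import numpy as np

def first_order_loss(emb, edges):
    """Mean negative log-sigmoid of embedding dot products over observed
    edges: the loss is low when connected nodes have similar embeddings
    (LINE-style first-order proximity)."""
    total = 0.0
    for u, v in edges:
        score = emb[u] @ emb[v]
        total += np.log1p(np.exp(-score))  # equals -log(sigmoid(score))
    return total / len(edges)

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(4, 8))  # 4 nodes, 8-dim embeddings
loss = first_order_loss(emb, [(0, 1), (1, 2), (2, 3)])
```

Minimizing this loss (typically with negative sampling over non-edges, omitted here) pulls neighboring nodes together; second-order proximity applies the same idea to the neighborhoods of nodes rather than the nodes themselves.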

  • We propose a unified deep model to learn the representation vector for each node of the network by considering multi-typed attributes and the semantic relationships simultaneously. The model can preserve and balance the mutual influences resulting from different types of information.

  • We present a heuristic combining method to generate the final representation for each node. It places nodes that share more neighbors or have similar text or label attributes closer in the representation space.

  • We extensively evaluate the proposed DeepEmLAN on the tasks of multi-label classification and network reconstruction with several real-world datasets. The experimental results indicate that DeepEmLAN significantly outperforms the competitive baselines.
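The preview does not spell out the combining heuristic itself. Purely as an illustration of one plausible shape such a step could take, the sketch below scales the per-component vectors (topology, text, label), concatenates them, and L2-normalizes the result; the function name, weights, and inputs are hypothetical, not the paper's method.

```python
import numpy as np

def combine_views(z_topo, z_text, z_label, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combining step: scale each component embedding,
    concatenate, and L2-normalize so no single view dominates
    distances in the final representation space."""
    parts = [w * z for w, z in zip(weights, (z_topo, z_text, z_label))]
    z = np.concatenate(parts)
    return z / np.linalg.norm(z)

rng = np.random.default_rng(1)
z = combine_views(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4))
```

Normalizing the concatenation keeps cosine and Euclidean comparisons between final vectors well behaved regardless of how the three component scales differ.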

The remainder of this paper is organized as follows. Section 2 briefly reviews the related work. The problem to be solved is formulated in Section 3. In Section 4, we present the unified deep model to capture both topological and rich attributed information; the algorithm and its complexity analysis are also given in this section. Section 5 demonstrates the effectiveness of DeepEmLAN through experimental results and analysis. Finally, Section 6 concludes this study and discusses our future work.


Related work

Network representation learning (NRL), or network embedding, has received tremendous attention recently due to its great significance. Numerous NRL algorithms have been proposed to learn efficient representations for the components of networks. A typical NRL method learns representations by preserving the topological similarities between nodes, including the first-order [1], second-order [25], and higher-order [7], [20], [21]

Problem formulation

In this section, we first define the notations used in this paper, as shown in Table 2. We then give some definitions and formulate the problem to be solved.

Definition 1 Homogeneous Network

A homogeneous network [2] is a network with only one type of nodes and one type of edges. It is commonly denoted as a graph G=(V,E), where V is the set of nodes and E is the set of edges between them.

Definition 2 Attributed Network

An attributed network [9] is defined as a graph G=(V,E,A), in which each node v ∈ V is associated with one or several types of attributes,
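Definition 2 can be mirrored by a minimal container like the sketch below; the class and field names are our own illustration, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AttributedNetwork:
    """G = (V, E, A): nodes, edges, and per-node typed attributes."""
    nodes: set
    edges: set                                   # pairs (u, v)
    attrs: dict = field(default_factory=dict)    # node -> {attr_type: value}

# A node may carry several attribute types at once, e.g. text and a label.
G = AttributedNetwork(
    nodes={0, 1, 2},
    edges={(0, 1), (1, 2)},
    attrs={0: {"text": "graph embedding", "label": "ML"}},
)
```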

The framework

In this paper, we aim to design a novel unified embedding model for attributed networks that preserves and balances these attributes efficiently. As pointed out in [3], deep learning is beneficial for modeling complicated nonlinear relations, and it has been successfully applied in various applications and fields, including network embedding. Inspired by this idea, we propose a deep learning based model to capture the nonlinear relationships between nodes' attributes. Besides, nodes often show

Experiments

To evaluate the performance of the proposed DeepEmLAN, we carry out experiments on the tasks of semi-supervised multi-label classification and network reconstruction with three real-world datasets. The experimental results fully demonstrate the effectiveness and efficiency of DeepEmLAN in balancing the interaction of various information, and show that it improves the discriminativeness of the representations. To further prove the distinctive ability of the representations learned by DeepEmLAN, we also

Conclusion and future work

In this paper, we propose an embedding learning method for attributed networks. It smoothly projects different types of attributes and the topological structure into the same semantic space through a deep attention model, while preserving this information maximally. Furthermore, we design a heuristic method to generate the final representations, which places nodes that share more neighbors and similar text or label attributes closer in the representation space. The experimental results

CRediT authorship contribution statement

Zhongying Zhao: Project administration, Methodology, Supervision, Writing - original draft, Writing - review & editing. Hui Zhou: Investigation, Visualization, Data curation, Writing - original draft. Chao Li: Project administration, Formal analysis, Validation, Writing - original draft, Writing - review & editing. Jie Tang: Writing - review & editing. Qingtian Zeng: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (Grant No. 61303167, 61702306, U1811264), the National Key R&D Plan (Grant No. 2018YFC0831002), the Taishan Scholar Program of Shandong Province (Grant No. ts20190936), the Natural Science Foundation of Shandong Province (Grant No. ZR2018BF013), the Innovative Research Foundation of Qingdao (Grant No. 18-2-2-41-jch), Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province

References (37)

  • Z. Zhao et al., An incremental method to detect communities in dynamic evolving social networks, Knowl.-Based Syst. (2019)
  • H. Zhou et al., Rank2vec: Learning node embeddings with local structure and global ranking, Expert Syst. Appl. (2019)
  • A. Ahmed et al., Distributed large-scale natural graph factorization
  • H. Cai et al., A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Trans. Knowl. Data Eng. (2018)
  • S. Cao et al., Deep neural networks for learning graph representations
  • S. Cavallari et al., Learning community embedding with community detection and node embedding on graphs
  • Y. Cen et al., Representation learning for attributed multiplex heterogeneous network
  • H. Chen et al., PME: Projected metric embedding on heterogeneous networks for link prediction
  • A. Grover et al., node2vec: Scalable feature learning for networks
  • W. Hamilton et al., Inductive representation learning on large graphs
  • X. Huang et al., Accelerated attributed network embedding
  • X. Huang et al., Label informed attributed network embedding
  • D. Jin et al., Detecting communities with multiplex semantics by distinguishing background, general and specialized topics, IEEE Trans. Knowl. Data Eng. (2020)
  • D. Kingma et al., Adam: A method for stochastic optimization
  • T.N. Kipf et al., Semi-supervised classification with graph convolutional networks
  • M. Li et al., Long-tail hashtag recommendation for micro-videos with graph convolutional network
  • J. Liang et al., Semi-supervised embedding in attributed networks with outliers
  • L. Liao et al., Attributed social network embedding, IEEE Trans. Knowl. Data Eng. (2018)