
Information Sciences

Volume 543, 8 January 2021, Pages 382-397

DeepEmLAN: Deep embedding learning for attributed networks

https://doi.org/10.1016/j.ins.2020.07.001

Abstract

Network embedding aims to learn low-dimensional representations for the components of a network while maximally preserving its structure and inherent properties. Its effectiveness has been demonstrated in various real-world applications. However, most existing studies on attributed networks cannot flexibly exploit both multi-typed attributes and the semantic relationships among them. To address this problem, we propose a deep-model-based embedding learning method for attributed networks, named DeepEmLAN. It smoothly projects different types of attributed information into the same semantic space through a deep attention model while maintaining the topological structure. Furthermore, we design a heuristic combining strategy to generate the final embeddings, which places nodes that share more neighbors or similar text or label attributes closer in the representation space. To demonstrate the potential of the proposed DeepEmLAN, we evaluate its performance on the challenging tasks of node classification and network reconstruction. Experimental results on several real datasets show that DeepEmLAN significantly outperforms competitive state-of-the-art methods.

Introduction

Network embedding, also known as network representation learning (NRL), aims to represent the components of a network (such as nodes, edges, and subgraphs) with low-dimensional vectors in which the topology and properties of the network are maximally preserved. Since low-dimensional vectors can be easily processed by various machine learning methods, network embedding has become a very active research field and attracted tremendous attention, with recent successes in a wide range of applications [36]. DeepWalk [21], LINE [25], and node2vec [7] are commonly considered powerful methods and have been applied in various settings. NetMF [23] is a recent study that proves theoretically that the existing models with negative sampling can be unified as the factorization of a closed-form matrix. However, directly constructing and factorizing such a dense matrix is prohibitively expensive in both time and space, making it unscalable for large networks. To address this problem, NetSMF [22] was proposed to efficiently sparsify the dense matrix, enabling significant improvements in embedding learning. Nevertheless, all the above methods are designed to handle homogeneous networks with single-typed nodes and edges.
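To make the random-walk family above (DeepWalk, node2vec) concrete, the sketch below generates truncated random walks over a toy adjacency list; the resulting walk corpus is what skip-gram then trains on to produce node vectors. This is a minimal illustration, not code from the paper: the graph, walk counts, and function names are invented for the example.

```python
import random

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Generate truncated random walks (DeepWalk-style) starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy undirected graph as an adjacency list (invented for illustration).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj)
# Each walk is a node sequence; (center, context) pairs taken from a sliding
# window over these walks feed skip-gram with negative sampling.
```

Node2vec differs only in how the next neighbor is sampled (biased by return and in-out parameters), while NetMF shows that the implicit objective of this pipeline is equivalent to factorizing a closed-form matrix.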

In fact, real-world networks are much more complicated, containing not only multi-typed nodes and edges but also a rich set of attributes. Depending on the network topology and attribute properties, we categorize homogeneous networks into two types, simple networks and attributed networks, as shown in Table 1. This categorization is largely consistent with the recent work [5], which is designed for attributed heterogeneous networks. The difference is that we consider both the multi-typed attributes and the semantic relationships hidden in those attributes, and design a deep embedding learning model for attributed homogeneous networks. It is worth mentioning that the attributed network is one of the most important network types [16].

As can be seen from Table 1, we focus on embedding learning for attributed networks. The attributes in different networks may refer to different content. For example, in social networks the attributes may be users' opinions or comments, while in citation networks they may be the topics or keywords of papers. Additionally, labels (such as groups or community categories) are discrete and can intrinsically be considered an important type of attribute. Taking this information into account often improves performance in many complex network analysis tasks, such as identifying highly influential users [24], community detection [11], [35], and link prediction [31], [6]. Due to its importance and challenges, there have been tremendous attempts in the literature to investigate embedding learning for attributed networks. TADW [33] incorporates the text features of nodes into the NRL process through matrix factorization, but it can only handle text attributes. AANE [9] models and incorporates the attribute proximities of nodes into the NRL process in a distributed way; however, it is trained in an unsupervised mode and cannot utilize the discriminative labels in the network. LANE [10] smoothly incorporates label information into attributed network embedding while preserving the correlations between diverse details, but it suffers from low learning efficiency due to massive matrix operations. Indeed, the above algorithms are essentially linear models, which are insufficient to capture the nonlinear relations in complex networks. Thus, deep learning has been introduced into some recent embedding algorithms to preserve the complicated relationships in attributed networks. SEANO [15] is an inductive deep learning framework that learns robust representations jointly preserving topology, attribute, and label information. CANE [27] uses an attention mechanism to learn more discriminative representations for the nodes in an attributed network.

To the best of our knowledge, no prior work explores both multi-typed attributes and their semantic relationships in an efficient way. In this paper, we develop a unified deep model to capture both the rich attributed information and the topological structure. The training process of DeepEmLAN consists of three parts, which preserve topology, attribute, and label information, respectively; the three components are closely connected and interact with each other. The topological information is captured by modeling the first-order and second-order proximities between nodes; the text attributes are processed with an attention mechanism to capture the different roles a node plays when interacting with different neighbors; and the labels of the nodes are predicted from the mutual attention vectors obtained in the second part through multi-level nonlinear mappings. Besides, the parameters are adjusted with respect to the different elements, which makes the embeddings more adaptive for subsequent machine learning tasks. Finally, we present a heuristic method that combines the temporary vectors obtained during training into the final representation vector for each node. To evaluate the proposed model, extensive experiments are conducted on the tasks of multi-label classification and network reconstruction. The experimental results indicate that DeepEmLAN achieves significantly better performance than state-of-the-art embedding methods. The main contributions of this work are summarized as follows.
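The first of the three components, first-order proximity over observed edges, can be sketched with a LINE-style logistic loss: embeddings of connected nodes should have a large dot product. This is a simplified stand-in, not the paper's exact objective; the embeddings and edge list below are random toy data.

```python
import numpy as np

def first_order_loss(emb, edges):
    """Mean negative log-sigmoid of embedding dot products over observed
    edges: the loss is low when connected nodes have similar embeddings
    (LINE-style first-order proximity)."""
    total = 0.0
    for u, v in edges:
        score = emb[u] @ emb[v]
        total += np.log1p(np.exp(-score))  # equals -log(sigmoid(score))
    return total / len(edges)

rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(4, 8))  # 4 nodes, 8-dim embeddings
loss = first_order_loss(emb, [(0, 1), (1, 2), (2, 3)])
```

Minimizing this loss (typically with negative sampling over non-edges, omitted here) pulls neighboring nodes together; second-order proximity applies the same idea to the neighborhoods of nodes rather than the nodes themselves.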

  • We propose a unified deep model to learn the representation vector for each node of the network by considering multi-typed attributes and the semantic relationships simultaneously. The model can preserve and balance the mutual influences resulting from different types of information.

  • We present a heuristic combining method to generate the final representation for each node. It places nodes that share more neighbors or have similar text or label attributes closer in the representation space.

  • We extensively evaluate the proposed DeepEmLAN on the tasks of multi-label classification and network reconstruction with several real-world datasets. The experimental results indicate that DeepEmLAN significantly outperforms the competitive baselines.
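The preview does not spell out the combining heuristic itself. Purely as an illustration of one plausible shape such a step could take, the sketch below scales the per-component vectors (topology, text, label), concatenates them, and L2-normalizes the result; the function name, weights, and inputs are hypothetical, not the paper's method.

```python
import numpy as np

def combine_views(z_topo, z_text, z_label, weights=(1.0, 1.0, 1.0)):
    """Hypothetical combining step: scale each component embedding,
    concatenate, and L2-normalize so no single view dominates
    distances in the final representation space."""
    parts = [w * z for w, z in zip(weights, (z_topo, z_text, z_label))]
    z = np.concatenate(parts)
    return z / np.linalg.norm(z)

rng = np.random.default_rng(1)
z = combine_views(rng.normal(size=4), rng.normal(size=4), rng.normal(size=4))
```

Normalizing the concatenation keeps cosine and Euclidean comparisons between final vectors well behaved regardless of how the three component scales differ.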

The remainder of this paper is organized as follows. Section 2 briefly reviews the related work. The problem to be solved is formulated in Section 3. In Section 4, we present the unified deep model to capture both topological and rich attributed information; the algorithm and its complexity analysis are also given in this section. Section 5 demonstrates the effectiveness of DeepEmLAN through experimental results and analysis. Finally, Section 6 concludes this study and discusses our future work.


Related work

Network representation learning (NRL), or network embedding, has received tremendous attention recently due to its great significance. Numerous NRL algorithms have been proposed to learn efficient representations for the components of networks. A typical NRL method learns representations by preserving the topological similarities between nodes, including the first-order [1], second-order [25], and higher-order [7], [20], [21]

Problem formulation

In this section, we first define the notations used in this paper, as shown in Table 2. We then give some definitions and formulate the problem to be solved.

Definition 1 Homogeneous Network

A homogeneous network [2] is a network with only one type of nodes and one type of edges. It is commonly denoted as a graph G=(V,E), where V is the set of nodes and E is the set of edges between them.

Definition 2 Attributed Network

An attributed network [9] is defined as a graph G=(V,E,A), in which each node v ∈ V is associated with one or several types of attributes,
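Definition 2 can be mirrored by a minimal container like the sketch below; the class and field names are our own illustration, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AttributedNetwork:
    """G = (V, E, A): nodes, edges, and per-node typed attributes."""
    nodes: set
    edges: set                                   # pairs (u, v)
    attrs: dict = field(default_factory=dict)    # node -> {attr_type: value}

# A node may carry several attribute types at once, e.g. text and a label.
G = AttributedNetwork(
    nodes={0, 1, 2},
    edges={(0, 1), (1, 2)},
    attrs={0: {"text": "graph embedding", "label": "ML"}},
)
```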

The framework

In this paper, we aim to design a novel unified embedding model for attributed networks that preserves and balances these attributes efficiently. As pointed out in [3], deep learning is beneficial for modeling complicated nonlinear relations, and it has been successfully applied in various applications and fields, including network embedding. Inspired by this idea, we propose a deep learning based model to capture the nonlinear relationships between nodes' attributes. Besides, nodes often show

Experiments

To evaluate the performance of the proposed DeepEmLAN, we carry out experiments on the tasks of semi-supervised multi-label classification and network reconstruction with three real-world datasets. The experimental results fully demonstrate the effectiveness and efficiency of DeepEmLAN in balancing the interaction of various information, and show that it improves the discriminativeness of the representations. To further prove the distinctive ability of the representations learned by DeepEmLAN, we also

Conclusion and future work

In this paper, we propose an embedding learning method for attributed networks. It smoothly projects different types of attributes and the topological structure into the same semantic space through a deep attention model, while preserving this information maximally. Furthermore, we design a heuristic method to generate the final representations, which places nodes that share more neighbors and similar text or label attributes closer in the representation space. The experimental results

CRediT authorship contribution statement

Zhongying Zhao: Project administration, Methodology, Supervision, Writing - original draft, Writing - review & editing. Hui Zhou: Investigation, Visualization, Data curation, Writing - original draft. Chao Li: Project administration, Formal analysis, Validation, Writing - original draft, Writing - review & editing. Jie Tang: Writing - review & editing. Qingtian Zeng: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (Grant No. 61303167, 61702306, U1811264), the National Key R&D Plan (Grant No. 2018YFC0831002), the Taishan Scholar Program of Shandong Province (Grant No. ts20190936), the Natural Science Foundation of Shandong Province (Grant No. ZR2018BF013), the Innovative Research Foundation of Qingdao (Grant No. 18-2-2-41-jch), Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province

References (37)

  • Z. Zhao et al., An incremental method to detect communities in dynamic evolving social networks, Knowl.-Based Syst. (2019)
  • H. Zhou et al., Rank2vec: Learning node embeddings with local structure and global ranking, Expert Syst. Appl. (2019)
  • A. Ahmed et al., Distributed large-scale natural graph factorization
  • H. Cai et al., A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Trans. Knowl. Data Eng. (2018)
  • S. Cao et al., Deep neural networks for learning graph representations
  • S. Cavallari et al., Learning community embedding with community detection and node embedding on graphs
  • Y. Cen et al., Representation learning for attributed multiplex heterogeneous network
  • H. Chen et al., PME: Projected metric embedding on heterogeneous networks for link prediction
  • A. Grover et al., node2vec: Scalable feature learning for networks
  • W. Hamilton et al., Inductive representation learning on large graphs
  • X. Huang et al., Accelerated attributed network embedding
  • X. Huang et al., Label informed attributed network embedding
  • D. Jin et al., Detecting communities with multiplex semantics by distinguishing background, general and specialized topics, IEEE Trans. Knowl. Data Eng. (2020)
  • D. Kingma et al., Adam: A method for stochastic optimization
  • T.N. Kipf et al., Semi-supervised classification with graph convolutional networks
  • M. Li et al., Long-tail hashtag recommendation for micro-videos with graph convolutional network
  • J. Liang et al., Semi-supervised embedding in attributed networks with outliers
  • L. Liao et al., Attributed social network embedding, IEEE Trans. Knowl. Data Eng. (2018)