Elsevier

Neurocomputing

Volume 468, 11 January 2022, Pages 198-210
Joint network embedding of network structure and node attributes via deep autoencoder

https://doi.org/10.1016/j.neucom.2021.10.032

Abstract

Network embedding aims to learn a low-dimensional vector for each node in a network, and it is effective in a variety of applications such as network reconstruction and community detection. However, most existing network embedding methods exploit only the network structure and ignore the rich node attributes, which tends to produce sub-optimal network representations. To learn better representations, the diverse information in networks should be exploited. In this paper, we develop a novel deep autoencoder framework, named FSADA, that fuses topological structure and node attributes. We first design a multi-layer autoencoder consisting of multiple non-linear functions to capture and preserve the highly non-linear network structure and node attribute information. In particular, we apply a pre-processing procedure to the original information, which makes it easier to extract the intrinsic correlations between topological structure and node attributes. In addition, we design an enhancement module that combines topology and node attribute similarity to construct pairwise constraints on nodes, and we introduce a graph regularization into the framework to enhance the representation in the latent space. Extensive experimental evaluations demonstrate the superior performance of the proposed method.

Introduction

In reality, many complex systems take the form of networks, such as social networks [1], academic citation networks [2], and biological networks [3]. The rapid growth of network data makes such large-scale networks challenging to handle. Recently, network embedding, which aims to learn a low-dimensional vector for each node that captures the intrinsic features of the network, has attracted considerable research interest. The learned embedding then paves the way for various network tasks, such as node classification [4] [5], community detection [6], node clustering [7] [8], and link prediction [9]. Network representation learning therefore plays an essential role in network analysis, and an appropriate and informative embedding enables effective analysis of the network.

Several network embedding methods have been put forward (e.g., DeepWalk [10], LINE [11], node2vec [12]), which primarily focus on preserving the plain structure information. However, real networks are usually sparse and the number of observable links is limited, so relying on structure information alone is likely to lead to suboptimal representations. In addition to the network structure, most information networks in daily life carry rich attribute information on each node, such as academic topics in citation networks or users' education background in social networks. According to sociological theories such as the principle of homophily [13], node attributes are highly correlated with network structure. For instance, in academic networks there is a high correlation between paper topics and paper citations: they influence and determine each other. It has been validated in [14] [15] that using structure and node attributes simultaneously can achieve better embedding performance. Node attributes can therefore be employed to learn a more informative network embedding than network structure alone. Besides, incorporating more types of information can filter out noise, characterize the network more accurately, and finally yield a desirable network representation.

Although many network embedding methods have been proposed and have achieved reasonable results, three major challenges remain. First, existing methods treat topological structure and node attributes as linear and assume that the correlations between them are also linear. However, it has been demonstrated that the underlying structure of the network and the attribute information are better regarded as highly non-linear [16] [17]. In addition, the intrinsic interactions between the two types of information are also highly non-linear [18]. Second, it is difficult to effectively capture the complex correlations between topological structure and node attribute information. Previous works simply combine the individual structure and node attributes using a shallow model, without fully considering the complex interactions between them. Third, how to effectively integrate the different types of information to learn the desired network embedding remains to be addressed. The integration should follow the essential principle that nodes with high structural and attribute proximity in the original network should be embedded close to each other in the latent space. Given these challenges, an effective method is expected to simultaneously capture and preserve the non-linear topological structure and node attributes, as well as the intrinsic correlations between them.
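The principle stated in the third challenge, that nodes with high proximity should end up close together in the latent space, is commonly expressed as a graph Laplacian penalty. The following is a minimal NumPy sketch of that general idea, not the exact formulation used in the paper, which this excerpt does not show. The similarity matrix S, the embeddings Z_good and Z_bad, and both function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_penalty(Z, S):
    """Direct form: sum over i, j of S[i, j] * ||Z[i] - Z[j]||^2."""
    diff = Z[:, None, :] - Z[None, :, :]          # shape (n, n, d)
    return float((S * (diff ** 2).sum(axis=2)).sum())

def laplacian_penalty(Z, S):
    """Same quantity written as 2 * Tr(Z^T L Z) with L = D - S (S symmetric)."""
    L = np.diag(S.sum(axis=1)) - S
    return float(2.0 * np.trace(Z.T @ L @ Z))

# Toy similarity (assumption): nodes 0 and 1 are similar, node 2 is unrelated.
S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

# Embedding that respects the similarity: nodes 0 and 1 are neighbours.
Z_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
# Embedding that violates it: nodes 0 and 1 are far apart.
Z_bad = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0]])

good = pairwise_penalty(Z_good, S)
bad = pairwise_penalty(Z_bad, S)
```

The penalty is small for Z_good and large for Z_bad, so adding it to a training objective pushes similar nodes together. The trace form is what usually appears in a loss function because it avoids the explicit double sum.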

In this paper, we propose FSADA, an embedding framework that fuses topological structure and node attributes via a deep autoencoder. Compared with existing methods, FSADA tackles the above three challenges in a unified framework. The main novelty of the proposed framework is that it not only considers the highly non-linear topological structure but also preserves the significant node attribute information. Moreover, unlike previous algorithms that simply combine the structure and node attribute information, our method fully considers the complex interactions between the two types of information and uses a pre-processing procedure to capture the deep and complex relationship between them. Furthermore, we design an enhancement scheme to improve the effectiveness and efficiency of the network embedding.

In detail, we develop a deep autoencoder with several layers of non-linear functions to preserve the non-linear topological structure and node attributes simultaneously. We define the loss function based on the structural second-order proximity and the node attribute proximity, and obtain the embedding by minimizing the final loss function. Although the network structure and node attributes are two views of the same network, they come from different sources. Therefore, we apply a pre-processing procedure to the two types of information to obtain dense and non-redundant higher-order features. The higher-order features are then horizontally concatenated and used as the input to the autoencoder, which helps to uncover the complex correlations between the structure and node attribute information. To further improve performance and derive a more effective and efficient network embedding, we design an enhancement module, in which we construct pairwise constraints on nodes and introduce a graph regularization into the objective function to reinforce the latent embedding. In summary, our work makes the following contributions:

  • (1) We develop a multi-layer autoencoder that consists of multiple non-linear functions to capture and preserve the highly non-linear topological structure and node attribute information.

  • (2) We apply a pre-processing procedure to the original topological structure and node attributes, and feed the resulting higher-order features into the autoencoder for further learning, which helps to uncover the complex interactions between them.

  • (3) To further improve the performance of the embedding, we design an enhancement module. We implement it by combining structure and attribute similarity to construct pairwise constraints, and introduce a graph regularization into the framework to enhance the node representation in the latent space.

  • (4) We conduct extensive experiments on real-world networks in network reconstruction and node classification tasks to evaluate the effectiveness of the model. Experimental results demonstrate that our model learns informative and high-quality embeddings and significantly outperforms the state-of-the-art methods in both network analysis tasks.
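The pipeline described above (pre-process each view, concatenate horizontally, encode, decode, and add a graph regularization term) can be sketched as a single forward pass. This is an illustrative sketch under stated assumptions, not the authors' implementation: this excerpt gives neither the layer sizes, the activation functions, the pre-processing method, nor the exact loss. Here pre-processing is assumed to be one dense tanh layer per view, the second-order proximity term is assumed to be the reconstruction of each node's adjacency row, and the weights alpha and gamma are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, w, b, activate=True):
    """Affine map, optionally followed by a tanh non-linearity."""
    y = x @ w + b
    return np.tanh(y) if activate else y

def build_params(n, m, pre_dim=8, emb_dim=4):
    """Hypothetical layer sizes; none of them come from the paper."""
    return {
        "pre_s": init(n, pre_dim),            # pre-processing of the structure view
        "pre_a": init(m, pre_dim),            # pre-processing of the attribute view
        "enc": init(2 * pre_dim, emb_dim),    # encoder to the latent space
        "dec": init(emb_dim, 2 * pre_dim),    # decoder back to the fused features
        "out_s": init(2 * pre_dim, n),        # reconstructs adjacency rows
        "out_a": init(2 * pre_dim, m),        # reconstructs attribute rows
    }

def forward_and_loss(A, X, S, p, alpha=1.0, gamma=0.1):
    """Return the embedding Z and the (assumed) joint loss."""
    h_s = dense(A, *p["pre_s"])               # dense structure features
    h_a = dense(X, *p["pre_a"])               # dense attribute features
    h = np.hstack([h_s, h_a])                 # horizontal concatenation
    Z = dense(h, *p["enc"])                   # latent embedding
    h_rec = dense(Z, *p["dec"])
    A_rec = dense(h_rec, *p["out_s"], activate=False)
    X_rec = dense(h_rec, *p["out_a"], activate=False)
    loss_s = np.linalg.norm(A_rec - A, "fro") ** 2   # structure reconstruction
    loss_a = np.linalg.norm(X_rec - X, "fro") ** 2   # attribute reconstruction
    lap = np.diag(S.sum(axis=1)) - S                 # Laplacian of the constraints
    loss_reg = np.trace(Z.T @ lap @ Z)               # graph regularization
    return Z, loss_s + alpha * loss_a + gamma * loss_reg

# Toy data (assumption): 6 nodes, 5 attributes, random undirected graph.
n, m = 6, 5
upper = np.triu(rng.random((n, n)) < 0.4, k=1).astype(float)
A = upper + upper.T
X = rng.random((n, m))
S = A.copy()          # stand-in for the pairwise constraint matrix
params = build_params(n, m)
Z, loss = forward_and_loss(A, X, S, params)
```

In practice the parameters would be trained by gradient descent on the loss, typically with an automatic differentiation library. The sketch only shows how the three ingredients fit into one objective.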

Section snippets

Structure preserving network embedding

Recently, network embedding has attracted wide attention and a variety of network embedding methods have been proposed. For instance, DeepWalk [10] was a pioneering network embedding method, which exploited the skip-gram model and a stream of short random walks to learn the node representation. node2vec [12] extended DeepWalk by combining two search strategies to explore and capture the high-order proximity in the network. M-NMF [19] incorporated the community structure and

Notations and problem definition

Throughout the paper, we use uppercase letters to denote matrices and lowercase letters to denote scalars. The i-th row of matrix A is represented by a_i, and the (i,j)-th element of the matrix is denoted by a_ij. We also use Tr(A) to represent the trace of the matrix A, and ||A||_F to represent its Frobenius norm.
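As a quick illustration of this notation, the following NumPy snippet evaluates each quantity on a small hypothetical matrix and checks the standard identity ||A||_F^2 = Tr(A^T A). The matrix values are arbitrary, and note that NumPy indexes from 0 while the paper's notation is 1-based.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

a_1 = A[0]                             # first row, a_1 in the paper's notation
a_12 = A[0, 1]                         # element in row 1, column 2
trace_A = np.trace(A)                  # Tr(A) = 1 + 4
fro_A = np.linalg.norm(A, "fro")       # ||A||_F = sqrt(1 + 4 + 9 + 16)

# Standard identity linking the two: ||A||_F^2 = Tr(A^T A)
fro_sq_via_trace = np.trace(A.T @ A)
```

The identity is what lets reconstruction losses written with Frobenius norms be rewritten in trace form, which is the form graph regularization terms usually take.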

We assume that a network is given as an undirected graph G(V,E,T), where V denotes the set of n nodes and E denotes the set of edges. The topological structure of graph G can

Overall framework

The overall architecture of the framework, which consists of three main modules, is illustrated in Fig. 2. In the first module, we represent the network structure and node attributes in the form of a graph. In the second module, we propose a deep autoencoder with a pre-processing procedure to capture and preserve the non-linear topological structure, the node attributes, and the complex correlations between them. In the third module, we design an enhancement module to constrain nodes which have similar structure and

Experiments

We perform extensive experiments to validate the effectiveness of the FSADA algorithm and answer the following questions:

  • RQ1: How does FSADA perform in the network reconstruction task compared with the baseline methods?

  • RQ2: How effective is FSADA in the node classification task compared with the state-of-the-art methods?

  • RQ3: How do the hyperparameters, the pre-processing procedure, and the enhancement scheme affect FSADA?
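To make RQ1 concrete, network reconstruction is often scored by ranking node pairs by embedding similarity and checking how many of the top-ranked pairs are true edges (precision@k). The sketch below assumes that metric and an inner-product similarity; this excerpt does not state which metric or similarity the paper actually uses, and the function name and toy data are hypothetical.

```python
import numpy as np

def precision_at_k(Z, A, k):
    """Fraction of the k highest-scoring node pairs that are edges in A."""
    n = A.shape[0]
    scores = Z @ Z.T                          # inner-product similarity (assumption)
    iu = np.triu_indices(n, k=1)              # each undirected pair once, no self-pairs
    order = np.argsort(-scores[iu], kind="stable")[:k]
    return float(A[iu][order].mean())

# Toy example: two groups of two nodes, with one edge inside each group.
Z = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

p_at_2 = precision_at_k(Z, A, 2)   # the two top pairs are the two real edges
p_at_4 = precision_at_k(Z, A, 4)   # two of the four top pairs are real edges
```

On this toy graph the two highest-scoring pairs are exactly the two edges, so precision is perfect at k = 2 and drops once k exceeds the number of edges.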

Conclusion

In this paper, we develop a deep network embedding framework, FSADA. The framework tackles data sparsity, the preservation of non-linear structure and attribute proximity, and the capture of complex correlations in a unified way. More specifically, we propose a deep autoencoder with a pre-processing procedure, which can capture the non-linear network structure, the node attributes, and the complex correlations between them. It jointly optimizes the reconstruction loss of the second-order proximity

CRediT authorship contribution statement

Yu Pan: Conceptualization, Methodology, Writing - original draft. Junhua Zou: Software, Validation. Junyang Qiu: Visualization, Investigation. Shuaihui Wang: Data curation. Guyu Hu: Writing - review & editing, Formal analysis. Zhisong Pan: Funding acquisition, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was funded by the National Natural Science Foundation of China (No. 62076251) and the National Key Research and Development Program of China (No. 2017YFB0802800).

Yu Pan received the M.S. in Computer application technology from Northeastern University. She is pursuing her Ph.D. degree in School of Computer Science and Technology, Army Engineering University, Nanjing. Her main research interests include data processing and mining in social networks and machine learning.

References (44)

  • J. Chen et al., Unsupervised feature selection based extreme learning machine for clustering, Neurocomputing (2020)
  • S. Gao, L. Denoyer, P. Gallinari, Temporal link prediction by integrating content and structure information, in:...
  • B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: online learning of social representations, in: The 20th ACM SIGKDD...
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, LINE: large-scale information network embedding, in: Proceedings of...
  • A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD...
  • M. McPherson et al., Birds of a feather: Homophily in social networks, Annual Review of Sociology (2001)
  • D. Zhang, J. Yin, X. Zhu, C. Zhang, Homophily, structure, and content augmented network representation learning, in:...
  • D. Yang, S. Wang, C. Li, X. Zhang, Z. Li, From properties to links: Deep network embedding on incomplete graphs, in:...
  • D. Luo, C.H.Q. Ding, F. Nie, H. Huang, Cauchy graph embedding, in: Proceedings of the 28th International Conference on...
  • Y. Liu, Z. Gu, Y. Cheung, K.A. Hua, Multi-view manifold learning for media interestingness prediction, in: Proceedings...
  • P. Cui, X. Wang, J. Pei, W. Zhu, A survey on network embedding, CoRR abs/1711.08752 (2017)....
  • X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, S. Yang, Community preserving network embedding, in: Proceedings of the...
    Junhua Zou is pursuing the Ph.D. degree at Army Engineering University of PLA. He mainly works on computer vision and machine learning.

    Junyang Qiu is currently an engineer at Jiangnan Institute of Computing Technology, Wuxi. He received the Ph.D. degree from the School of Information Technology, Deakin University, Australia. His research interests include malware analysis and machine learning.

    Shuaihui Wang received the B.S. and M.S. degrees in Atmospheric Physics and Atmospheric Environment from the College of Meteorology, PLA University of Science & Technology in 2006 and 2010, respectively. He is now a Ph.D. student at the Army Engineering University, Nanjing. His main research interests include data processing and mining in social networks and machine learning.

    Guyu Hu received the B.S. degree in Radio Communication from Zhejiang University (Hangzhou, China) in 1983, the M.S. degree in Computer Application Technology from Nanjing Institute of Communications (Nanjing, China) in 1989, and the Ph.D. degree in Communications and Information Systems from Nanjing Institute of Communications in 1992. Since 1990, he has devoted himself to research on network management. Since 1997, he has been a full professor at the PLA University of Science and Technology, China. Since 1998, his research interests have included intelligent network management, mainly failure finding from data with pattern recognition, machine learning, and neural networks.

    Zhisong Pan is currently a professor at Army Engineering University, Nanjing, China. He received the Ph.D. degree in Computational Intelligence from PLA University of Science and Technology, Nanjing, China, in 2013. His current research includes computer vision and machine learning.
