Joint network embedding of network structure and node attributes via deep autoencoder
Introduction
Many real-world complex systems take the form of networks, such as social networks [1], academic citation networks [2], and biological networks [3]. The rapid growth of network data makes such large-scale networks challenging to process. Network embedding, which aims to learn a low-dimensional vector for each node that captures the intrinsic features of the network, has recently attracted considerable research interest. The learned embeddings then support various network tasks, such as node classification [4] [5], community detection [6], node clustering [7] [8], and link prediction [9]. Network representation learning therefore plays an essential role in network analysis, and an appropriate, informative embedding enables effective analysis of the network.
Several network embedding methods have been put forward (e.g., DeepWalk [10], LINE [11], node2vec [12]), which primarily focus on preserving plain structure information. However, real-world networks are typically sparse and the number of observable links is limited, so relying on structure information alone is likely to yield suboptimal representations. In addition to the network structure, most information networks in daily life carry rich attribute information on each node, such as academic topics in citation networks or users’ education background in social networks. According to sociological theories such as the principle of homophily [13], node attributes are highly correlated with network structure. For instance, in academic networks, paper topics and citations are strongly correlated: they influence and determine each other. It has been validated in [14] [15] that using structure and node attributes simultaneously achieves better embedding performance. Node attributes can therefore be employed to learn a more informative network embedding than structure information alone. Moreover, incorporating more types of information can filter out noise, characterize the network more accurately, and ultimately yield a better network representation.
Although many network embedding methods have been proposed and have achieved reasonable results, three major challenges remain. First, existing methods treat topological structure and node attributes as linear, and assume that the correlations between them are also linear. However, it has been demonstrated that the underlying structure of the network and the attribute information are typically better regarded as highly non-linear [16] [17]. The intrinsic interactions between the two types of information are also highly non-linear [18]. Second, it is difficult to effectively capture the complex correlations between the topological structure and the node attribute information. Previous works simply combine the individual structure and node attributes using a shallow model, without fully considering the complex interactions between them. Third, how to effectively integrate the different types of information to learn the desired network embedding remains to be addressed. The integration should follow the essential principle that nodes with high structural and attribute proximity in the original network are embedded close to each other in the latent space. In short, an effective method should simultaneously capture and preserve the non-linear topological structure, the node attributes, and the intrinsic correlations between them.
In this paper, we propose FSADA, an embedding framework that fuses topological structure and node attributes via a deep autoencoder. Compared with existing methods, FSADA tackles the above three challenges in a unified framework. Its main novelty is that it not only considers the highly non-linear topological structure but also preserves the significant node attribute information. Moreover, unlike previous algorithms that simply combine the structure and node attribute information, our method fully considers the complex interactions between the two types of information and uses a pre-processing procedure to capture the deep and complex relationship between them. Furthermore, we design an enhancement scheme to improve the effectiveness and efficiency of the network embedding.
In detail, we develop a deep autoencoder with several layers of non-linear functions to preserve the non-linear topological structure and node attributes simultaneously. We define the loss function based on the structural second-order proximity and the node attribute proximity, and obtain the embedding by minimizing the final loss function. Although the network structure and node attributes are two views of the same network, they come from different sources. We therefore apply a pre-processing procedure to the two types of information to obtain dense, non-redundant higher-order features. These higher-order features are then horizontally concatenated and used as the input to the autoencoder, which helps to uncover the complex correlations between the structure and the node attribute information. To further improve performance and derive a more effective and efficient network embedding, we design an enhancement module, in which we construct pairwise constraints on nodes and introduce a graph regularization term into the objective function to reinforce the latent embedding. In summary, our work makes the following contributions:
(1) We develop a multi-layer autoencoder that consists of multiple non-linear functions to capture and preserve the highly non-linear topological structure and node attribute information.
(2) We adopt a pre-processing procedure for the original topological structure and node attributes, and feed the resulting higher-order features into the autoencoder for further learning, which helps to uncover the complex interactions between them.
(3) To further improve the embedding performance, we design an enhancement module. We implement it by combining structure and attribute similarity to construct pairwise constraints, and introduce a graph regularization term into the framework to enhance the node representation in the latent space.
(4) We conduct extensive experiments on real-world networks in network reconstruction and node classification tasks to evaluate the effectiveness of the model. Experimental results demonstrate that our model learns informative, high-quality embeddings and significantly outperforms the state-of-the-art methods in both network analysis tasks.
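To make the pipeline described above more concrete, the following is a minimal NumPy sketch of one forward pass: two pre-processed feature matrices are horizontally concatenated, passed through a non-linear encoder, reconstructed by a decoder, and scored with a reconstruction loss plus a graph regularization term. It is an illustration under stated assumptions, not the authors' implementation: the single hidden layer, the tanh activation, the unnormalized Laplacian, the weight `alpha`, and all function and parameter names are hypothetical, and the exact FSADA loss is not given in this excerpt.

```python
import numpy as np


def init_params(d_in, d_embed, rng):
    """Small random weights for a one-hidden-layer autoencoder (illustrative only)."""
    return {
        "W1": rng.normal(scale=0.1, size=(d_in, d_embed)),
        "b1": np.zeros(d_embed),
        "W2": rng.normal(scale=0.1, size=(d_embed, d_in)),
        "b2": np.zeros(d_in),
    }


def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S


def joint_autoencoder_loss(H_struct, H_attr, S, params, alpha=1.0):
    """Reconstruction loss plus graph regularization for one forward pass.

    H_struct, H_attr: (n, d1) and (n, d2) pre-processed feature matrices.
    S: (n, n) symmetric pairwise-constraint (similarity) matrix.
    alpha: weight of the regularizer (hypothetical hyperparameter).
    Returns the scalar loss and the (n, d_embed) embedding matrix.
    """
    X = np.hstack([H_struct, H_attr])             # horizontal concatenation
    Y = np.tanh(X @ params["W1"] + params["b1"])  # non-linear encoder -> embedding
    X_hat = Y @ params["W2"] + params["b2"]       # linear decoder
    recon = np.sum((X - X_hat) ** 2)              # squared Frobenius norm of the error
    reg = np.trace(Y.T @ graph_laplacian(S) @ Y)  # small when similar nodes embed closely
    return recon + alpha * reg, Y
```

For a symmetric `S`, the regularizer `trace(Y.T @ L @ Y)` equals half the sum of `S[i, j] * ||Y[i] - Y[j]||**2` over all node pairs, which is why minimizing it pulls nodes with large pairwise similarity toward each other in the latent space. In practice the parameters would be trained with an automatic-differentiation framework rather than evaluated once as here.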
Structure preserving network embedding
Recently, network embedding has attracted wide attention, and a variety of network embedding methods have been proposed. For instance, DeepWalk [10] was a pioneering network embedding method, which exploited the skip-gram model and a stream of short random walks to learn node representations. node2vec [12] extended DeepWalk by combining two search strategies to explore and capture the high-order proximity in the network. M-NMF [19] incorporated the community structure and
Notations and problem definition
Throughout the paper, we use uppercase letters to denote matrices and lowercase letters to denote scalars. The i-th row of matrix A is represented by $A_i$ and the (i, j)-th element of the matrix is denoted by $A_{ij}$. We also use $\mathrm{tr}(\cdot)$ to represent the trace of a matrix, and $\|A\|_F$ to represent the Frobenius norm of the matrix A.
We assume that a network is given as an undirected graph G, where V denotes the set of n nodes and E denotes the set of edges. The topological structure of graph G can
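For reference, the trace and the Frobenius norm mentioned above have the following standard definitions. The index notation ($A_{ij}$ for the entries, $B$ for a square matrix) is an assumption made here for illustration, since the paper's own symbols are not shown in this excerpt.

```latex
\operatorname{tr}(B) = \sum_{i=1}^{n} B_{ii}
\quad (B \in \mathbb{R}^{n \times n}),
\qquad
\|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} A_{ij}^{2}}
        = \sqrt{\operatorname{tr}\!\left(A^{\top} A\right)}
\quad (A \in \mathbb{R}^{n \times m}).
```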
Overall framework
The overall architecture of the framework, which consists of three main modules, is illustrated in Fig. 2. In the first module, we represent the network structure and node attributes in the form of graphs. In the second module, we propose a deep autoencoder with a pre-processing procedure to capture and preserve the non-linear topological structure, the node attributes, and the complex correlations between them. In the third module, we design an enhancement to constrain nodes which have similar structure and
Experiments
We perform extensive experiments to validate the effectiveness of the FSADA algorithm and to answer the following questions:
RQ1: How does FSADA perform in the network reconstruction task compared with the baseline methods?
RQ2: How effective is FSADA in the node classification task compared with the state-of-the-art methods?
RQ3: What is the impact of the hyperparameters, the pre-processing procedure, and the enhancement scheme of FSADA?
Conclusion
In this paper, we develop a deep network embedding framework, FSADA. It addresses data sparsity, the preservation of non-linear structure and attribute proximity, and the capture of complex correlations in a unified framework. More specifically, we propose a deep autoencoder with a pre-processing procedure, which can capture the non-linear network structure, the node attributes, and the complex correlations between them. It jointly optimizes the reconstruction loss of the second-order proximity
CRediT authorship contribution statement
Yu Pan: Conceptualization, Methodology, Writing - original draft. Junhua Zou: Software, Validation. Junyang Qiu: Visualization, Investigation. Shuaihui Wang: Data curation. Guyu Hu: Writing - review & editing, Formal analysis. Zhisong Pan: Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was funded by the National Natural Science Foundation of China (No. 62076251) and the National Key Research and Development Program of China (No. 2017YFB0802800).
References (44)
- et al. Label-dependent node classification in the network. Neurocomputing (2012)
- et al. Clustering via adaptive and locality-constrained graph learning and unsupervised ELM. Neurocomputing (2020)
- et al. HNS: hierarchical negative sampling for network representation learning. Inf. Sci. (2021)
- et al. Deep heterogeneous network embedding based on siamese neural networks. Neurocomputing (2020)
- et al. A scalable attribute-aware network embedding system. Neurocomputing (2019)
- et al. Social Network Analysis: Methods and Applications (1994)
- The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences (2001)
- et al. A survey of visualization tools for biological network analysis. BioData Mining (2008)
- J. Tang, C.C. Aggarwal, H. Liu, Node classification in signed social networks, in: Proceedings of the 2016 SIAM...
- Y. Li, C. Sha, X. Huang, Y. Zhang, Community detection in attributed graphs: An embedding approach, in: Proceedings of...
- Unsupervised feature selection based extreme learning machine for clustering. Neurocomputing
- Birds of a feather: Homophily in social networks. Annual Review of Sociology
Yu Pan received the M.S. degree in Computer Application Technology from Northeastern University. She is pursuing her Ph.D. degree in the School of Computer Science and Technology, Army Engineering University, Nanjing. Her main research interests include data processing and mining in social networks and machine learning.
Junhua Zou is pursuing the Ph.D. degree at the Army Engineering University of PLA. He mainly works on computer vision and machine learning.
Junyang Qiu is currently an engineer at the Jiangnan Institute of Computing Technology, Wuxi. He received the Ph.D. degree from the School of Information Technology, Deakin University, Australia. His research interests include malware analysis and machine learning.
Shuaihui Wang received the B.S. and M.S. degrees in Atmospheric Physics and Atmospheric Environment from the College of Meteorology, PLA University of Science & Technology, in 2006 and 2010, respectively. He is now a Ph.D. student at the Army Engineering University, Nanjing. His main research interests include data processing and mining in social networks and machine learning.
Guyu Hu received the B.S. degree in Radio Communication from Zhejiang University (Hangzhou, China) in 1983, the M.S. degree in Computer Application Technology from Nanjing Institute of Communications (Nanjing, China) in 1989, and the Ph.D. degree in Communications and Information Systems from Nanjing Institute of Communications in 1992. Since 1990, he has devoted himself to research on network management. Since 1997, he has been a full professor at the PLA University of Science and Technology, China. Since 1998, his research interests have included intelligent network management, mainly failure finding from data with pattern recognition, machine learning, and neural networks.
Zhisong Pan is currently a professor at the Army Engineering University, Nanjing, China. He received the Ph.D. degree in Computational Intelligence from PLA University of Science and Technology, Nanjing, China, in 2013. His current research includes computer vision and machine learning.