Detecting community in attributed networks by dynamically exploring node attributes and topological structure☆
Introduction
A network (sometimes called a graph) is an efficient and effective tool for modeling and characterizing many complex systems in nature and society, where each node represents an entity and each edge denotes the relation between a pair of nodes. For instance, in social networks, nodes correspond to individuals and edges to the relations among them [1], [2]. In cancer networks, biological molecules (genes, proteins, etc.) are denoted by nodes, and the biological interactions among them, such as protein–protein interactions, are described by edges [3], [4]. Complex networks come in many forms, including social networks, gene regulation networks, transportation networks, ecological networks, and scientist collaboration networks.
Interestingly, some critical underlying mechanisms of complex systems can be revealed by exploiting the topological structure of networks. Network analysis aims at extracting interesting graph patterns, which may shed light on the structure and function of the underlying systems. For example, hub nodes (nodes with large degree) are more likely to be pathogenic genes, which are key biomarkers for cancer diagnosis and therapy [5]. The centrality of nodes in scientist collaboration networks measures the reputation of researchers, which is useful for tracking the evolution of research topics and trends [6]. In the vehicle-shareability network, given a collection of trips (specified by origin, destination and start time), network analysis determines the minimum number of vehicles needed to serve all the trips without incurring any delay to the passengers [7].
A community is a typical graph pattern: a group of nodes that are densely connected to each other and weakly connected to the rest of the network [8], [9]. The complexity of networks makes it impractical to analyze them as a whole, so their organization is instead inferred from their communities. Indeed, considerable evidence demonstrates that communities are ubiquitous and shed light on the structure-function relation of networks. For example, communities in social networks correspond to groups of individuals with similar backgrounds or hobbies [10], [11], [12]. Communities in gene regulation networks are very likely to be protein complexes or pathways, which execute specific functions, such as cell death and signal transduction [13]. It is therefore promising to extract communities from networks.
Great efforts have been devoted to community detection [9], [14], [15], [16], [17], [18], which involves two issues: how to characterize community structure and how to extract communities from networks. The first issue concerns defining an objective that quantifies the connectivity of a community, and various quantitative functions have been proposed. For example, Newman et al. [9] present the well-known modularity, which compares a network with random ones under the assumption that random networks are not expected to exhibit module structure. Normalized cut counts the number of edges connecting nodes that belong to different communities, ignoring the connectivity within communities [14]. Modularity density [15] is defined as the normalized difference between the number of edges within and across communities, which overcomes the resolution limit of modularity to a large extent [16]. On the second issue, many algorithms detect communities by optimizing the corresponding quantitative functions. To exploit the latent structure of networks, matrix decomposition based algorithms first obtain a representation of vertices in a latent space, and then identify communities based on the extracted features. Typical algorithms include nonnegative matrix factorization (NMF) [19], [20] and spectral clustering [17], [21], [22]. Further algorithms are surveyed in Ref. [23].
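As an illustration of such a quantitative function, the following minimal sketch computes the standard Newman modularity, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), for a given partition of an undirected network. The function name `modularity` and the toy network (two triangles joined by a single edge) are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def modularity(adjacency, labels):
    """Newman modularity of a partition of an undirected (weighted) network."""
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)                        # k_i
    two_m = degrees.sum()                          # 2m, twice the total edge weight
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    expected = np.outer(degrees, degrees) / two_m  # k_i * k_j / 2m under the random null model
    return float(((A - expected) * same).sum() / two_m)

# Toy network (illustrative only): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_whole = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in a single community
```

On this toy network the two-triangle partition scores higher than the trivial single-community partition, which is the behavior a modularity-maximizing algorithm relies on.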
These algorithms for community detection traditionally focus on networks without attributes, which provide only a partial representation of the underlying systems. In fact, attributed networks describe and characterize the underlying complex systems more precisely. For example, in cancer networks, genes interact with each other via regulation and molecular binding [24]. Moreover, considerable evidence demonstrates that genes with the same or similar functions are more likely to cooperate in critical biological processes [25]. Likewise, social networks contain at least three different dimensions: the structural dimension describes the interactions, the compositional dimension comprises the attributes of nodes, and the affiliation dimension indicates group membership [26]. Individuals in social networks with similar behaviors or habits are more likely to form communities [27]. Thus, it is desirable to detect communities in attributed networks, that is, to find groups of nodes sharing common characteristics by considering both structure and attributes [28], [29], [30].
However, it is highly non-trivial to detect communities in attributed networks, since both connectivity and attributes must be taken into consideration simultaneously, whereas community detection in networks without attributes focuses only on the connectivity of communities. Despite this difficulty, many algorithms have been developed for community detection in attributed networks, which can be generally classified into three categories: weight modification, combination and probabilistic methods. The first class of methods transforms the attributed network into a weighted graph, where the weights on edges correspond to attribute similarity. Then, any graph clustering algorithm can be directly applied to the constructed network. These algorithms differ in how they define the attribute similarity of nodes. Neville et al. [31] use the matching coefficient similarity metric to quantify the number of attribute values that two nodes have in common, while Steinhaeuser et al. [32] extend the matching coefficient to both discrete and continuous attributes. Weight modification methods incorporate node attributes into the structure of networks, while combination methods integrate structural information and attribute similarity. The most intuitive combination strategy is the weighted linear function [33] and its variations [34].
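The weight modification and linear combination ideas can be sketched as follows. This is only an illustration under stated assumptions: the matching coefficient is taken here as the fraction of discrete attributes on which two nodes agree, the function names `matching_coefficient`, `reweight_edges` and `linear_combination` are hypothetical, and `alpha` is an assumed trade-off parameter. The exact formulations used in Refs. [31], [33] and [34] may differ.

```python
import numpy as np

def matching_coefficient(attributes):
    """Fraction of discrete attributes on which each pair of nodes agrees (n x n)."""
    X = np.asarray(attributes)
    return (X[:, None, :] == X[None, :, :]).mean(axis=2)

def reweight_edges(adjacency, similarity):
    """Weight modification: keep existing edges, weighted by endpoint similarity."""
    return np.asarray(adjacency, dtype=float) * similarity

def linear_combination(adjacency, similarity, alpha=0.5):
    """Combination: blend structure and attribute similarity (alpha is assumed)."""
    combined = alpha * np.asarray(adjacency, dtype=float) + (1.0 - alpha) * similarity
    np.fill_diagonal(combined, 0.0)  # drop self-similarity so no self-loops are created
    return combined

# Toy data (illustrative only): 3 nodes with 3 discrete attributes each, fully connected.
attributes = np.array([[1, 0, 2],
                       [1, 0, 3],
                       [0, 1, 3]])
adjacency = np.array([[0, 1, 1],
                      [1, 0, 1],
                      [1, 1, 0]])

similarity = matching_coefficient(attributes)
weighted = reweight_edges(adjacency, similarity)
combined = linear_combination(adjacency, similarity, alpha=0.5)
```

Either resulting matrix can then be passed to any clustering algorithm designed for weighted networks. Note that in the reweighted graph an edge between nodes with no shared attribute values receives weight zero, which is one reason the choice of similarity measure matters.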
Probabilistic methods use the observed attributes and structure to construct a model that fits the attributed network. Zhou et al. [35] adopt a random walk to identify communities in attributed networks, based on the assumption that the more attribute values two vertices share, the more paths exist between them via the common attribute nodes. Liu et al. [36] propose a generative model for community detection in attributed networks using both topic similarity and community closeness, while Xu et al. [37] transform community detection in attributed networks into a statistical inference problem. Li et al. [38] demonstrate that embedding attributed networks improves the accuracy of algorithms and propose an embedding-based approach (CDE) for community structures in attributed networks. Wang et al. [39] develop the semantic community identification (SCI) algorithm for attributed networks. Recently, considerable evidence has demonstrated that simultaneously learning the parameters and structures of networks can significantly improve the accuracy of algorithms [40]. There are also some interesting applications of communities in attributed networks. For example, Cao et al. [41] present a novel dynamic game model for community detection in smart grids to promote sustainable prosumer management. Furthermore, they also formulate community detection in attributed networks as a dynamic cluster formation game, where each node’s feasible action set can be constrained by every cluster in a discrete-time dynamical system [42].
However, many problems remain unsolved. For example, the attributes and the topological structure are heterogeneous, which makes them difficult to fuse. Moreover, current algorithms simply combine the attributes and structure of networks via a weighted linear function, failing to fully exploit the relation between them. To overcome these problems, we present the NMFjGO algorithm, which performs community detection in attributed networks by joint nonnegative matrix factorization and graph embedding, as shown in Fig. 1. NMFjGO consists of two major components: joint factorization of the attribute and dynamic learning matrices, and dynamic learning. To explore the implicit structure of attributed networks, we jointly factorize the dynamic matrix and the feature matrix using NMF. To further improve the performance of the algorithm, we dynamically fuse the adjacency matrix and the feature matrix. The greatest advantage of dynamic embedding is that both the topological and the implicit structure are taken into account for community detection, which helps the optimization escape local minima to a large extent. In summary, the contributions of this study are as follows:
- We propose a novel strategy to integrate node attributes and the topological structure of networks, which avoids the heterogeneity of the various features by casting community detection in attributed networks as a multiple-network clustering problem. In this way, graph clustering methods for homogeneous networks can be easily extended to community detection in attributed networks.
- We develop NMFjGO for community detection in attributed networks, where the attributes and topological structure are dynamically explored during the optimization procedure, which further improves accuracy.
- The experimental results on various attributed networks demonstrate that the proposed algorithm substantially improves accuracy without increasing the time complexity.
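To make the idea of jointly factorizing two matrices concrete, the following is a minimal sketch of a generic joint NMF with one shared factor, fitted with the standard Lee-Seung multiplicative updates. It is not the NMFjGO objective or its optimization procedure, which are defined in Section 4 and include the dynamic learning step omitted here. The function names, the toy matrices, the orientation of the feature matrix (attributes by nodes) and the rule of assigning each node to its largest shared-factor entry are all assumptions made for illustration.

```python
import numpy as np

def joint_nmf(matrices, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate each X_i (m_i x n) as W_i @ H with a single shared H (k x n)."""
    rng = np.random.default_rng(seed)
    n = matrices[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) + eps for X in matrices]
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Update every basis matrix W_i while the shared factor H is held fixed.
        Ws = [W * (X @ H.T) / (W @ (H @ H.T) + eps) for W, X in zip(Ws, matrices)]
        # Update the shared factor H using contributions from all matrices.
        numer = sum(W.T @ X for W, X in zip(Ws, matrices))
        denom = sum(W.T @ W @ H for W in Ws) + eps
        H = H * numer / denom
    return Ws, H

def joint_error(matrices, Ws, H):
    """Sum of squared Frobenius reconstruction errors over all matrices."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, W in zip(matrices, Ws))

# Toy inputs (illustrative only): 6 nodes in two groups of 3.
block = np.ones((3, 3))
zero = np.zeros((3, 3))
adjacency = np.block([[block, zero], [zero, block]])  # n x n, self-loops kept for simplicity
features = np.array([[1, 1, 1, 0, 0, 0],              # d x n, one column per node
                     [0, 0, 0, 1, 1, 1]], dtype=float)

data = [adjacency, features]
Ws0, H0 = joint_nmf(data, k=2, n_iter=0)   # random initialization only
Ws, H = joint_nmf(data, k=2, n_iter=200)   # same initialization, then 200 updates
labels = H.argmax(axis=0)                  # assumed rule: largest entry per node
```

Because the factor `H` is shared, a node's community assignment has to explain both its links and its attributes, which is the intuition behind joint factorization. The sketch uses equal weights for the two reconstruction terms, which is a simplification.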
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces notations and terminologies. The procedure and analysis of NMFjGO are described in Section 4. The experimental results on various attributed networks are presented in Section 5. Finally, we draw conclusions in Section 6.
Related work
Detecting communities in attributed networks is highly non-trivial, largely for two reasons. First of all, there is no clear definition of a community in attributed networks, because it is difficult to characterize both the edges and the attributes of a community simultaneously. The most intuitive strategy for community detection in attributed networks incorporates the node attributes into edges of networks, and then directly applies the non-attributed network community detection algorithms to the integrated
Preliminaries
Before describing the proposed algorithm in detail, we first introduce the notations and terminologies that are widely used in the forthcoming sections.
Given an undirected network $G = (V, E)$ with node set $V$ ($n = |V|$ is the number of nodes) and edge set $E$, the weighted adjacency matrix $W = (w_{ij})_{n \times n}$ is constructed, where element $w_{ij}$ denotes the weight on edge $(v_i, v_j)$. $W$ is symmetric since $G$ is undirected, i.e. $w_{ij} = w_{ji}$. The degree of the $i$th node is defined as the sum of weights
Algorithm
NMFjGO detects communities in attributed networks and consists of three parts: the objective function, the optimization, and the discovery of attributed communities, as shown in Fig. 1. The procedure of the NMFjGO algorithm, parameter selection and complexity analysis are addressed in this section.
Experiments
To fully evaluate the performance of the NMFjGO algorithm, seven state-of-the-art algorithms are selected for comparison, including CDE [38], SCI [39], NMF [19], Spectral Clustering (SP) [17], EdMot [61], Random walk (RW) [62] and Subspace Clustering [63]. CDE and SCI are selected because they are two recent algorithms for community detection in attributed networks with excellent performance. NMF is chosen largely due to the fact that the proposed algorithm is also based on
Conclusion
Networks are a powerful tool for describing and characterizing complex systems in nature and society. Community detection is a hot topic in network analysis, since community structure sheds light on the structure-function relation of the underlying complex systems. Although great efforts have been devoted to community detection in complex networks, the vast majority of them focus solely on extracting communities from the network topology and ignore node attributes. Actually, attributed networks are more precise
CRediT authorship contribution statement
Zhihao Huang: Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft. Xiaoxiong Zhong: Conceptualization, Methodology, Visualization, Writing - original draft. Qiang Wang: Conceptualization, Methodology, Visualization, Writing - original draft. Maoguo Gong: Funding acquisition, Project administration, Resources, Writing - original draft, Writing - review & editing. Xiaoke Ma: Funding acquisition, Project administration, Resources,
Acknowledgments
This work was supported by the NSFC, China (No. 61772394), Natural Science Basic Research Plan in Shaanxi Province of China (No. 2019JM-240), Scientific Research Foundation for the Returned Overseas Chinese Scholars of Shaanxi Province, China (No. 2018003) and Natural Science Basic Research Plan in Ningbo City, China (No. 2018A610048). The authors thank Dr. Chaofeng Sha, Dr. Wang Xiao, Dr. Peizhen Li and Dr. Xiao Huang for providing the datasets and source code.
References (66)
- et al. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods. Physica A (2018)
- et al. Clustering based on random graph model embedding vertex features. Pattern Recognit. Lett. (2010)
- et al. Network structure exploration in networks with node attributes. Physica A (2016)
- et al. The structure and dynamics of multilayer networks. Phys. Rep. (2014)
- et al. Nonnegative self-representation with a fixed rank constraint for subspace clustering. Inform. Sci. (2020)
- et al. Identity and search in social networks. Science (2002)
- et al. Social networks and cooperation in hunter-gatherers. Nature (2012)
- et al. Modeling disease progression using dynamics of pathway connectivity. Bioinformatics (2014)
- et al. Oncogenic signaling pathways in the cancer genome atlas. Cell (2018)
- Integrated genomic and molecular characterization of cervical cancer. Nature (2017)
- Quantifying the evolution of individual scientific impact. Science
- Addressing the minimum fleet problem in on-demand urban mobility. Nature
- Random graph models of social networks. Proc. Natl. Acad. Sci. USA
- Finding and evaluating community structure in networks. Phys. Rev. E
- Evolutionary nonnegative matrix factorization algorithms for community detection in dynamic networks. IEEE Trans. Knowl. Data Eng.
- Semi-supervised clustering algorithm for community structure detection in complex networks. Physica A
- Community detection in multi-layer networks using joint nonnegative matrix factorization. IEEE Trans. Knowl. Data Eng.
- Revealing pathway dynamics in heart diseases by analyzing multiple differential networks. PLoS Comput. Biol.
- Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.
- Quantitative function for community detection. Phys. Rev. E
- Resolution limit in community detection. Proc. Natl. Acad. Sci. USA
- An integrative framework for protein interaction and methylation data to discover epigenetic modules. IEEE/ACM Trans. Comput. Biol. Bioinform.
- Learning the parts of objects by nonnegative matrix factorization. Nature
- Algorithms for non-negative matrix factorization
- Eigenspaces of networks reveal the overlapping and hierarchical community structure more precisely. J. Stat. Mech. Theory Exp.
- On evolutionary spectral clustering. IEEE Trans. Knowl. Data Eng.
- Community detection in networks: A user guide. Phys. Rep.
- Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. New Engl. J. Med.
- Genomic characterization of metastatic breast cancers. Nature
- Social network analysis: methods and applications
- Finding community structure in mega-scale social networks
- Structure and inference in annotated networks. Nature Commun.
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105760.