Community detection in social networks using hybrid merging of sub-communities
Introduction
In recent years, community detection has been at the center of attention due to its wide use in data mining, information retrieval and social network analysis. Most complex networks have a modular or community structure and appear as a combination of groups that are fairly independent of each other. Vertices of the same community usually share some common behaviors. For instance, people in the same community usually have a set of common properties, such as having similar hobbies or working on research on the same topic. Thus, finding communities enables us not only to extract useful information from complex networks but also to understand how different groups or communities in a network evolve.
The issue of community detection closely corresponds to the idea of graph partitioning in computer science and graph theory, and hierarchical clustering in sociology. Recently, the computer revolution has provided scholars with a huge amount of data and computational resources to process and analyze these data. The size of real networks one can potentially handle has also grown considerably, reaching millions or even billions of vertices. The need to deal with such a large number of units has produced a deep change in the way that graphs are approached (Fortunato et al., 2010).
Since moderate-to-large networks are becoming ubiquitous, current methods are not satisfactory in terms of time complexity. In this paper, we present an effective algorithm for finding the communities of a graph with good time and space complexity and with an output quality comparable to that of recent community detection algorithms. We follow a bottom-up approach in which we start community detection by considering each single vertex or pair of vertices as a preliminary community. Then, based on a well-known criterion called “modularity” (Newman and Girvan, 2004), we merge these preliminary communities.
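To make the modularity criterion concrete, the following minimal sketch computes the standard Newman-Girvan modularity of a given partition, assuming an undirected, unweighted graph without self-loops. The function name `modularity` and the edge-list input format are illustrative choices, not taken from the paper's implementation.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition (illustrative sketch).

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community_of: dict mapping every vertex to its community label.
    Q is the sum over communities c of L_c / m - (d_c / (2 m))^2, where L_c is
    the number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the vertices in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, two_triangles)
```

On this toy graph, splitting the vertices into the two triangles gives a higher value than placing all vertices in a single community, which always yields a modularity of zero.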
Merging sub-communities must be repeated several times. Although merging the pairs of neighboring communities with the highest increase in modularity (i.e. pairwise merging) is a good idea, it is too slow. Merging multiple communities together is quicker but less accurate. Therefore, we use both and call the combination “Hybrid” merging. We also use a vertex similarity measure to find small communities, which we denote as preliminary communities, and then apply the modularity maximization strategy to these preliminary communities, which results in community detection with a better modularity value. Merging stops when the maximum modularity is achieved.
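The pairwise merging step can be illustrated with a simplified greedy loop. This is only a sketch of plain pairwise merging under the standard modularity definition, not the hybrid scheme of the paper: it assumes an undirected, unweighted graph, integer community labels, and recomputes all statistics in every round for clarity rather than speed. The names `merge_gain` and `greedy_pairwise_merge` are hypothetical.

```python
from collections import defaultdict


def merge_gain(e_ij, d_i, d_j, m):
    """Modularity change when communities i and j are merged.

    e_ij: number of edges between i and j; d_i, d_j: total degrees; m: edge count.
    """
    return e_ij / m - (d_i * d_j) / (2 * m * m)


def greedy_pairwise_merge(edges, community_of):
    """Repeatedly merge the neighboring pair with the largest positive gain."""
    m = len(edges)
    community_of = dict(community_of)  # work on a copy
    while True:
        degree_sum = defaultdict(int)
        between = defaultdict(int)  # edge counts between distinct communities
        for u, v in edges:
            cu, cv = community_of[u], community_of[v]
            degree_sum[cu] += 1
            degree_sum[cv] += 1
            if cu != cv:
                between[(min(cu, cv), max(cu, cv))] += 1
        best_gain, best_pair = 0.0, None
        for (ci, cj), e_ij in between.items():
            gain = merge_gain(e_ij, degree_sum[ci], degree_sum[cj], m)
            if gain > best_gain:
                best_gain, best_pair = gain, (ci, cj)
        if best_pair is None:
            # No merge increases modularity: a (local) maximum is reached.
            return community_of
        keep, drop = best_pair
        for v, c in list(community_of.items()):
            if c == drop:
                community_of[v] = keep


# Toy graph (an assumption for illustration): two triangles joined by one edge,
# starting from one community per vertex.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = greedy_pairwise_merge(edges, {v: v for v in range(6)})
```

Because only one pair is merged per round, the number of rounds grows with the number of communities, which is the slowness that motivates merging several communities at once.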
The structure of the paper is as follows. In the next section we present a review of the literature. In Section 4 we provide a detailed discussion of our work, which is followed by a complexity analysis of the algorithm. Finally, in Section 6 we present the results of our experiments.
Section snippets
Related works
The most well-known algorithm for community detection was proposed by Girvan and Newman (2002). This method is historically important because it opened a new era in the field of community detection. It uses a new similarity measure called edge betweenness. The betweenness of an edge is the number of shortest paths between all vertex pairs that run along that edge. The algorithm has a complexity of O(n^3) on a sparse graph. In the following we will refer to it as GN. In another work (
Evaluation criteria
The search for ideal community detection algorithms has two main goals: improving the accuracy in the determination of meaningful modules and reducing the computational complexity of the algorithm. Reducing the computational complexity is a well-defined objective: in many cases (e.g. this work) it is possible to compute the complexity of an algorithm analytically, while in others one can derive it from simulations of the algorithm on systems of different sizes. The main problem is then to
Our work
Our idea for community detection is based on finding small communities (i.e. sub-communities) and then merging them in order to obtain the real communities of a graph. Like communities, sub-communities are groups of densely connected vertices that have most or all of their neighbors in common.
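The notion of vertices sharing most or all of their neighbors can be quantified in several ways. As one possible illustration, the sketch below uses the Jaccard overlap of closed neighborhoods; this measure is an assumption chosen for the example and is not necessarily the similarity measure used in the paper. The function names are hypothetical.

```python
def closed_neighborhoods(edges):
    """Map each vertex to the set containing itself and its neighbors."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def neighbor_overlap(nbrs, u, v):
    """Jaccard overlap of the closed neighborhoods of u and v (in [0, 1])."""
    shared = nbrs[u] & nbrs[v]
    return len(shared) / len(nbrs[u] | nbrs[v])


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nbrs = closed_neighborhoods(edges)
inside = neighbor_overlap(nbrs, 0, 1)   # two vertices of the same triangle
bridge = neighbor_overlap(nbrs, 2, 3)   # endpoints of the connecting edge
```

In this example the two vertices inside a triangle have identical closed neighborhoods, while the endpoints of the connecting edge share only a third of theirs, so a threshold on the overlap would group the former and separate the latter.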
In this approach, for each subcommunity ci we try to find a neighbor subcommunity cj so that merging them will result in increasing the modularity value. If there exist several such neighbor
Complexity analysis
As we know , so , and . Our proposed algorithm has three parts: a weighting algorithm with time complexity and space complexity O(m), preliminary community detection with time complexity and space complexity O(m), and finally a merging stage with time complexity . The total time complexity of the proposed community detection is and the total space complexity of the proposed algorithm is
In the weighting algorithm, we
Experimental results
Our proposed algorithm is implemented in C#. Since the implementations of the other algorithms are platform dependent, we assess the performance of the algorithm by computing its complexity analytically, as shown in the previous section.
To assess the accuracy, we conducted our experiments on three different sets of real and artificial networks (http://www.cc.gatech.edu/dimacs10/archive/clustering.shtml; Yan and Gregory, 2012; Lancichinetti et al., 2008). As mentioned in Section 3,
Conclusion
We proposed a modularity maximization algorithm for community detection with time complexity . The algorithm uses a vertex similarity measure to find small preliminary communities that serve as a starting point for the merging stage. When compared with some well-known algorithms on several real benchmark graphs, our algorithm showed better performance. For some real networks, while the proposed algorithm has lower time complexity, the performance was comparable with
References (26)
- Community detection in graphs. Physics Reports (2010)
- Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of...
- et al. Finding community structure in very large networks. Physical Review E (2004)
- Danon L, Duch J, Diaz-Guilera A, Arenas A. Comparing community structure identification. Journal of Statistical...
- Donetti L, Munoz MA. Detecting network communities: a new systematic and efficient algorithm. Journal of Statistical...
- et al. Community identification using extremal optimization. Physical Review E (2005)
- et al. Resolution limit in community detection. Proceedings of the National Academy of Sciences USA (2007)
- et al. Community structure in social and biological networks. Proceedings of the National Academy of Sciences USA (2002)
- et al. The performance of modularity maximization in practical contexts. Physical Review E (2010)
- et al. Mesoscopic analysis of networks: applications to exploratory analysis and data clustering. Chaos (2011)
- Finding overlapping communities in networks by label propagation. New Journal of Physics
- Limits of modularity maximization in community detection. Physical Review E