Community detection in social networks using hybrid merging of sub-communities

https://doi.org/10.1016/j.jnca.2013.08.008

Abstract

Network vertices are often divided into groups, or communities, with dense connections within communities and sparse connections between them. Community detection has recently attracted considerable attention in data mining and social network analysis. Existing community detection methods require too much space and are very time-consuming for moderate-to-large networks. We propose a bottom-up community detection method that starts from fine-grained communities and merges them to find the real communities of a network. The preliminary small communities are merged in a hybrid way to maximize two quality functions: modularity and NMI. We show that our method is as effective as, or better than, other community detection algorithms while having better time and space complexity.

Introduction

In recent years, community detection has been at the center of attention due to its wide use in data mining, information retrieval and social network analysis. Most complex networks have a modular or community structure and appear as a combination of groups that are fairly independent of each other. Vertices of the same community usually share some common behaviors. For instance, people in the same community usually have a set of common properties, such as having similar hobbies or working on research on the same topic. Thus, finding communities enables us not only to extract useful information from complex networks but also to understand how different groups or communities in a network evolve.

The problem of community detection closely corresponds to graph partitioning in computer science and graph theory, and to hierarchical clustering in sociology. Recently, the computer revolution has provided scholars with a huge amount of data and the computational resources to process and analyze it. The size of the real networks one can potentially handle has also grown considerably, reaching millions or even billions of vertices. The need to deal with such a large number of units has produced a deep change in the way that graphs are approached (Fortunato, 2010).

Since moderate-to-large networks are becoming ubiquitous, current methods are not satisfactory in terms of time complexity. In this paper, we present an effective algorithm for finding the communities of a graph with good time and space complexity and with output quality comparable to that of recent community detection algorithms. We follow a bottom-up approach in which we start by treating every vertex, or every pair of vertices, as a preliminary community. Then, based on a well-known criterion called “modularity” (Newman and Girvan, 2004), we merge these preliminary communities.
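As an illustration of the criterion, the following is a minimal Python sketch of the standard Newman-Girvan modularity for an undirected, unweighted graph. The function name `modularity` and the input layout (an edge list plus a vertex-to-community dictionary) are assumptions made for this example and are not taken from the paper, whose implementation is in C#.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every vertex to a community label.
    (Both input conventions are assumptions of this sketch.)
    """
    m = 0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of vertex degrees in the community
    for u, v in edges:
        m += 1
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        return 0.0
    # Q = sum over communities of (fraction of internal edges)
    #     minus (fraction of edge endpoints in the community) squared.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives a positive value, while putting every vertex in one community gives zero.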

Merging subcommunities must be repeated several times. Although merging the pairs of neighboring communities with the highest increase in modularity (i.e. pairwise merging) is a good idea, it is too slow. Merging multiple communities together is faster but less accurate. Therefore, we use both and call the combination “Hybrid” merging. We also use a vertex similarity measure to find small communities, which we call preliminary communities, and then apply the modularity maximization strategy to them, which results in community detection with a better modularity value. Merging stops when the maximum modularity is achieved.
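The pairwise step can be sketched as follows. This is only an illustrative Python sketch of generic greedy pairwise merging driven by the standard modularity gain of a merge; it does not reproduce the paper's hybrid scheme, whose details are not given in this text. The names `merge_gain` and `greedy_pairwise_merge`, the edge-list input, and the use of comparable (e.g. integer) community labels are all assumptions of the example.

```python
from collections import defaultdict


def merge_gain(e_ij, d_i, d_j, m):
    """Modularity change caused by merging two communities.

    e_ij: number of edges between the two communities
    d_i, d_j: sums of vertex degrees in each community
    m: total number of edges in the graph
    """
    return e_ij / m - (d_i * d_j) / (2 * m * m)


def greedy_pairwise_merge(edges, community_of):
    """Merge the best neighbouring pair until no merge raises modularity.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict vertex -> comparable community label (assumed).
    Returns a new dict; the input is left unchanged.
    """
    community_of = dict(community_of)
    m = len(edges)
    while True:
        degree_sum = defaultdict(int)
        between = defaultdict(int)  # edge counts between community pairs
        for u, v in edges:
            cu, cv = community_of[u], community_of[v]
            degree_sum[cu] += 1
            degree_sum[cv] += 1
            if cu != cv:
                between[(min(cu, cv), max(cu, cv))] += 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in between.items():
            gain = merge_gain(e_ab, degree_sum[a], degree_sum[b], m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge increases modularity: stop
            return community_of
        a, b = best_pair
        for v, c in list(community_of.items()):
            if c == b:
                community_of[v] = a
```

Each pass recomputes the pair statistics from scratch, which keeps the sketch short but is exactly the kind of cost that makes purely pairwise merging slow on large graphs.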

The structure of the paper is as follows. In the next section we review the literature. In Section 4 we provide a detailed discussion of our work, followed by a complexity analysis of the algorithm. Finally, in Section 6 we present the results of our experiments.

Section snippets

Related works

The most well-known algorithm for community detection was proposed by Girvan and Newman (2002). This method is historically important because it opened a new era in the field of community detection. It uses a new similarity measure called edge betweenness: the edge betweenness of an edge is the number of shortest paths between all vertex pairs that run along that edge. The algorithm has complexity $O(n^3)$ on a sparse graph. In the following we will refer to it as GN. In another work (

Evaluation criteria

The search for ideal community detection algorithms aims at two main goals: improving the accuracy in the determination of meaningful modules and reducing the computational complexity of the algorithm. Reducing the computational complexity is a well-defined objective: in many cases (e.g. this work) it is possible to compute the complexity of an algorithm analytically; in others one can derive it from simulations of the algorithm on systems of different sizes. The main problem is then to

Our work

Our idea for community detection is based on finding small communities (i.e. sub-communities) and then merging them to obtain the real communities of a graph. Like communities, subcommunities are groups of densely connected vertices that have most or all of their neighbors in common.
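One way to quantify "most or all of their neighbors in common" is a neighborhood overlap score. The Python sketch below uses the Jaccard overlap of closed neighborhoods; this particular measure is an assumption chosen for illustration and is not necessarily the vertex similarity measure used in the paper. The helper names `build_adjacency` and `neighbor_similarity` are likewise hypothetical.

```python
def build_adjacency(edges):
    """Build a vertex -> set-of-neighbours map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighbor_similarity(adj, u, v):
    """Jaccard overlap of the closed neighbourhoods of u and v.

    The closed neighbourhood of a vertex is its neighbours plus itself,
    so two adjacent vertices with identical surroundings score 1.0.
    (Assumed measure; the paper's own measure may differ.)
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)
```

In a graph made of two triangles joined by one edge, two vertices of the same triangle that do not touch the bridge score 1.0, while the two bridge endpoints score only 1/3, so a threshold on this score would group the former and not the latter.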

In this approach, for each subcommunity $c_i$ we try to find a neighbor subcommunity $c_j$ such that merging them increases the modularity value. If there exist several such neighbor

Complexity analysis

Since $D = \sum_{i=1}^{n} d_i / n$, we have $D = 2m/n$ and $m = n \times D/2$. Our proposed algorithm has three parts: the weighting algorithm, with time complexity $O(Rm)$ and space complexity $O(m)$; preliminary community detection, with time complexity $O(m \log m)$ and space complexity $O(m)$; and finally the merging stage, with time complexity $O(m \log n)$. The total time complexity of the proposed community detection is $R \times D \times n + D \times n \times \log(n) + \log(n) \times D \times n$, and the total space complexity of the proposed algorithm is $O(2m)$.
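The three time bounds can be collected into a single expression. The derivation below is a sketch that assumes a simple graph, so that $m < n^2$ and therefore $\log m < 2 \log n$; this assumption is not stated in the text.

```latex
\begin{aligned}
O(Rm)        &= O(R\,D\,n), \\
O(m \log m)  &= O(D\,n \log n) \quad \text{since } \log m < 2\log n, \\
O(m \log n)  &= O(D\,n \log n), \\
T(n)         &= R\,D\,n + D\,n\log n + D\,n\log n
              = D\,n\,(R + 2\log n)
              = O\bigl(D\,n\,(R + \log n)\bigr).
\end{aligned}
```

If $R$ and $D$ are additionally treated as constants, which is also an assumption here, the bound reduces to $O(n \log n)$.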

In the weighting algorithm, we

Experimental results

Our proposed algorithm is implemented in C#. Since the implementations of the other algorithms are platform dependent, we assess the performance of the algorithm by analytically computing its complexity, as shown in the previous section.

To assess the accuracy, we conducted our experiments on three different sets of real and artificial networks (http://www.cc.gatech.edu/dimacs10/archive/clustering.shtml; Yan and Gregory, 2012; Lancichinetti et al., 2008). As mentioned in Section 3,

Conclusion

We proposed a modularity maximization algorithm for community detection with time complexity $O(n \log n)$. The algorithm uses a vertex similarity measure to find small preliminary communities that serve as the starting point of the merging stage. When we compared our algorithm with some well-known algorithms on several real benchmark graphs, it showed better performance. For some real networks, while the proposed algorithm has lower time complexity, the performance was comparable with

References (26)

  • S. Fortunato. Community detection in graphs. Physics Reports (2010)
  • V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre. Fast unfolding of communities in large networks. Journal of...
  • A. Clauset et al. Finding community structure in very large networks. Physical Review E (2004)
  • L. Danon, J. Duch, A. Diaz-Guilera, A. Arenas. Comparing community structure identification. Journal of Statistical...
  • L. Donetti, M.A. Munoz. Detecting network communities: a new systematic and efficient algorithm. Journal of Statistical...
  • J. Duch et al. Community identification using extremal optimization. Physical Review E (2005)
  • S. Fortunato et al. Resolution limit in community detection. Proceedings of the National Academy of Sciences USA (2007)
  • M. Girvan et al. Community structure in social and biological networks. Proceedings of the National Academy of Sciences USA (2002)
  • B.H. Good et al. The performance of modularity maximization in practical contexts. Physical Review E (2010)
  • C. Granell et al. Mesoscopic analysis of networks: applications to exploratory analysis and data clustering. Chaos (2011)
  • S. Gregory. Finding overlapping communities in networks by label propagation. New Journal of Physics (2010)
  • ...
  • A. Lancichinetti et al. Limits of modularity maximization in community detection. Physical Review E (2011)