Elsevier

Information Sciences

Volume 522, June 2020, Pages 180-192

CoVeC: Coarse-grained vertex clustering for efficient community detection in sparse complex networks

https://doi.org/10.1016/j.ins.2020.03.004

Highlights

  • The first iterations of the Louvain method (LM) for community detection are the most costly.

  • We present CoVeC: a Coarse-grained Vertex Clustering for efficient community detection in sparse complex networks.

  • CoVeC pre-processes the original graph to forward a graph of reduced size to the LM.

  • For sparse complex networks, CoVeC+LM can be much faster than the standalone LM, yet similarly effective.

  • Considering different real-world and synthetic networks, a mean processing time reduction of 47% is achieved along with a mean modularity reduction of only 0.4%.

Abstract

This paper tackles the problem of community detection in large-scale graphs. In the literature devoted to this topic, an iterative algorithm called the Louvain Method (LM) stands out as an effective and fast solution to this problem. However, the first iterations of the LM are the most costly. To overcome this issue, this paper introduces CoVeC, a Coarse-grained Vertex Clustering for efficient community detection in sparse complex networks. CoVeC pre-processes the original graph in order to forward a graph of reduced size to the LM. The subsequent group formation, including the maximization of group quality as per the modularity metric, is left to the LM. We evaluate our proposal using real-world and synthetic networks of distinct sizes and sparsity levels. Overall, our experimental results show that CoVeC can be much faster than the first iterations of the LM, while being similarly effective. In fact, for sparser graphs, the CoVeC+LM combination outperforms the standalone LM and its variations, attaining a mean processing time reduction of 47% with a mean modularity reduction of only 0.4%.

Introduction

Simply put, a graph is a structure composed of vertices that can be connected by edges, each edge indicating a relation between a pair of vertices. Conveniently, graphs can represent various real-world relationships [1]. In particular, large-scale graphs can represent complex systems or large-scale networked services, such as Facebook and Twitter, or other common services in domains such as the World Wide Web (WWW) [2], [3].

In many real complex networks, the distribution of edges among vertices is highly heterogeneous. In certain cases, this leads to a high concentration of edges within groups of vertices and a low concentration of edges between distinct groups. This characteristic of real networks is typically called community structure. Communities are groups of vertices that probably share some common properties or play similar roles in the studied graph. In this context, a common problem in the analysis of graphs representing real complex networks is community detection [4], [5], [6], [7], which is the main focus of this article.

The detection of communities plays a very important role in analyzing the structure of complex networks. For instance, it helps in identifying and visualizing the internal structure of the network, detecting potentially useful information, and mining the relationships between individuals [8]. Note that many complex networks tend to be organized into communities, i.e., tightly knit modules of nodes. Identifying these communities using only the information encoded in the network topology is an important and challenging task [9].

Examples from different disciplines show that communities often play important roles in the organization of networks. For example, in online social networks, communities may correspond to groups of friends who attended the same school, live in the same neighborhood, or share common interests. In a protein-protein interaction network, communities are groups of proteins that functionally interact with each other, possibly contributing to the same cellular function. In brain networks of interconnected neurons, communities can correspond to specialized functional components, such as the visual and auditory systems. In this sense, detecting network communities allows us to discover functionally related objects, study interactions between modules, infer missing attribute values, and predict unobserved connections [10].

The scale of today's complex networks is enormous. For example, the WWW has an estimated size of over one trillion documents (10^12 pages), making the Web the largest network humanity has ever built [11]. It exceeds in size even the human brain (10^11 neurons). Hence, fast and high-quality community detection algorithms are crucial for analyzing today's large-scale networks.

In this sense, the community detection task in graphs consists of finding groups of vertices that have one or more characteristics in common. For example, the neighborhood that vertices share may be used as such a characteristic. In this case, intuitively, a community is a set of vertices that are well connected with each other. The term clustering is often used as a synonym for community detection, although clustering is a more general concept. As expected, there are several methodologies (and variations) for detecting communities in graphs, each with its own strengths and applicability. Likewise, there are a number of metrics to evaluate how representative a community is [12] (i.e., how similar the members of a community are). In this work, we consider community detection methodologies based on modularity, an important metric commonly used to evaluate the quality of a community structure in a graph [13], [14]. Intuitively, modularity measures how densely connected the vertices within each community are. The higher (lower) the modularity, the more (fewer) edges fall within communities, compared with what would be expected under a random distribution of edges interconnecting these vertices.
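To make the metric concrete, the sketch below computes modularity in its standard form, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of the vertices of c. It is our own illustrative Python, not code from this paper; the dict-of-dicts graph representation and the helper names `from_edges` and `modularity` are assumptions.

```python
from collections import defaultdict


def from_edges(edges):
    """Illustrative helper: symmetric dict-of-dicts {u: {v: weight}} for a simple graph."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    return dict(adj)


def modularity(adj, community):
    """Q = sum over communities c of [L_c / m - (d_c / 2m)^2].

    adj: symmetric adjacency dict without self-loops; community: vertex -> label.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # 2m
    internal = defaultdict(float)    # 2 * L_c (each internal edge is seen from both ends)
    degree_sum = defaultdict(float)  # d_c: total degree of community c
    for u, nbrs in adj.items():
        degree_sum[community[u]] += sum(nbrs.values())
        for v, w in nbrs.items():
            if community[u] == community[v]:
                internal[community[u]] += w
    return sum(internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
adj = from_edges([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
print(round(modularity(adj, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}), 4))  # 0.3571
print(modularity(adj, {v: "a" for v in adj}))  # 0.0
```

Grouping each triangle together yields a clearly positive Q, whereas placing every vertex in one community yields Q = 0: all edges are then internal, but that is exactly the fraction expected at random.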

The community detection problem can then be formalized as the problem of finding the partition of a graph that maximizes modularity. Modularity optimization has been proved to be NP-complete [15]; hence, it is unlikely that an optimal solution can be found in time polynomial in the size of the graph. However, several methods currently find fairly good approximations of the modularity maximum in reasonable time [16], [17]. Among them, the Louvain Method (LM) [18] stands out for quickly and effectively identifying communities (i.e., partitions with high modularity) in large-scale graphs. The LM has been widely adopted in practice because of its speed and the high quality of its results [19]. In fact, to this day, the LM remains one of the most widely used tools for serial community detection [20].

The LM employs an iterative heuristic based on modularity maximization. Initially, the LM assigns each vertex of the graph to its own community. In its first stage, the LM greedily checks, for each vertex i, whether modularity increases when i is moved from its current community c_i to a neighboring community c_j. Vertex i is moved to the neighboring community that yields the largest positive gain, if any; otherwise, it stays in c_i. This process is applied repeatedly to all vertices until no move improves modularity.
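The first stage can be sketched as follows, assuming an undirected graph stored as a dict of neighbor dicts. The score used for vertex i and candidate community C is m times the modularity gain of inserting an isolated i into C, ΔQ = k_{i,C}/m - Σ_tot(C)·k_i/(2m^2), where k_{i,C} is the weight of the edges from i to C, Σ_tot(C) is the total degree of C, and k_i is the degree of i. The fixed sweep order, the tie-breaking rule, and the names `local_moving` and `max_sweeps` are choices of this sketch, not details taken from the paper.

```python
from collections import defaultdict


def from_edges(edges):
    """Illustrative helper: symmetric dict-of-dicts {u: {v: weight}} for a simple graph."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    return dict(adj)


def local_moving(adj, max_sweeps=100):
    """Sketch of the LM first stage: greedy single-vertex moves until none helps."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}  # k_i
    two_m = sum(degree.values())
    community = {u: u for u in adj}  # every vertex starts in its own community
    tot = dict(degree)               # Sigma_tot(C): total degree of the vertices in C
    for _ in range(max_sweeps):
        moved = False
        for i in adj:  # fixed sweep order (an assumption of this sketch)
            ci = community[i]
            links = defaultdict(float)  # k_{i,C}: edge weight from i into community C
            for j, w in adj[i].items():
                if j != i:
                    links[community[j]] += w
            tot[ci] -= degree[i]  # take i out of its community first
            # m * DeltaQ of inserting the isolated i into C: k_{i,C} - Sigma_tot(C) * k_i / 2m.
            best, best_score = ci, links.get(ci, 0.0) - tot[ci] * degree[i] / two_m
            for c, k_ic in links.items():
                score = k_ic - tot[c] * degree[i] / two_m
                if score > best_score:  # strict '>': on a tie, the earlier candidate wins
                    best, best_score = c, score
            tot[best] += degree[i]
            if best != ci:
                community[i] = best
                moved = True
        if not moved:
            break
    return community


adj = from_edges([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
print(local_moving(adj))  # {0: 1, 1: 1, 2: 1, 3: 5, 4: 5, 5: 5}
```

On the two-triangle example, the sweeps stop with each triangle in its own community; the labels are simply the ids of vertices that ended up as community representatives.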

The second stage of the LM builds a new (reduced) graph whose vertices are the communities found in the first stage; the weight of an edge between two new vertices is the sum of the weights of the edges between the corresponding communities, and edges within a community become a self-loop. Here, one run of the first and second stages is called an LM cycle. In the next LM cycle, the reduced graph is fed back to the first stage, so that new communities that further increase modularity are detected. LM cycles continue as long as modularity increases.
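A minimal sketch of the second stage, under the same dict-of-dicts assumption: each community becomes a single vertex, weights between communities are summed, and intra-community edges become a self-loop. Here the self-loop stores each internal edge from both ends (twice its weight), so every new vertex keeps the total degree of its members and 2m is unchanged; this is one convention among several, and `aggregate` and `from_edges` are hypothetical names.

```python
from collections import defaultdict


def from_edges(edges):
    """Illustrative helper: symmetric dict-of-dicts {u: {v: weight}} for a simple graph."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    return dict(adj)


def aggregate(adj, community):
    """Sketch of the LM second stage: one vertex per community.

    The weight between new vertices c and d is the sum of adj[u][v] over
    u in c and v in d, so intra-community edges become the self-loop adj[c][c]
    (counted from both ends) and every new vertex keeps its members' total degree.
    """
    reduced = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            reduced[community[u]][community[v]] += w
    return {c: dict(nbrs) for c, nbrs in reduced.items()}


adj = from_edges([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
part = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(aggregate(adj, part))  # {'a': {'a': 6.0, 'b': 1.0}, 'b': {'b': 6.0, 'a': 1.0}}
```

Each triangle collapses into one vertex with a self-loop of weight 6 (three internal edges, counted from both ends), and the bridge (2, 3) becomes the single edge of weight 1 between them, so the six-vertex graph is reduced to two vertices.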

The LM has computational complexity O(n log n) [18], where n is the total number of vertices in the graph. The LM starts with a graph of n vertices, and each LM cycle tends to produce a graph with fewer vertices (communities). Therefore, the first LM cycle accounts for most of the overall computational cost. In the literature, preprocessing the original graph with fast algorithms is a common way to speed up this method. For example, one may eliminate redundant edges to offload the first stage of the LM. However, most existing methods may not be efficient under certain conditions (e.g., in sparse networks) or may fail to reflect the real communities of the network, as we discuss in Section 2.

In this paper, we propose CoVeC, a Coarse-Grained Vertex Clustering method. CoVeC pre-processes the original graph in order to forward a graph of reduced size to the LM. CoVeC does a job similar to that of the costly first round of the LM, i.e., it generates a reduced graph whose communities remain close to the original graph structure. The subsequent maximization of quality, as per the modularity metric, is left to the LM. As a consequence, we show that the hybrid community detector CoVeC+LM tends to outperform the LM in terms of execution time, at the cost of a slight reduction in the modularity of the detected communities.
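The data flow of such a hybrid detector (pre-process, hand a smaller graph to the LM, then map the result back to the original vertices) can be sketched as below. CoVeC's actual clustering rule is described in Section 3 and is not reproduced here; `absorb_leaves`, which merges each degree-1 vertex into its only neighbor, is a hypothetical stand-in chosen only to produce a smaller graph, and `contract`, `coarse_then_refine`, `from_edges`, and the `detector` parameter are likewise our own names.

```python
from collections import defaultdict


def from_edges(edges):
    """Illustrative helper: symmetric dict-of-dicts {u: {v: weight}} for a simple graph."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    return dict(adj)


def absorb_leaves(adj):
    """HYPOTHETICAL coarse-graining rule (not CoVeC's): map each degree-1 vertex
    onto its only neighbor; every other vertex maps onto itself."""
    coarse = {}
    for u, nbrs in adj.items():
        if len(nbrs) == 1:
            (v,) = nbrs  # the single neighbor of u
            # For an isolated edge (both endpoints are leaves) keep one representative.
            coarse[u] = v if len(adj[v]) > 1 else min(u, v)
        else:
            coarse[u] = u
    return coarse


def contract(adj, coarse):
    """Reduced graph: one vertex per group, weights summed over ordered pairs."""
    reduced = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            reduced[coarse[u]][coarse[v]] += w
    return {s: dict(nbrs) for s, nbrs in reduced.items()}


def coarse_then_refine(adj, detector):
    """Pre-process, run `detector` (e.g. an LM implementation returning
    supervertex -> community) on the smaller graph, and project the result back."""
    coarse = absorb_leaves(adj)
    reduced_partition = detector(contract(adj, coarse))
    return {u: reduced_partition[coarse[u]] for u in adj}


# Triangle {0, 1, 2} with leaf 3 attached to 0 and leaf 4 attached to 1.
adj = from_edges([(0, 1), (1, 2), (0, 2), (0, 3), (1, 4)])
reduced = contract(adj, absorb_leaves(adj))
print(len(adj), "->", len(reduced))  # 5 -> 3
# Placeholder detector that keeps each supervertex as its own community.
print(coarse_then_refine(adj, lambda g: {s: s for s in g}))  # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1}
```

The placeholder detector only illustrates the interface; in a real pipeline it would be an LM implementation that returns a community label for every vertex of the reduced graph.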

We evaluate the performance of the proposed CoVeC+LM combination by comparing it with the original LM, as well as with two other enhancements of the LM process: the Fire-Forest and Local Sparsification methods (related work is discussed in Section 2). Our evaluations rely on large-scale real-world and synthetic networks of distinct sizes and sparsity levels. Overall, our experimental results show that CoVeC can be much faster than the first iterations of the LM, while being similarly effective. In fact, for sparser graphs, CoVeC outperforms the LM and its variations, attaining a mean time reduction of 47% and a mean modularity (quality) reduction of only 0.4%.

The remainder of this article is organized as follows. Section 2 presents the related work. Section 3 describes CoVeC in detail, together with the necessary modifications to the LM. Section 4 presents our evaluation methodology and the performance of CoVeC, and also evaluates the sensitivity of CoVeC to the choice of its parameters. Finally, Section 5 concludes our work.


Related work

Despite the low complexity of the LM, using it to detect communities in graphs with a very large number of vertices can be computationally costly. To alleviate this problem, the literature offers several ways to reduce the runtime of the LM. Some of these methods are quite simple. For example, Ozaki et al. [21] analyzed the LM processes that contribute most to its runtime. The authors then simply choose the community of a randomly selected neighbor node, instead of considering all

CoVeC: a coarse-grained vertex clustering

In this section, we first formalize CoVeC. We present its pseudocode and an illustrative example of how it works. Then, we present a time complexity analysis. Finally, we present a space complexity analysis of CoVeC.

Evaluation methodology and analysis

In this section, we first present our evaluation methodology (Section 4.1). We then evaluate the sensitivity of CoVeC to the choice of its parameters (Section 4.2). Next, we compare the performance of CoVeC with that of the original LM and of two other methods that simplify the LM input graph (Section 4.3). Finally, we evaluate the performance of CoVeC+LM as a function of network density (Section 4.4).

Conclusion and future work

Community detection is a key problem in graph analytics. Basically, this problem consists of finding groups of vertices that have one or more characteristics in common. The Louvain Method (LM) is a well-known method that stands out for quickly and effectively identifying communities in large-scale graphs. However, the LM has a costly first round that may account for a large share of its execution time.

To overcome the costly first round of the LM, in this paper we propose CoVeC, a

Declaration of Competing Interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in

References (37)

  • S. Fortunato, Community detection in graphs, Phys. Rep. (2010).
  • M. Fazlali et al., Adaptive parallel Louvain community detection on a multicore platform, Microprocess. Microsyst. (2017).
  • J.A. Bondy et al., Graph Theory with Applications (1976).
  • H. Kwak et al., What is twitter, a social network or a news media?, Proc. 19th Int. Conf. on World Wide Web (2010).
  • A. Mislove et al., Measurement and analysis of online social networks, Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (2007).
  • S. Harenberg et al., Community detection in large-scale networks: a survey and empirical evaluation, Wiley Interdiscip. Rev. (2014).
  • B.S. Khan, M.A. Niazi, Network community detection: a review and visual survey, 2017,...
  • Y. Zhao, A survey on theoretical advances of community detection in networks, Wiley Interdiscip. Rev. (2017).
  • L. Chaudhary et al., Community detection using an enhanced Louvain method in complex networks, International Conference on Distributed Computing and Internet Technology (2019).
  • I. Gutiérrez et al., A new community detection algorithm based on fuzzy measures, International Conference on Intelligent and Fuzzy Systems (2019).
  • J. Yang et al., Community detection in networks with node attributes, 2013 IEEE 13th International Conference on Data Mining (2013).
  • A.-L. Barabási, Network Science (2016).
  • T. Chakraborty et al., Metrics for community analysis: a survey, ACM Comput. Surv. (CSUR) (2017).
  • M. Girvan et al., Community structure in social and biological networks, Proc. Natl. Acad. Sci. (2002).
  • M.E. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. (2006).
  • U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, D. Wagner, Maximizing modularity is hard,...
  • A. Lancichinetti et al., Benchmark graphs for testing community detection algorithms, Phys. Rev. E (2008).
  • J. Leskovec et al., Empirical comparison of algorithms for network community detection, Proc. 19th Int. Conf. on World Wide Web (2010).

This work was funded in part by research project grants from CAPES, CNPq, FAPERJ, and FAPESP.