CoVeC: Coarse-grained vertex clustering for efficient community detection in sparse complex networks

doi:10.1016/j.ins.2020.03.004

Information Sciences

Volume 522, June 2020, Pages 180-192

https://doi.org/10.1016/j.ins.2020.03.004

Highlights

• The first iterations are the most costly in the Louvain method (LM) for community detection.
• We present CoVeC: a Coarse-grained Vertex Clustering for efficient community detection in sparse complex networks.
• CoVeC pre-processes the original graph to forward a graph of reduced size to the LM.
• For sparse complex networks, CoVeC+LM can be a much faster option than the standalone LM, yet similarly effective.
• Considering different real-world and synthetic networks, a mean processing time reduction of 47% is achieved along with a mean modularity reduction of only 0.4%.

Abstract

This paper tackles the problem of community detection in large-scale graphs. In the literature devoted to this topic, an iterative algorithm called the Louvain Method (LM) stands out as an effective and fast solution to this problem. However, the first iterations of the LM are the most costly. To overcome this issue, this paper introduces CoVeC, a Coarse-grained Vertex Clustering for efficient community detection in sparse complex networks. CoVeC pre-processes the original graph in order to forward a graph of reduced size to the LM. The subsequent group formation, including the maximization of group quality as per the modularity metric, is left to the LM. We evaluate our proposal using real-world and synthetic networks of distinct sizes and sparsity levels. Overall, our experimental results show that CoVeC can be a much faster option than the first iterations of the LM, yet similarly effective. In fact, for sparser graphs, the combination CoVeC+LM outperforms the standalone LM and its variations, attaining a mean processing time reduction of 47% and a mean modularity reduction of only 0.4%.

Introduction

Simply put, a graph is a structure composed of vertices that can be connected by edges, each edge indicating a relation between a pair of vertices. Conveniently, graphs can represent various real-world relationships [1]. In particular, large-scale graphs can represent complex systems or large-scale networked services, such as Facebook and Twitter, as well as other common services on the World Wide Web (WWW) [2], [3].

In several real complex networks, the distribution of edges among vertices is highly heterogeneous. In certain cases, this leads to a high concentration of edges within groups of vertices and a low concentration of edges between distinct groups. This characteristic of real networks is typically called community structure. Communities are groups of vertices that probably share some common properties or play similar roles in the studied graph. In this context, a common problem in the analysis of graphs representing real complex networks is community detection [4], [5], [6], [7], which is the main focus of this article.

Community detection plays a very important role in analyzing the structure of complex networks. For instance, it helps in identifying and visualizing the internal structure of the network, detecting potentially useful information, and mining the relationships between individuals [8]. Note that many complex networks tend to be organized into communities, i.e., tightly knit modules of nodes. Identifying these communities using only the information encoded in the network topology is both a challenging and an important task [9].

Examples from different disciplines show that communities often play important roles in the organization of networks. For example, in online social networks, communities may correspond to groups of friends who attended the same school, live in the same neighborhood, or share common interests. In a protein-protein interaction network, communities are groups of proteins that functionally interact with each other, possibly contributing to the same cellular function. In brain networks of interconnected neurons, communities can correspond to specialized functional components, such as the visual and auditory systems. In this sense, detecting network communities allows us to discover functionally related objects, study interactions between modules, infer missing attribute values, and predict unobserved connections [10].

Today, complex networks are extremely large. For example, the WWW has an estimated size of over one trillion documents (10¹² pages), making the Web the largest network humanity has ever built [11]; it exceeds in size even the human brain (10¹¹ neurons). Consequently, a fast and high-quality community detection algorithm is crucial for analyzing today's large-scale networks.

In this sense, the community detection task consists of finding groups of vertices in a graph that have one or more characteristics in common. For example, the neighborhood that vertices share may be used as a characteristic to define a community. In this case, intuitively, a community will contain vertices that are well connected with each other. Clustering is often used as a synonym for community detection, although clustering is a more general concept. As expected, there are several methodologies, and variations thereof, to detect communities in graphs, each with its own strengths and applicability. Likewise, there are a number of metrics to evaluate how representative a community is [12] (i.e., how similar the members of a community are). In this work, we consider community detection methodologies based on modularity, an important metric commonly used to evaluate the quality of a community structure in a graph [13], [14]. Intuitively, modularity measures how densely connected the vertices within each community are. The higher (lower) the modularity, the more (less) densely connected the vertices within communities are, compared with what would be expected if the same edges were distributed at random.
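As a concrete reference point, the standard modularity of a partition of an undirected graph with m edges is Q = (1/2m) Σ_ij [A_ij − k_i k_j/(2m)] δ(c_i, c_j), where A is the adjacency matrix, k_i is the degree of vertex i, and δ(c_i, c_j) = 1 when i and j belong to the same community (0 otherwise). Summing per community gives the equivalent form Q = Σ_c [l_c/m − (d_c/2m)²], where l_c is the number of edges inside community c and d_c is the total degree of its vertices. The minimal Python sketch below evaluates this second form for an unweighted edge list; the function name and the input format are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community label.
    Uses Q = sum_c [ l_c / m - (d_c / (2m))^2 ], where l_c is the number of
    edges inside community c and d_c is the total degree of c's vertices.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of c's vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles (0-1-2 and 3-4-5) joined by the single edge (2, 3).
bridge = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
parts = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(bridge, parts), 4))  # 0.3571
```

Splitting this graph into its two triangles gives Q = 5/14 ≈ 0.357, whereas placing every vertex in a single community gives Q = 0, which matches the intuition that modularity rewards groups that are denser than random.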

The community detection problem can then be formalized as the problem of finding the graph partition that maximizes modularity. Modularity optimization has been proved to be NP-complete [15]. In other words, unless P = NP, no algorithm can find an optimal solution in time polynomial in the size of the graph. However, several methods currently find good approximations of the modularity maximum in reasonable time [16], [17]. Among these methods, the Louvain Method (LM) [18] stands out for quickly and efficiently identifying communities (i.e., with high modularity) in large-scale graphs. The LM has been widely adopted in practice because of its speed and the high quality of its results [19]. In fact, to this day, the LM remains one of the most widely used tools for serial community detection [20].

The LM employs an iterative heuristic based on modularity maximization. Initially, the LM assigns each vertex of the graph to its own community. In its first stage, for each vertex i, the LM greedily evaluates the modularity gain of moving i from its current community (c_i) to the community of one of its neighbors (c_j). Vertex i is moved to the neighboring community that yields the largest positive gain; if no move yields a gain, i stays in c_i. This process is repeated over all vertices until no move increases modularity.
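The first stage can be sketched in Python as follows. When vertex i is first taken out of its community, the modularity gain of inserting it into a community C reduces to ΔQ = k_i,in/m − Σ_tot·k_i/(2m²), where k_i is the weighted degree of i, k_i,in is the total weight of edges from i to C, Σ_tot is the total weighted degree of C, and m is the total edge weight. This is a minimal sketch under stated assumptions, not the reference LM implementation: the graph is a hypothetical symmetric dict of dicts (a self-loop is stored once and counted twice in the degree, so the same code can run on reduced graphs), vertices are visited in dictionary order, and ties keep the current community.

```python
from collections import defaultdict


def local_moving(adj):
    """Sketch of one LM first stage (greedy local moving).

    adj: dict vertex -> dict neighbor -> weight for an undirected graph; it must
    be symmetric, and a self-loop of weight w is stored once as adj[v][v] = w.
    Returns a dict vertex -> community label (labels are vertex ids).
    """
    # Weighted degree k_i; a self-loop counts twice, as in modularity.
    k = {v: sum(w * (2 if u == v else 1) for u, w in nbrs.items())
         for v, nbrs in adj.items()}
    two_m = sum(k.values())
    comm = {v: v for v in adj}      # every vertex starts in its own community
    if two_m == 0:
        return comm
    m = two_m / 2
    tot = dict(k)                   # Sigma_tot: total degree of each community
    moved = True
    while moved:                    # sweep until a full pass moves no vertex
        moved = False
        for i in adj:
            old, ki = comm[i], k[i]
            # Edge weight from i to each neighboring community (self-loop excluded).
            links = defaultdict(float)
            for j, w in adj[i].items():
                if j != i:
                    links[comm[j]] += w
            tot[old] -= ki          # temporarily isolate i
            # Gain of inserting the isolated i into community c:
            # dQ = k_i,in / m - Sigma_tot * k_i / (2 m^2).
            best = old
            best_gain = links.get(old, 0.0) / m - tot[old] * ki / (2 * m * m)
            for c, w_ic in links.items():
                gain = w_ic / m - tot[c] * ki / (2 * m * m)
                if gain > best_gain:  # strict: ties keep the current community
                    best, best_gain = c, gain
            tot[best] += ki
            comm[i] = best
            moved = moved or best != old
    return comm


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
print(local_moving(adj))  # vertices 0-2 share one label, vertices 3-5 another
```

Because every accepted move strictly increases modularity and there are finitely many partitions, the sweep loop terminates; the result is a local maximum whose exact labels can depend on the visiting order.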

The second stage of the LM generates a new (reduced) graph in which each community discovered in the first stage becomes a single vertex. The weight of an edge between two new vertices is the sum of the weights of the edges between the corresponding communities, and edges inside a community become a self-loop. Here, a run of the first and second stages is called an LM cycle. At the start of each new LM cycle, the reduced graph is fed to the first stage, so that new communities that further increase modularity are detected. LM cycles continue as long as modularity increases.
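The aggregation in the second stage can be sketched as below. The sketch assumes a hypothetical weighted adjacency representation (a symmetric dict of dicts in which a self-loop of weight w is stored once and counts twice in the degree); under that convention, every reduced vertex keeps the total degree of the community it replaces, so modularity values carry over from one LM cycle to the next. The function name is illustrative only.

```python
from collections import defaultdict


def aggregate(adj, comm):
    """Sketch of the LM second stage: collapse each community into one vertex.

    adj: dict vertex -> dict neighbor -> weight (symmetric; a self-loop of
    weight w is stored once as adj[v][v] = w). comm: dict vertex -> label.
    Returns the reduced graph in the same format: weights of edges between two
    communities are summed, and edges inside a community become a self-loop.
    """
    # dict.fromkeys keeps labels in first-seen order (deterministic output).
    reduced = {c: defaultdict(float) for c in dict.fromkeys(comm.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = comm[u], comm[v]
            if u == v:
                reduced[cu][cu] += w        # existing self-loop, visited once
            elif cu == cv:
                reduced[cu][cu] += w / 2    # internal edge, visited from both ends
            else:
                reduced[cu][cv] += w        # (v, u) later fills reduced[cv][cu]
    return {c: dict(nbrs) for c, nbrs in reduced.items()}


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
parts = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# Each triangle becomes a vertex with a self-loop of weight 3.0,
# and the two new vertices are joined by an edge of weight 1.0.
print(aggregate(adj, parts))
```

An LM cycle then amounts to running the first stage on the current graph and feeding the partition it returns to this function; the loop stops once a cycle no longer increases modularity.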

The LM has computational complexity O(n log n) [18], where n is the total number of vertices in the graph. The LM starts with a graph of n vertices and, after each LM cycle, the graph tends to shrink to fewer vertices (communities). Therefore, the first LM cycle accounts for most of the overall computational cost. In the literature, preprocessing the original graph with fast algorithms is a common way to speed up the method. For example, one may eliminate redundant edges to offload the first stage of the LM. However, most existing methods may be inefficient under certain conditions (e.g., in sparse networks) or may fail to reflect the real communities of the network, as we discuss in Section 2.

In this paper, we propose CoVeC, a Coarse-Grained Vertex Clustering method. CoVeC pre-processes the original graph in order to forward a graph of reduced size to the LM. CoVeC performs a job similar to that of the costly first round of the LM, i.e., it generates a reduced graph whose communities remain close to the original graph structure. The subsequent maximization of quality, as per the modularity metric, is left to the LM. As a consequence, we show that the hybrid community detector CoVeC+LM tends to outperform the LM in terms of execution time, at the cost of a slight reduction in the modularity of the detected communities.
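CoVeC's own coarsening rules are defined in Section 3, which is not part of this excerpt, so the sketch below does not implement CoVeC. It only illustrates the general pattern described above (shrink the graph first, let a community detector finish on the reduced graph, then map the result back) using a deliberately simple, hypothetical stand-in rule: fold every degree-1 vertex into its only neighbor. The names merge_leaves and expand, and the dict-of-dicts graph format, are illustrative assumptions.

```python
from collections import defaultdict


def merge_leaves(adj):
    """Hypothetical coarsening stand-in (NOT CoVeC): fold degree-1 vertices.

    adj: dict int vertex -> dict neighbor -> weight, symmetric, no self-loops.
    Each degree-1 vertex is mapped onto its only neighbor; for an isolated
    edge, the larger id is mapped onto the smaller. Returns (reduced, rep),
    where reduced is the coarse graph (internal weight kept as a self-loop)
    and rep maps every original vertex to its coarse vertex.
    """
    rep = {}
    for v, nbrs in adj.items():
        rep[v] = v
        if len(nbrs) == 1:
            (u,) = nbrs                      # the single neighbor
            if len(adj[u]) > 1 or u < v:
                rep[v] = u
    reduced = {c: defaultdict(float) for c in dict.fromkeys(rep.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = rep[u], rep[v]
            # Internal edges are visited from both ends, so add half each time.
            reduced[cu][cv] += (w / 2) if cu == cv else w
    return {c: dict(n) for c, n in reduced.items()}, rep


def expand(coarse_comm, rep):
    """Map communities found on the coarse graph back to original vertices."""
    return {v: coarse_comm[c] for v, c in rep.items()}


# Two triangles joined by (2, 3), plus a pendant vertex 6 attached to 0.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (0, 6)]
adj = {v: {} for v in range(7)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
reduced, rep = merge_leaves(adj)
print(len(adj), "->", len(reduced))  # 7 -> 6: vertex 6 is folded into 0
```

A community detector such as the LM would then run on reduced, and expand maps its output back to the original vertices. Folding leaves takes linear time, but on its own it only shrinks graphs with many pendant vertices, so it should be read as a placeholder for a real coarsening step rather than as a substitute for one.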

We evaluate the performance of the proposed combination CoVeC+LM by comparing it with the original LM, as well as with two other enhancements of the LM process: the Fire-Forest and Local Sparsification methods (related work is discussed in Section 2). Our evaluations rely on large-scale real-world, as well as synthetic, networks of distinct sizes and sparsity levels. Overall, our experimental results show that CoVeC can be a much faster option than the first iterations of the LM, yet similarly effective. In fact, for sparser graphs, CoVeC+LM outperforms the LM and its variations, attaining a mean time reduction of 47% and a mean modularity (quality) reduction of only 0.4%.

The remainder of this article is organized as follows. Section 2 presents the related work. Section 3 presents a detailed description of CoVeC and the necessary modifications to the LM. Section 4 presents our evaluation methodology and the performance of CoVeC, and evaluates the sensitivity of CoVeC to the choice of its parameters. Finally, Section 5 concludes our work.

Section snippets

Related work

Despite the low complexity of the LM, using it to detect communities in graphs with a very large number of vertices can be computationally costly. To alleviate this problem, several ways to reduce the runtime of the LM have been proposed in the literature. Some of these methods are quite simple. For example, Ozaki et al. [21] analyzed which LM processes contribute most to its runtime. The authors then simply choose the community of a random neighbor node, instead of considering all

CoVeC: a coarse-grained vertex clustering

In this section, we first formalize CoVeC. We present pseudocode and an illustrative example of its operation. Then, we present a time complexity analysis. Finally, we present a space complexity analysis of CoVeC.

Evaluation methodology and analysis

In this section, we first present our evaluation methodology (Section 4.1). We then evaluate the sensitivity of CoVeC to the choice of its parameters (Section 4.2). Next, we compare the performance of CoVeC with the original LM and two other methods that simplify the LM input graph (Section 4.3). Finally, we evaluate the performance of CoVeC+LM as a function of network density (Section 4.4).

Conclusion and future work

Community detection is a key problem in graph analytics. Basically, this problem consists of finding groups of vertices that have one or more characteristics in common. The Louvain Method (LM) is a well-known method that stands out for quickly and efficiently identifying communities in large-scale graphs. However, the LM has a costly first round that can account for most of its execution time.

To overcome the costly first round of the LM, in this paper we propose CoVeC, a

Declaration of Competing Interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in

References (37)

S. Fortunato, Community detection in graphs, Phys. Rep. (2010)
M. Fazlali et al., Adaptive parallel Louvain community detection on a multicore platform, Microprocess. Microsyst. (2017)
J.A. Bondy et al., Graph Theory with Applications (1976)
H. Kwak et al., What is twitter, a social network or a news media?, Proc. 19th Int. Conf. on World Wide Web (2010)
A. Mislove et al., Measurement and analysis of online social networks, Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (2007)
S. Harenberg et al., Community detection in large-scale networks: a survey and empirical evaluation, Wiley Interdiscip. Rev. (2014)
B.S. Khan, M.A. Niazi, Network community detection: a review and visual survey, 2017,...
Y. Zhao, A survey on theoretical advances of community detection in networks, Wiley Interdiscip. Rev. (2017)
L. Chaudhary et al., Community detection using an enhanced Louvain method in complex networks, International Conference on Distributed Computing and Internet Technology (2019)
I. Gutiérrez et al., A new community detection algorithm based on fuzzy measures, International Conference on Intelligent and Fuzzy Systems (2019)
J. Yang et al., Community detection in networks with node attributes, 2013 IEEE 13th International Conference on Data Mining (2013)
A.-L. Barabási, Network Science (2016)
T. Chakraborty et al., Metrics for community analysis: a survey, ACM Comput. Surv. (CSUR) (2017)
M. Girvan et al., Community structure in social and biological networks, Proc. Natl. Acad. Sci. (2002)
M.E. Newman, Modularity and community structure in networks, Proc. Natl. Acad. Sci. (2006)
U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, D. Wagner, Maximizing modularity is hard,...
A. Lancichinetti et al., Benchmark graphs for testing community detection algorithms, Phys. Rev. E (2008)
J. Leskovec et al., Empirical comparison of algorithms for network community detection, Proc. 19th Int. Conf. on World Wide Web (2010)


☆ This work was funded in part by research project grants from CAPES, CNPq, FAPERJ, and FAPESP.
