Abstract
Overlapping community detection has become an important challenge in network analysis, motivating researchers to propose methods that best fit the complex, non-disjoint structures found in real-world networks such as social, scientific and collaborative networks. Existing overlapping community detection methods usually build overlaps between communities that are larger than expected and, apart from those that support hard constraints, they do not let users interact with the system to regulate the size of these overlaps. To solve these issues, we propose a novel non-disjoint community detection method, referred to as CDCO, which lets users interact with the system and regulate overlaps between communities based on existing relationships between nodes in the network. Just as analysts can control the number of communities or the minimal number of actors in a community, CDCO lets them regulate overlaps through a parameter \(\alpha\) that can favor or penalize overlaps. The regulation of overlaps is built into the objective criterion and optimized iteratively during the community detection process. Extensive experiments on both simulated and real-world networks with different overlap sizes show the importance of regulating overlaps when a non-disjoint partitioning of the network is needed, and show that CDCO outperforms existing conventional methods in terms of both F-measure and NMI.





Notes
Available at: http://snap.stanford.edu/data/com-Amazon.html.
Available at: http://snap.stanford.edu/data/com-DBLP.html.
Available at: http://snap.stanford.edu/data/com-Youtube.html.
Available at: http://snap.stanford.edu/data/com-LiveJournal.html.
Ethics declarations
Conflict of interest
This article does not contain any studies with human participants or animals performed by any of the authors. On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendix A
Consider a network of 8 connected nodes, as shown in Fig. 6. Suppose we want to organize the network into 2 communities \(C_1\) and \(C_2\) using the proposed CDCO method. In this illustrative example, nodes \(v_2\) and \(v_4\) are initialized as the community attractors of \(C_1\) and \(C_2\), respectively. The following steps show how the proposed method decides whether to assign node \(v_8\) to \(C_1\), to \(C_2\), or to both \(C_1\) and \(C_2\). We give a step-by-step execution for two cases: in the first case, \(\alpha =0\), which leads to a non-disjoint assignment of node \(v_8\); in the second case, we increase \(\alpha\) to 2 to reduce the overlap between the two communities. The example shows how easily the size of overlaps can be adjusted.
Fig. 6 Results of the proposed method on an illustrative example. a A network containing eight nodes, partitioned into two communities \(\{C_1, C_2\}\) where \(\breve{a}_{1}=v_2\) and \(\breve{a}_{2}=v_4\); b results of CDCO with \(\alpha =0\); c results of CDCO with \(\alpha =2\)
A.1 First Case with \(\alpha =0\)
1. Evaluate the degree of connectivity between \(v_8\) and each community using Eq. (3):
$$\begin{aligned}&{\text {Conn}}(v_8, C_1)= {\text {Conn}}(v_8,v_2)= \frac{1}{3+2-1}= \frac{1}{4} = 0.25 \\&{\text {Conn}}(v_8, C_2)= {\text {Conn}}(v_8,v_4)= \frac{2}{3+2-2}=\frac{2}{3}\approx 0.67 \end{aligned}$$
2. Evaluate the degree of connectivity between \(v_8\) and the combination of the two communities using Eq. (4):
$$\begin{aligned} {\text {Conn}}(v_8, (C_1C_2) )= {\text {Conn}}(v_8,(v_2\cup v_4) )= \frac{3}{3+4-3}=\frac{3}{4} =0.75 \end{aligned}$$
These results show that the degree of connectivity of node \(v_8\) is maximal when it is assigned to both \(C_1\) and \(C_2\). However, to decide its final assignment, we must evaluate the local error of node \(v_8\) using Eq. (7) for each of these alternatives and keep the alternative with the minimal error.
3. Evaluate the local error of \(v_8\) when assigned to the nearest community (\(C_2\)) using Eq. (7):
$$\begin{aligned} \left( 1+\frac{2}{3}\right) ^0 \left( 1-\frac{2}{3}\right) = \frac{1}{3} \approx 0.33 \end{aligned}$$
4. Evaluate the local error of \(v_8\) when assigned to both the first and the second community (\(C_2C_1\)) using Eq. (7):
$$\begin{aligned} \left( 2+\frac{3}{4}\right) ^0 \left( 1-\frac{3}{4}\right) = 0.25 \end{aligned}$$
The minimal error is therefore obtained when \(v_8\) is assigned to both communities (\(C_2C_1\)). Hence, to minimize the objective criterion (Eq. 5), \(v_8\) must be assigned to both the first and the second community. These steps are repeated for each node \(v_i\) in the network to build the partitioning matrix C. Fig. 6b reports the partitioning obtained on this illustrative example by the proposed method with \(\alpha =0\).
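The arithmetic of this first case can be reproduced with a short sketch. It is not the authors' implementation: since Eqs. (3), (4) and (7) are not restated here, the code assumes the forms that the numbers above suggest, namely a Jaccard-style ratio `shared / (a + b - shared)` for the connectivity and `(k + Conn)^alpha * (1 - Conn)` for the local error, where `k` is the number of communities the node would join. The function names `conn`, `local_error` and `best_assignment` are hypothetical.

```python
def conn(shared, size_a, size_b):
    """Assumed reading of Eqs. (3)/(4): shared / (a + b - shared)."""
    return shared / (size_a + size_b - shared)


def local_error(conn_value, n_communities, alpha):
    """Assumed reading of Eq. (7): (k + Conn)^alpha * (1 - Conn)."""
    return (n_communities + conn_value) ** alpha * (1.0 - conn_value)


def best_assignment(candidates, alpha):
    """Return the candidate with the smallest local error, plus all errors.

    candidates maps a label to (connectivity, number of communities joined).
    """
    errors = {
        label: local_error(c, k, alpha) for label, (c, k) in candidates.items()
    }
    return min(errors, key=errors.get), errors


# Counts taken from the worked example for node v8.
conn_c1 = conn(shared=1, size_a=3, size_b=2)    # 1 / 4
conn_c2 = conn(shared=2, size_a=3, size_b=2)    # 2 / 3
conn_both = conn(shared=3, size_a=3, size_b=4)  # 3 / 4

# As in the example, compare the nearest community with the combination.
candidates = {"C2": (conn_c2, 1), "C1+C2": (conn_both, 2)}
choice, errors = best_assignment(candidates, alpha=0)
print(choice, errors)
```

With \(\alpha =0\) the power term equals 1 for every alternative, so the decision depends only on \(1-{\text {Conn}}\) and the combination with the highest connectivity wins.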
A.2 Second Case: Reduce Overlaps by Using \(\alpha =2\)
1. The first and second steps of the first case remain valid for this case. We now compute the local error with \(\alpha =2\).
2. Evaluate the local error of \(v_8\) when assigned to the nearest community (\(C_2\)) using Eq. (7):
$$\begin{aligned} \left( 1+\frac{2}{3}\right) ^2 \left( 1-\frac{2}{3}\right) = \frac{25}{27} \approx 0.93 \end{aligned}$$
3. Evaluate the local error of \(v_8\) when assigned to both the first and the second community (\(C_2C_1\)) using Eq. (7):
$$\begin{aligned} \left( 2+\frac{3}{4}\right) ^2 \left( 1-\frac{3}{4}\right) = \frac{121}{64} \approx 1.89 \end{aligned}$$
The minimal error is now obtained when \(v_8\) is assigned only to the second community (\(C_2\)). Hence, to minimize the objective criterion (Eq. 5), \(v_8\) must be assigned to the second community (\(C_2\)) alone. These steps are repeated for each node \(v_i\) in the network to build the partitioning matrix C. Fig. 6c reports the partitioning obtained on this illustrative example by the proposed method with \(\alpha =2\).
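To see how \(\alpha\) moves \(v_8\) from an overlapping to a disjoint assignment, the two local errors can be compared over several values of \(\alpha\). The sketch below again assumes, from the numbers in this appendix rather than from a restated Eq. (7), that the local error has the form `(k + Conn)^alpha * (1 - Conn)`. The name `alpha_star` is hypothetical and denotes the value at which the two errors are equal under that assumption.

```python
import math


def local_error(conn_value, n_communities, alpha):
    """Assumed reading of Eq. (7): (k + Conn)^alpha * (1 - Conn)."""
    return (n_communities + conn_value) ** alpha * (1.0 - conn_value)


# Connectivity values of v8 taken from the worked example.
conn_single = 2 / 3  # v8 assigned to C2 only (k = 1)
conn_both = 3 / 4    # v8 assigned to C1 and C2 (k = 2)

results = []
for alpha in (0, 0.5, 1, 2):
    e_single = local_error(conn_single, 1, alpha)
    e_both = local_error(conn_both, 2, alpha)
    winner = "C1+C2" if e_both < e_single else "C2"
    results.append((alpha, e_single, e_both, winner))
    print(f"alpha={alpha}: C2={e_single:.3f}  C1+C2={e_both:.3f}  -> {winner}")

# Setting the two errors equal and solving for alpha gives the switch point:
# ((2 + c_both) / (1 + c_single))^alpha = (1 - c_single) / (1 - c_both)
alpha_star = math.log((1 - conn_single) / (1 - conn_both)) / math.log(
    (2 + conn_both) / (1 + conn_single)
)
print(f"overlap is kept for alpha below about {alpha_star:.3f}")
```

Under this assumed form, the overlapping assignment is kept only for small values of \(\alpha\), and any larger value, including \(\alpha =2\), assigns \(v_8\) to \(C_2\) alone, in line with the two cases above.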
Cite this article
Ben NCir, CE., Maiza, I., Bouaguel, W. et al. Disjoint and Non-Disjoint Community Detection with Control of Overlaps Between Communities. SN COMPUT. SCI. 2, 15 (2021). https://doi.org/10.1007/s42979-020-00391-w