Disjoint and Non-Disjoint Community Detection with Control of Overlaps Between Communities

  • Original Research
SN Computer Science

Abstract

Overlapping community detection has become an important challenge in network analysis, motivating researchers to propose community detection methods that best fit the complex, non-disjoint structures found in real-world networks such as social, scientific and collaborative networks. Existing overlapping community detection methods usually build overlaps between communities that are larger than expected, and they do not allow users to interact with the system to regulate this size, except for methods that allow hard constraints to be included. To solve these issues, we propose a novel non-disjoint community detection method, referred to as CDCO, which allows users to interact with the system and regulate overlaps between communities based on existing relationships between nodes in the network. Just as analysts can control the number of communities or the minimal number of actors in a community, CDCO lets them regulate overlaps through an \(\alpha\) parameter that can favor or penalize overlaps. The regulation of overlaps is introduced in the objective criterion and optimized iteratively during the community detection process. Extensive experiments, conducted on both simulated and real-world networks with different overlap sizes, show the importance of regulating overlaps when a non-disjoint partitioning of the network is needed, and show that CDCO outperforms existing conventional methods in terms of both F-measure and NMI.

Notes

  1. Available at: http://snap.stanford.edu/data/com-Amazon.html.

  2. Available at: http://snap.stanford.edu/data/com-DBLP.html.

  3. Available at: http://snap.stanford.edu/data/com-Youtube.html.

  4. Available at: http://snap.stanford.edu/data/com-LiveJournal.html.

Author information

Corresponding author

Correspondence to Chiheb-Eddine Ben NCir.

Ethics declarations

Conflict of interest

This article does not contain any studies with human participants or animals performed by any of the authors. On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Consider a network of 8 connected nodes, as described in Fig. 6. Suppose we want to organize the network into 2 communities \(C_1\) and \(C_2\) using the proposed CDCO method. In this illustrative example, nodes \(v_2\) and \(v_4\) are initialized as the community attractors of \(C_1\) and \(C_2\), respectively. The following steps show how the proposed method decides whether to assign node \(v_8\) to \(C_1\), to \(C_2\), or to both \(C_1\) and \(C_2\). We give a step-by-step execution for two cases: in the first, \(\alpha =0\), which leads to a non-disjoint assignment of node \(v_8\); in the second, we increase \(\alpha\) to 2 to reduce the overlap between the two communities. Together, the two cases show how the size of overlaps can be adjusted.

Fig. 6 Results of the proposed method on an illustrative example. (a) A network containing eight nodes, considered with a partitioning into two communities \(\{C_1, C_2\}\) where \(\breve{a}_{1}=v_2\) and \(\breve{a}_{2}=v_4\); (b) results of CDCO with \(\alpha =0\); (c) results of CDCO with \(\alpha =2\)

A.1 First Case with \(\alpha =0\)

  1. Evaluate the degree of connectivity between \(v_8\) and each community using Eq. (3):

    $$\begin{aligned}&{\text {Conn}}(v_8, C_1)= {\text {Conn}}(v_8,v_2)= \frac{1}{3+2-1}= \frac{1}{4} = 0.25 \\&{\text {Conn}}(v_8, C_2)= {\text {Conn}}(v_8,v_4)= \frac{2}{3+2-2}=\frac{2}{3}\approx 0.67 \end{aligned}$$

  2. Evaluate the degree of connectivity between \(v_8\) and the combination of the two communities using Eq. (4):

    $$\begin{aligned} {\text {Conn}}(v_8, (C_1C_2) )= {\text {Conn}}(v_8,(v_2\cup v_4) )= \frac{3}{3+4-3}=\frac{3}{4} =0.75 \end{aligned}$$

    These results show that the degree of connectivity of node \(v_8\) is maximal when it is assigned to both \(C_1\) and \(C_2\). However, to decide its final assignment we must evaluate the local error of node \(v_8\) using Eq. (7) for each of these alternatives and take the alternative with the minimal error.

  3. Evaluate the local error of \(v_8\) when assigned to the nearest community (\(C_2\)) using Eq. (7):

    $$\begin{aligned} \left( 1+\frac{2}{3}\right) ^0 \left( 1-\frac{2}{3}\right) \approx 0.33 \end{aligned}$$

  4. Evaluate the local error of \(v_8\) when assigned to both the first and the second community (\(C_1C_2\)) using Eq. (7):

    $$\begin{aligned} \left( 2+\frac{3}{4}\right) ^0 \left( 1-\frac{3}{4}\right) = 0.25 \end{aligned}$$

The minimal error is obtained when \(v_8\) is assigned to both communities (\(C_1C_2\)). Therefore, to minimize the objective criterion (Eq. 5), \(v_8\) must be assigned to both the first and the second community. These steps are repeated for each node \(v_i\) in the network to build the partitioning matrix C. Fig. 6b reports the partitioning obtained on this illustrative example by the proposed method with \(\alpha =0\).
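The arithmetic of this first case can be reproduced with a short Python sketch. Equations (3), (4) and (7) are not reproduced in this appendix, so the helper names `conn` and `local_error` are hypothetical, and their forms are inferred only from the numbers in the worked example: a ratio shared / (size_a + size_b - shared) for connectivity, and (number of assigned communities + connectivity) raised to \(\alpha\), times (1 - connectivity), for the local error. It illustrates the example's numbers and should not be read as the authors' implementation.

```python
from fractions import Fraction


def conn(shared: int, size_a: int, size_b: int) -> Fraction:
    """Ratio shared / (size_a + size_b - shared).

    Assumption: this is the pattern visible in the worked example,
    not a transcription of Eq. (3) or Eq. (4).
    """
    return Fraction(shared, size_a + size_b - shared)


def local_error(n_communities: int, connectivity: Fraction, alpha: float) -> float:
    """(n_communities + connectivity)^alpha * (1 - connectivity).

    Assumption: form inferred from the two evaluations of Eq. (7) above.
    """
    return float(n_communities + connectivity) ** alpha * float(1 - connectivity)


# Counts taken from the fractions in steps 1 and 2.
conn_c1 = conn(1, 3, 2)    # v8 versus attractor v2
conn_c2 = conn(2, 3, 2)    # v8 versus attractor v4
conn_both = conn(3, 3, 4)  # v8 versus the combination of v2 and v4

alpha = 0
err_single = local_error(1, max(conn_c1, conn_c2), alpha)  # nearest community only
err_both = local_error(2, conn_both, alpha)                # both communities

# Keep the alternative with the smaller local error.
assignment = "C1 and C2" if err_both < err_single else "C2"
print(conn_c1, conn_c2, conn_both)
print(err_single, err_both, assignment)
```

With \(\alpha =0\) the first factor is always 1, so the comparison reduces to comparing 1 - connectivity, which is why the alternative with the highest connectivity wins in this case.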

A.2 Second Case: Reduce Overlaps by Using \(\alpha =2\)

  1. The first and second steps described in the first case remain valid for this case. We now compute the local error with \(\alpha =2\).

  2. Evaluate the local error of \(v_8\) when assigned to the nearest community (\(C_2\)) using Eq. (7):

    $$\begin{aligned} \left( 1+\frac{2}{3}\right) ^2 \left( 1-\frac{2}{3}\right) = \frac{25}{27} \approx 0.93 \end{aligned}$$

  3. Evaluate the local error of \(v_8\) when assigned to both the first and the second community (\(C_1C_2\)) using Eq. (7):

    $$\begin{aligned} \left( 2+\frac{3}{4}\right) ^2 \left( 1-\frac{3}{4}\right) = \frac{121}{64} \approx 1.89 \end{aligned}$$

The minimal error is now obtained when \(v_8\) is assigned only to the second community (\(C_2\)). Therefore, to minimize the objective criterion (Eq. 5), \(v_8\) must be assigned to the second community (\(C_2\)). These steps are repeated for each node \(v_i\) in the network to build the partitioning matrix C. Fig. 6c reports the partitioning obtained on this illustrative example by the proposed method with \(\alpha =2\).
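To see how the choice for \(v_8\) depends on \(\alpha\), the two alternatives can be compared over several values. The sketch below is hedged in the same way as the example it follows: `local_error` and `best_assignment` are hypothetical names, the error form is inferred from the two evaluations of Eq. (7) in this appendix, and the connectivity values 2/3 and 3/4 are copied from the first case. The crossover value `alpha_star` is derived here by equating the two errors and is not stated in the source.

```python
import math

# Connectivity values taken from steps 1 and 2 of the first case.
CONN_C2 = 2 / 3    # v8 versus its nearest community C2
CONN_BOTH = 3 / 4  # v8 versus the combination of C1 and C2


def local_error(n_communities: int, connectivity: float, alpha: float) -> float:
    """(n_communities + connectivity)^alpha * (1 - connectivity).

    Assumption: form inferred from the worked example, not quoted from Eq. (7).
    """
    return (n_communities + connectivity) ** alpha * (1.0 - connectivity)


def best_assignment(alpha: float) -> str:
    """Return the alternative with the smaller local error for node v8."""
    errors = {
        "C2": local_error(1, CONN_C2, alpha),
        "C1 and C2": local_error(2, CONN_BOTH, alpha),
    }
    return min(errors, key=errors.get)


# Equating the two errors gives
#   ((2 + 3/4) / (1 + 2/3))^alpha = (1 - 2/3) / (1 - 3/4),
# so the alpha at which the decision flips is a ratio of logarithms.
alpha_star = math.log((1 - CONN_C2) / (1 - CONN_BOTH)) / math.log(
    (2 + CONN_BOTH) / (1 + CONN_C2)
)

for alpha in (0, 0.5, 1, 2):
    print(alpha, best_assignment(alpha))
print("crossover alpha:", alpha_star)
```

Under these assumptions the decision flips at an \(\alpha\) of roughly 0.57: below it \(v_8\) stays in the overlap, and above it \(v_8\) is assigned to \(C_2\) only, which is consistent with the two cases worked out above.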

About this article

Cite this article

Ben NCir, CE., Maiza, I., Bouaguel, W. et al. Disjoint and Non-Disjoint Community Detection with Control of Overlaps Between Communities. SN COMPUT. SCI. 2, 15 (2021). https://doi.org/10.1007/s42979-020-00391-w

  • DOI: https://doi.org/10.1007/s42979-020-00391-w
