An interaction-based method for detecting overlapping community structure in real-world networks

Regular Paper

International Journal of Data Science and Analytics

Abstract

Detecting community structure is a central theme of network analysis, as it offers a coarse-grained view of the network at hand. A more interesting and challenging task is the detection of overlapping community structure, owing to its widespread applications in synthesising and interpreting data from social, biological and other diverse fields. Certain real-world networks possess a large number of nodes whose memberships are spread across multiple groups. This phenomenon, called community structure with pervasive overlaps, has been addressed only partially by a few well-known algorithms. In this paper, we present an algorithm called Interaction Coefficient-based Local Community Detection (IC-LCD) that not only uncovers community structures with pervasive overlaps but also does so efficiently. The algorithm extracts communities through a local expansion strategy built on the notion of an interaction coefficient. We evaluate the performance of IC-LCD in terms of speed, accuracy and stability on a number of synthetic and real-world networks, and compare the results with the well-known baseline algorithms DEMON, OSLOM, SLPA and COPRA. The results indicate that IC-LCD performs competitively with the chosen baselines in uncovering community structures with pervasive overlaps. The time complexity of IC-LCD is \(\mathcal {O}(nc_{\max })\), where n is the number of nodes and \(c_{\max }\) is the maximum size of a community detected in a network.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

Here we illustrate the node and edge interaction coefficients with the help of examples, and then show how seed expansion takes place using these coefficients.

1.1 Appendix A.1: Illustration of node and edge interaction coefficients

Consider the graph given in Fig. 5.

Fig. 5 Expansion of the seeds \(u_0\) and \(v_0\)

Take \(C_1 = \lbrace u_0 \rbrace \), and \(C_2 = \lbrace v_0 \rbrace \). Let us expand \(C_1\), and \(C_2\) using the node interaction coefficient \(\xi _{\text {node}}\), taking \(\xi _0 = 0.5\). Note that \(N_{C_1} = \lbrace u_1, u_2, u_3, u_4 \rbrace \) and \(N_{C_2} = \lbrace v_1, v_2, v_3, v_4 \rbrace \). For each \(1 \le i \le 4\), we have

$$\begin{aligned} \xi _\text {node}(u_i, C_1) = \frac{1}{5} < \xi _0. \end{aligned}$$

This means \(C_1\) would not expand. However, for each \(1 \le i \le 4\) we have

$$\begin{aligned} \xi _\text {node}(v_i, C_2) = \frac{1}{2} = \xi _0, \end{aligned}$$

which means \(C_2\) can expand to all its neighbours, and becomes \(C_2 = \lbrace v_0, v_1, v_2, v_3, v_4 \rbrace \). So, \(N_{C_2} = \lbrace v_5, v_6, \ldots , v_{12} \rbrace \). Now for each \(5 \le i \le 12\), we have

$$\begin{aligned} \xi _\text {node}(v_i, C_2) = \frac{1}{4} < \xi _0, \end{aligned}$$

which means \(C_2\) cannot be expanded further. The case we have considered is specific, but it captures the two important types of seeds: highly clustered and lowly clustered. A single strategy, such as the one based on the node interaction coefficient, will not work for the expansion of both kinds of seeds.
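The threshold test used above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the definition of \(\xi _\text {node}\) is not reproduced in this appendix, so the sketch assumes the form \(\left| N_u \cap V_C \right| / d_u\), and the graph is a hypothetical toy graph chosen to give the same ratios (1/5, 1/2 and 1/4), not Fig. 5 itself. The helper names `build_graph`, `xi_node` and `expand_once` are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def xi_node(graph, u, community):
    """Assumed form: share of u's neighbours that lie inside the community."""
    return Fraction(len(graph[u] & community), len(graph[u]))


def expand_once(graph, community, xi0):
    """Absorb every boundary node whose assumed xi_node reaches the threshold."""
    boundary = set().union(*(graph[c] for c in community)) - community
    return community | {w for w in boundary if xi_node(graph, w, community) >= xi0}


# Hypothetical toy graph (an assumption, not the graph of Fig. 5).
edges = []
hub_nbrs = [f"a{i}" for i in range(1, 5)]
edges += combinations(hub_nbrs, 2)  # neighbours of a0 are mutually adjacent
for i in range(1, 5):
    edges += [("a0", f"a{i}"), (f"a{i}", f"x{i}")]      # each a_i has degree 5
    edges += [("b0", f"b{i}"), (f"b{i}", f"c{i}")]      # each b_i has degree 2
    edges += [(f"c{i}", f"d{i}{j}") for j in range(3)]  # each c_i has degree 4
graph = build_graph(edges)

xi0 = Fraction(1, 2)
dense = expand_once(graph, {"a0"}, xi0)
sparse = expand_once(graph, {"b0"}, xi0)
print(sorted(dense))   # ['a0']
print(sorted(sparse))  # ['b0', 'b1', 'b2', 'b3', 'b4']
print(expand_once(graph, sparse, xi0) == sparse)  # True
```

In this toy setting the highly clustered seed `a0` does not grow at all, while the lowly clustered seed `b0` absorbs its neighbours once and then stalls, mirroring the behaviour described for \(u_0\) and \(v_0\).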

Fig. 6 A network with community structure

Therefore, we introduce the concept of the edge interaction coefficient \(\xi _{{\text {edge}}}\). An edge \(e_{uv}\) essentially interacts with a subgraph C through its endpoints u and v. To arrive at a formula for \(\xi _\text {edge}(e_{uv}, C)\), we use the following assumption: the more neighbours both u and v have in C, the more strongly \(e_{uv}\) interacts with C. So, consider the quantity

$$\begin{aligned} \min \bigl \lbrace \left| N_u \cap V_C \right| , \left| N_v \cap V_C \right| \bigr \rbrace . \end{aligned}$$

To normalise it, we can divide it by the minimum or the maximum of the degrees of u and v. Moreover, we wish \(\xi _\text {edge}(e_{uv}, C)\) to be highest when \(N_u \backslash \lbrace v \rbrace \subseteq V_C\), \(N_v \backslash \lbrace u \rbrace \subseteq V_C\), and \(d_u = d_v\). Keeping all these requirements in mind, we get Eq. (2). It is apparent that \(0 \le \xi _{\text {edge}} \le 1\). It can be seen that \(\xi _\text {edge}(e_{uv}, C) = 1\) iff \(d_u = d_v\) and \(\left| N_u \cap V_C\right| = \left| N_v \cap V_C \right| \). In the denominator of Eq. (2) we have taken \(\max \lbrace d_u, d_v \rbrace \) instead of \(\min \lbrace d_u, d_v \rbrace \). To see why, consider the following case, in which two adjacent nodes u and v both have neighbours in a subgraph C.

The node u has 3 neighbours in C, and 6 neighbours outside C. So, \(\xi _\text {node}(u, C) = 1/3\) which is much smaller than the threshold \(\xi _0\). Consequently, u must not join C in any case. However, v has 5 neighbours in C and just 2 neighbours outside C. So, v would surely join C. Now let us compute the node interaction coefficient of u with \(C \cup \lbrace v \rbrace \). We have

$$\begin{aligned} \xi _\text {node}(u, C \cup \lbrace v \rbrace ) = \frac{4}{9} < \xi _0 \end{aligned}$$

Thus u would not join \(C \cup \lbrace v \rbrace \) either. Now consider the case where \(\min \lbrace d_u, d_v \rbrace \) is the denominator in Eq. (2). Then

$$\begin{aligned} \xi _\text {edge}(e_{uv}, C) = \frac{1+3}{7} = \frac{4}{7} > \xi _0 \end{aligned}$$

In this case u joins C. Thus \(\min \lbrace d_u, d_v \rbrace \) is not an appropriate choice for the denominator in Eq. (2).
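The arithmetic of this comparison can be checked with a short Python sketch. Eq. (2) is not reproduced in this appendix, so the form used here, \((1 + \min \lbrace \left| N_u \cap V_C \right| , \left| N_v \cap V_C \right| \rbrace )\) divided by a degree normaliser, is an assumption inferred from the worked numbers above. The degrees \(d_u = 9\) and \(d_v = 7\) likewise assume that the edge \(e_{uv}\) is counted among the outside neighbours of each endpoint. The function names are illustrative.

```python
from fractions import Fraction


def xi_node(inside, degree):
    """Assumed node coefficient: in-community neighbours over degree."""
    return Fraction(inside, degree)


def xi_edge(inside_u, inside_v, d_u, d_v, normaliser=max):
    """Assumed edge coefficient: (1 + min in-community counts) / normaliser(d_u, d_v)."""
    return Fraction(1 + min(inside_u, inside_v), normaliser(d_u, d_v))


# Counts from the worked case. The degrees assume that the edge uv is one of
# the "outside" neighbours of each endpoint, so d_u = 3 + 6 and d_v = 5 + 2.
inside_u, d_u = 3, 9
inside_v, d_v = 5, 7

print(xi_node(inside_u, d_u))      # 1/3
print(xi_node(inside_u + 1, d_u))  # 4/9, once v has joined C
print(xi_edge(inside_u, inside_v, d_u, d_v, normaliser=min))  # 4/7
print(xi_edge(inside_u, inside_v, d_u, d_v, normaliser=max))  # 4/9
```

Under these assumptions, normalising by the larger degree keeps the edge coefficient at 4/9, in agreement with the node-level test, whereas normalising by the smaller degree raises it to 4/7.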

1.2 Appendix A.2: Illustration of seed expansion phase

Table 6 Before augmentation step, \(V_{\text {new}} = \varnothing \) for \(C = \lbrace 30, 31 \rbrace \) in Fig. 6
Table 7 Before augmentation step, \(V_{\text {new}} = \varnothing \) for \(C = \lbrace 1,2,3 \rbrace \) in Fig. 6

We illustrate the GET-NEW-NODES() procedure through examples. Note that we do not specify any criterion for selecting seeds, so any node may serve as a seed. It may therefore happen that certain seeds, especially low-degree nodes, stop expanding after growing to a few nodes, or do not expand at all. Let us consider a few examples, assuming that \(\xi _0 = 0.5\) and \(n_{\min } = 4\).

Example 1

Consider the graph given in Fig. 6.

Let \(C = \lbrace 30, 31 \rbrace \). Then \(N_C = \lbrace 25, 26, 29 \rbrace \). In order to compute \(V_{\text {new}}\), the steps followed before the augmentation step are listed in Table 6. No node of \(N_C\) is added to \(V_{\text {new}}\), leaving \(V_{\text {new}}\) empty. On the other hand, during the augmentation step, we find that \(\xi _\text {node}(25, C \cup N_C) = 1/2, \xi _\text {node}(26, C \cup N_C) = 3/4\) and \(\xi _\text {node}(29, C \cup N_C) = 3/5\), which makes \(V_{\text {new}} = \lbrace 25, 26, 29 \rbrace \).

Example 2

This time consider the subgraph \(C = \lbrace 1,2,3 \rbrace \) in the graph given in Fig. 6. Here \(N_C = \lbrace 4,5,14,22,23 \rbrace \). Then before the augmentation step \(V_{\text {new}}\) remains empty as shown in Table 7. However, in this case even the augmentation step does not help, as \(\xi _\text {node}(u, C \cup N_C) < \xi _0\) for all \(u \in N_C\), and so, \(V_{\text {new}} = \varnothing \). Thus the subgraph C is not expandable to a full community. Such groups of nodes are likely to join multiple communities and form the basis for pervasive overlaps.
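The augmentation test used in both examples can be sketched as follows. This is a rough sketch, not the GET-NEW-NODES() procedure: the steps before augmentation (Tables 6 and 7) are not modelled, the node coefficient is again assumed to be \(\left| N_u \cap V_C \right| / d_u\), and the two graphs are hypothetical stand-ins rather than Fig. 6. The names `neighbourhood` and `augment` are illustrative.

```python
from fractions import Fraction


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbourhood(graph, community):
    """N_C: nodes outside the community that are adjacent to it."""
    return set().union(*(graph[c] for c in community)) - community


def augment(graph, community, xi0):
    """Re-test each boundary node against the community plus its whole boundary."""
    boundary = neighbourhood(graph, community)
    enlarged = community | boundary
    return {w for w in boundary
            if Fraction(len(graph[w] & enlarged), len(graph[w])) >= xi0}


# Hypothetical graphs (assumptions, not Fig. 6).
# A 5-clique: every boundary node is fully connected to C and N_C.
clique = build_graph([(a, b) for a in "pqrst" for b in "pqrst" if a < b])
# A hub whose three neighbours each have three further outside neighbours.
star = build_graph([("h", f"l{i}") for i in range(3)]
                   + [(f"l{i}", f"o{i}{j}") for i in range(3) for j in range(3)])

xi0 = Fraction(1, 2)
print(sorted(augment(clique, {"p"}, xi0)))  # ['q', 'r', 's', 't']
print(augment(star, {"h"}, xi0))            # set()
```

The first call mirrors Example 1, where the augmentation test admits the whole boundary, and the second mirrors Example 2, where no boundary node passes even against \(C \cup N_C\).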

Cite this article

Kumar, P., Dohare, R. An interaction-based method for detecting overlapping community structure in real-world networks. Int J Data Sci Anal 14, 27–44 (2022). https://doi.org/10.1007/s41060-022-00314-3
