Does ‘Community Detection’ Find Real Emerging Meso-structures? A Statistical Test Based on Complex Networks Methods

  • Conference paper
Artificial Life and Evolutionary Computation (WIVACE 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1977)

Abstract

Many systems that can be represented as networks present areas where the nodes are densely connected among themselves. Community detection analysis aims precisely at revealing these areas, thus providing a partition of the network under investigation. Usually, the results of such analysis are discussed in descriptive terms, either with the aid of some statistics or by listing and discussing the nodes that belong to the different communities. As a result, a statistical evaluation of the detected community structure is still missing from the literature. In this work, we design a series of tests to assess whether community detection results are compatible with random processes of tie formation, or whether what emerges from such analysis cannot be ascribed to randomness. As community detection naturally aims at uncovering meso-structures within a system, what is needed is a statistical tool to test whether these are just areas that have formed randomly or whether a real emerging phenomenon lies behind what is detected. As an example, we run the tests on the UK Faculty network and discuss the results.
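
The test outlined in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes that the statistic \(\eta^W(G)\) used in the notes is the share of total connection weight lying on within-community ties \(E^W\), which is consistent with its bounds \(0 \le \eta^W(G) \le 1\). It builds each null network by expanding integer weights into unit-weight ties and shuffling their endpoints (configuration-model stub matching), so that every node keeps its in- and out-strength. It then reports the percentile of the observed value within the empirical distribution of the null values. The edge-list format and all function names are hypothetical.

```python
import random
from bisect import bisect_right

# Hypothetical input format: a directed network as (source, target, weight)
# triples with positive integer weights, plus a node -> community mapping.


def eta_within(edges, community):
    """Assumed definition: weight on within-community ties / total weight."""
    total = sum(w for _, _, w in edges)
    within = sum(w for u, v, w in edges if community[u] == community[v])
    return within / total if total else 0.0


def strength_preserving_null(edges, rng):
    """One random network in which every node keeps its out- and in-strength.

    Each tie of weight w is repeated w times with weight 1, then the target
    endpoints are shuffled (configuration-model stub matching). Multi-edges
    and self-loops may appear, and degrees are generally not preserved.
    """
    sources = [u for u, _, w in edges for _ in range(w)]
    targets = [v for _, v, w in edges for _ in range(w)]
    rng.shuffle(targets)
    return [(u, v, 1) for u, v in zip(sources, targets)]


def empirical_percentile(observed, null_values):
    """eCDF of the null values evaluated at the observed statistic."""
    ordered = sorted(null_values)
    return bisect_right(ordered, observed) / len(ordered)


def community_test(edges, community, n_null=1000, seed=0):
    """Return eta^W(G) and its percentile among n_null null networks."""
    rng = random.Random(seed)
    observed = eta_within(edges, community)
    null_values = [
        eta_within(strength_preserving_null(edges, rng), community)
        for _ in range(n_null)
    ]
    return observed, empirical_percentile(observed, null_values)


if __name__ == "__main__":
    # Toy network: two groups of three nodes joined by a single tie c -> d.
    edges = [("a", "b", 2), ("b", "a", 1), ("b", "c", 1), ("c", "a", 1),
             ("d", "e", 2), ("e", "f", 1), ("f", "d", 1), ("c", "d", 1)]
    community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    observed, percentile = community_test(edges, community)
    # Here eta^W(G) = 9 / 10 = 0.90 under the assumed definition.
    print(f"eta^W(G) = {observed:.2f}, percentile = {percentile:.3f}")
```

In this sketch, a percentile close to 1 would indicate more within-community weight than the null model produces, and one close to 0 less; the significance threshold is left to the analyst.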

Notes

  1.

    Number of ties detected among the nodes belonging to the community, divided by the number of all possible ties that can exist among those nodes.

  2.

    If the network is unweighted, this simply means that \(\omega_{i,j} = 1\), i.e., in the absence of weights every existing connection is considered to have value 1.

  3.

    Several contributions explain, discuss and compare the performance of community detection algorithms, e.g., [3], but our work does not aim to propose a new method for evaluating them.

  4.

    \(E^W \cup E^B = E\) and \(E^W \cap E^B = \emptyset\).

  5.

    By definition, \(0 \le \eta ^{W}(G) \le 1\).

  6.

    Community detection algorithms identify areas of the network, i.e., groups of nodes within which connections are numerous, making that part of the network dense. Conversely, connections between nodes of different communities are scarcer.

  7.

    If the network is weighted with connection weights \(\in \mathbb{N}^+\), it is only necessary to repeat each connection a number of times equal to its weight and to assign to every resulting connection a weight equal to 1. The network will no longer be a simple network (i.e., one with at most one connection between any pair of nodes), but this is irrelevant. We can randomize the peers involved in the connections (including those repeated because of a weight larger than 1), and the network will still have the same strength sequence, i.e., every node will have the same strength as in the original network. Clearly, since connections are repeated based on their weights and then randomized, this procedure does not, in general, preserve the degree sequence of the original network.

    If the original network's weights \(\notin \mathbb{N}^+\), then connections can be randomized directly (i.e., without repeating them), and their weights can simply be sampled (or probabilistically imputed) from the weights of the original network. The resulting random network will preserve the degree sequence and the sum of connection weights, but not the strength sequence.

    Finally, if connection weights \(\notin \mathbb{N}^+\) and the network is bipartite, the strength sequence can be preserved for at least one type of node, but not for the other: connection weights can be randomly sampled from those displayed by the node in the original network. However, when this is done for one type of node, it cannot also be done for the other type. Nonetheless, in this case the degree sequence is preserved for all nodes.

  8.

    A node can be part of a network even if not displaying any connections in it.

  9.

    The test is based on the computation of the empirical cumulative distribution function (eCDF) of the values \(\eta^{W}({\boldsymbol{G}}^*)\), and it calculates the percentile in which \(\eta^{W}(G)\) falls with respect to that eCDF.

  10.

    The described test also allows us to assess whether \(\eta^{W}(G)\), if statistically significant, is larger (or smaller) than expected. This would reveal the existence of a non-random community structure characterized by within-community connections that are enhanced (or inhibited), respectively.

  11.

    It is beyond the scope of this work to discuss how to implement a cluster analysis and which variables to consider.

  12.

    \(K \in \mathbb{N}^+ \wedge K < N\).

  13.

    For instance, if \(K=3\), then G is split into all possible sub-graphs based on the nodes' categorization, i.e., \(G_{k_1, k_1}\), \(G_{k_2, k_2}\), \(G_{k_3, k_3}\), \(G_{k_1, k_2}\), \(G_{k_1, k_3}\), and \(G_{k_2, k_3}\).

  14.

    Note that with the first proposed test, i.e., \(\eta^W(G)\) vs. the distribution of \(\eta^W({\boldsymbol{G}}^*)\), this element could not be excluded as a constitutive element of the communities.

  15.

    What is needed is to repeat each connection a number of times equal to its weight, assign all resulting connections a weight equal to 1, and then randomize the connections according to the 'configuration model'. By doing so, each node maintains its strength, even if its degree may vary.

  16.

    In other terms, G is split into the 16 subnetworks determined by the ordered pairs (with repetition) of the four Schools. These are \(G_{1,1}\), \(G_{1,2}\), \(G_{1,3}\), \(G_{1,4}\), \(G_{2,1}\), \(G_{2,2}\), \(G_{2,3}\), \(G_{2,4}\), \(G_{3,1}\), \(G_{3,2}\), \(G_{3,3}\), \(G_{3,4}\), \(G_{4,1}\), \(G_{4,2}\), \(G_{4,3}\), and \(G_{4,4}\), where the two subscripts indicate the School of the member from whom the friendship starts and the School of the member who is considered a friend, respectively. To create a single \(G^\dag\), each of these subnetworks is randomized separately and then all of them are merged back together, so as to obtain a null system in which every node preserves the same strength as in G, as well as the same strength by School as in G.

References

  1. Bedi, P., Sharma, C.: Community detection in social networks. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 6(3), 115–135 (2016)

  2. Erdős, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5(1), 17–60 (1960)

  3. Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016)

  4. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)

  5. Karson, M.: Handbook of Methods of Applied Statistics (1968)

  6. Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione. Giorn. Ist. Ital. Attuari 4, 89–91 (1933)

  7. Molloy, M., Reed, B.: A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6(2–3), 161–180 (1995)

  8. Newman, M.: The configuration model. In: Networks. Oxford University Press (2018). https://doi.org/10.1093/oso/9780198805090.003.0012

  9. Newman, M.: Random graphs. In: Networks. Oxford University Press (2018). https://doi.org/10.1093/oso/9780198805090.003.0011

  10. Newman, M.E.: Mixing patterns in networks. Phys. Rev. E 67(2), 026126 (2003)

  11. Porter, M.A., Onnela, J.P., Mucha, P.J., et al.: Communities in networks. Not. AMS 56(9), 1082–1097 (2009)

Author information

Correspondence to Riccardo Righi.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Righi, R. (2024). Does ‘Community Detection’ Find Real Emerging Meso-structures? A Statistical Test Based on Complex Networks Methods. In: Villani, M., Cagnoni, S., Serra, R. (eds) Artificial Life and Evolutionary Computation. WIVACE 2023. Communications in Computer and Information Science, vol 1977. Springer, Cham. https://doi.org/10.1007/978-3-031-57430-6_27

  • DOI: https://doi.org/10.1007/978-3-031-57430-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57429-0

  • Online ISBN: 978-3-031-57430-6

  • eBook Packages: Computer Science, Computer Science (R0)
