Does ‘Community Detection’ Find Real Emerging Meso-structures? A Statistical Test Based on Complex Networks Methods

Righi, Riccardo

doi:10.1007/978-3-031-57430-6_27

Riccardo Righi ORCID: orcid.org/0000-0002-7472-4293

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1977)

Included in the following conference series:

Italian Workshop on Artificial Life and Evolutionary Computation

Abstract

Many systems that can be represented as networks present areas where the nodes are densely connected among themselves. Community detection analysis aims precisely at revealing these areas, thus providing a partition of the network under investigation. Usually, the results of such analysis are discussed in descriptive terms, either with the aid of some statistics, or by listing and discussing the nodes that belong to the different communities. Thus, a statistical evaluation of the detected community structure is still missing in the literature. In this work, we design a series of tests to assess whether community detection results are compatible with random processes of tie formation, or whether what emerges from such analysis cannot be ascribed to randomness. As community detection naturally aims at uncovering the presence of meso-structures within a system, what is needed is a statistical tool to test whether these are just areas that have formed randomly or whether, behind what is detected, there is a real emerging phenomenon. To provide an example, we run the tests on the UK Faculty network and discuss the results.
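
The kind of test outlined in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by the text available here: the statistic is taken to be the share of ties falling within communities, the null networks are obtained by a configuration-model style reshuffling of tie endpoints, and the partition is held fixed across null networks (the chapter may instead re-run community detection on each of them). The function names and the toy network are hypothetical.

```python
import random
from bisect import bisect_right


def within_share(edges, community):
    """Share of ties whose two endpoints carry the same community label."""
    within = sum(1 for u, v in edges if community[u] == community[v])
    return within / len(edges)


def rewire(edges, rng):
    """Null network: shuffle all tie endpoints (stubs) and pair them again.

    Every node keeps its degree. Self-loops and repeated ties may appear
    and are kept; a self-loop counts as a within-community tie.
    """
    stubs = [node for edge in edges for node in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))


def percentile_in_null(edges, community, n_null=1000, seed=0):
    """Observed share and its position in the eCDF of the null shares."""
    rng = random.Random(seed)
    observed = within_share(edges, community)
    null_values = sorted(
        within_share(rewire(edges, rng), community) for _ in range(n_null)
    )
    # Fraction of null networks whose share is <= the observed one.
    return observed, bisect_right(null_values, observed) / n_null


# Toy network (assumption): two triangles joined by a single bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
observed, ecdf_position = percentile_in_null(edges, community)
print(observed, ecdf_position)
```

A position close to 1 would suggest more within-community ties than random tie formation tends to produce, and a position close to 0 would suggest fewer.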

Notes

1.
Number of ties detected among the nodes belonging to the community, divided by the number of all possible ties that can exist among those nodes.
2.
If the network is unweighted, this simply means that $\omega _{i,j} = 1$ for every existing connection, i.e., in the absence of weights any existing connection is considered to have value 1.
3.
There are several contributions explaining, discussing and comparing the performance of community detection algorithms, e.g., [3], but our work does not aim to propose a new method to evaluate them.
4.
$E^W \cup E^B = E$ and $E^W \cap E^B = \emptyset$.
5.
By definition, $0 \le \eta ^{W}(G) \le 1$.
6.
Community detection algorithms identify areas of the network, i.e., groups of nodes, within each of which connections are numerous, making that part of the network dense. Conversely, connections between nodes of different communities are scarcer.
7.
If the network is weighted with connection weights $\in \mathbb {N}^+$, it is only necessary to repeat each connection a number of times equal to its weight, and to assign to all resulting connections a weight equal to 1. The network will no longer be a simple network, i.e., one with at most one connection between any pair of nodes, but this is irrelevant. We can randomize the peers involved in the connections (including those repeated because of a weight larger than 1), and the network will still have the same strength sequence, i.e., every node will have the same strength as in the original network. Clearly, since connections are repeated based on their weights and then randomized, this does not make it possible to preserve the same degree sequence as in the original network.
If the original network’s weights $\notin \mathbb {N}^+$, then connections can be randomized directly (i.e., without repeating them) and their weights can simply be sampled (or probabilistically imputed) based on the weights of the original network. The resulting random network will preserve the degree sequence and the sum of the connections’ weights, but not the strength sequence.
Finally, if the connections’ weights $\notin \mathbb {N}^+$ and the network is bipartite, the strength sequence can at least be preserved for one type of node, but not for the other: connections’ weights can be randomly sampled from those displayed by the node in the original network. However, when this is done for one type of node, it cannot also be done for the other type. Nonetheless, in this case the degree sequence is preserved for all nodes.
8.
A node can be part of a network even if it has no connections in it.
9.
The test is based on the computation of the empirical cumulative distribution function (eCDF) of the values $\eta ^{W}({\boldsymbol{G}}^*)$, and it calculates the percentile in which $\eta ^{W}(G)$ falls with respect to this eCDF.
10.
It is important to highlight that the described test also allows us to assess whether $\eta ^{W}(G)$, when it is statistically significant, has a value that is larger (or smaller) than expected. This would reveal the existence of a non-random community structure characterized by within-community connections that are enhanced (or inhibited), respectively.
11.
It is beyond the scope of this work to discuss how to implement a cluster analysis and which kind of variables to consider.
12.
$K \in \mathbb {N}^+ \wedge K < N$.
13.
For instance, if $K=3$, then $G$ is split into all the possible sub-graphs based on the nodes’ categorization, i.e., $G_{k_1, k_1}$, $G_{k_2, k_2}$, $G_{k_3, k_3}$, $G_{k_1, k_2}$, $G_{k_1, k_3}$, and $G_{k_2, k_3}$.
14.
It is important to observe that with the first test proposed, i.e., $\eta ^W(G)$ vs. distribution of $\eta ^W({\boldsymbol{G}}^*)$, this element could not be excluded as a constitutive element for the communities.
15.
What is needed is to repeat each connection a number of times equal to its weight, to assign to all resulting connections a weight equal to 1, and then to randomize the connections according to the ‘configuration model’. By doing so, each node maintains its strength, even if its degree may vary.
16.
In other terms, $G$ is split into the 16 subnetworks determined by the permutations (with repetitions) of the four Schools. These are $G_{1,1}$, $G_{1,2}$, $G_{1,3}$, $G_{1,4}$, $G_{2,1}$, $G_{2,2}$, $G_{2,3}$, $G_{2,4}$, $G_{3,1}$, $G_{3,2}$, $G_{3,3}$, $G_{3,4}$, $G_{4,1}$, $G_{4,2}$, $G_{4,3}$, and $G_{4,4}$, where the two subscripts indicate the School of the member from whom the friendship starts, and the School of the member that is considered as a friend, respectively. For the creation of a single $G^\dag $, each of these subnetworks is randomized and then all of them are put back together so as to have a null system in which every node preserves the same strength as in $G$, as well as the same strength by School as in $G$.
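
The randomization described in Notes 7, 15 and 16 can be sketched in Python as follows. This is only an illustration under stated assumptions: ties are directed with positive integer weights, each block (source group, target group) is randomized by shuffling the targets of its unit ties, and the function names, the toy ties and the two-group labelling are hypothetical rather than taken from the chapter.

```python
import random
from collections import Counter, defaultdict


def expand_unit_edges(weighted_edges):
    """Repeat each directed (source, target, weight) tie `weight` times,
    so that every resulting tie has weight 1."""
    return [(s, t) for s, t, w in weighted_edges for _ in range(w)]


def randomize_by_block(unit_edges, group, seed=0):
    """Shuffle targets separately inside each (source group, target group) block.

    Sources keep their number of outgoing unit ties towards each group and
    targets keep their number of incoming unit ties from each group.
    Self-ties and repeated ties may appear in the result.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s, t in unit_edges:
        blocks[(group[s], group[t])].append((s, t))
    randomized = []
    for block_edges in blocks.values():
        sources = [s for s, _ in block_edges]
        targets = [t for _, t in block_edges]
        rng.shuffle(targets)
        randomized.extend(zip(sources, targets))
    return randomized


def strength_by_group(unit_edges, group):
    """Out- and in-strength of each node, split by the group of the other end."""
    out_strength, in_strength = Counter(), Counter()
    for s, t in unit_edges:
        out_strength[(s, group[t])] += 1
        in_strength[(t, group[s])] += 1
    return out_strength, in_strength


# Toy data (assumption): four nodes in two groups, integer tie weights.
weighted = [("a", "b", 2), ("a", "c", 1), ("b", "d", 3), ("c", "d", 1), ("d", "a", 2)]
group = {"a": 1, "b": 1, "c": 2, "d": 2}
original = expand_unit_edges(weighted)
null_edges = randomize_by_block(original, group, seed=1)
print(strength_by_group(null_edges, group) == strength_by_group(original, group))
```

Because ties are reshuffled only within their own block, each node's total strength is preserved as a consequence of preserving its strength towards and from every group.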

References

Bedi, P., Sharma, C.: Community detection in social networks. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 6(3), 115–135 (2016)
Erdős, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5(1), 17–60 (1960)
Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016)
Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)
Karson, M.: Handbook of Methods of Applied Statistics (1968)
Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione. Giorn. Dell’inst. Ital. Degli Att. 4, 89–91 (1933)
Molloy, M., Reed, B.: A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6(2–3), 161–180 (1995)
Newman, M.: The configuration model. In: Networks. Oxford University Press (2018). https://doi.org/10.1093/oso/9780198805090.003.0012
Newman, M.: Random graphs. In: Networks. Oxford University Press (2018). https://doi.org/10.1093/oso/9780198805090.003.0011
Newman, M.E.: Mixing patterns in networks. Phys. Rev. E 67(2), 026126 (2003)
Porter, M.A., Onnela, J.P., Mucha, P.J.: Communities in networks. Not. AMS 56(9), 1082–1097 (2009)

Author information

Authors and Affiliations

Centro Analisi Politiche Pubbliche (CAPP), Università degli Studi di Modena e Reggio Emilia, Modena, Italy
Riccardo Righi

Corresponding author

Correspondence to Riccardo Righi.

Editor information

Editors and Affiliations

University of Modena and Reggio Emilia, Modena, Italy
Marco Villani
University of Parma, Parma, Italy
Stefano Cagnoni
University of Modena and Reggio Emilia, Modena, Italy
Roberto Serra

About this paper

Cite this paper

Righi, R. (2024). Does ‘Community Detection’ Find Real Emerging Meso-structures? A Statistical Test Based on Complex Networks Methods. In: Villani, M., Cagnoni, S., Serra, R. (eds) Artificial Life and Evolutionary Computation. WIVACE 2023. Communications in Computer and Information Science, vol 1977. Springer, Cham. https://doi.org/10.1007/978-3-031-57430-6_27

DOI: https://doi.org/10.1007/978-3-031-57430-6_27
Published: 30 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57429-0
Online ISBN: 978-3-031-57430-6
eBook Packages: Computer Science, Computer Science (R0)
