Computing exact P-values for community detection

Data Mining and Knowledge Discovery

Abstract

Community detection is one of the most important problems in modern network science. Although numerous community detection algorithms have been proposed over the past decades, how to assess the statistical significance of a single community analytically and exactly remains an open problem. In this paper, we present an analytical solution for calculating the exact p-value of a single community under the Erdős–Rényi model. In addition, we propose a local search method that finds statistically significant communities by minimizing this p-value. Experimental results on both real and simulated networks demonstrate that our method effectively detects true communities in different types of networks.
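The abstract does not state the p-value formula, so the following is only a simplified, hypothetical stand-in for the idea and not the authors' method. It assumes an Erdős–Rényi null whose edge probability p is estimated as the observed edge density, scores a node set S of size k by the exact binomial upper-tail probability of seeing at least its observed number of internal edges among the C(k, 2) possible pairs, and runs a greedy local search (add a neighbour or drop a member while the score decreases). All function names are illustrative. The score ignores boundary edges and the selection bias introduced by the search itself.

```python
from itertools import combinations
from math import comb


def internal_edges(adj, nodes):
    """Number of edges with both endpoints in `nodes` (adj maps node -> set of neighbours)."""
    return sum(1 for u, v in combinations(nodes, 2) if v in adj[u])


def binom_upper_tail(n_trials, m, p):
    """P(X >= m) for X ~ Binomial(n_trials, p), summed term by term (no normal approximation)."""
    return sum(comb(n_trials, i) * p**i * (1 - p) ** (n_trials - i)
               for i in range(m, n_trials + 1))


def community_pvalue(adj, nodes, p):
    """Hypothetical simplified score: probability that an Erdős–Rényi graph with
    edge probability p puts at least the observed number of edges inside a fixed
    set of this size. Ignores boundary edges and search-induced selection bias."""
    k = len(nodes)
    return binom_upper_tail(comb(k, 2), internal_edges(adj, nodes), p)


def local_search(adj, seed, p, max_iter=100):
    """Greedy local search: apply the add/remove move that lowers the score most,
    and stop when no single move improves it."""
    current = set(seed)
    best = community_pvalue(adj, current, p)
    for _ in range(max_iter):
        # Candidate moves: add one neighbour of the set, or drop one member.
        frontier = set().union(*(adj[u] for u in current)) - current
        candidates = [current | {v} for v in frontier]
        if len(current) > 1:
            candidates += [current - {u} for u in current]
        if not candidates:
            break
        pval, cand = min(((community_pvalue(adj, c, p), c) for c in candidates),
                         key=lambda t: t[0])
        if pval >= best:
            break  # local minimum reached
        current, best = cand, pval
    return current, best


# Usage: two 4-cliques joined by the single edge (3, 4).
edges = [e for block in ([0, 1, 2, 3], [4, 5, 6, 7]) for e in combinations(block, 2)]
edges.append((3, 4))
adj = {i: set() for i in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
p = len(edges) / comb(len(adj), 2)  # observed edge density as the ER edge probability
community, pval = local_search(adj, {0, 1}, p)
print(sorted(community))  # [0, 1, 2, 3]
```

Starting from the seed {0, 1}, the search grows the seed into one clique, {0, 1, 2, 3}, and stops there because adding the bridge node 4 raises the score (roughly 0.12 versus p^6 ≈ 0.010).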

Figs. 1–14 (images not reproduced)

Acknowledgements

This work was partially supported by the Natural Science Foundation of China under Grant Nos. 61972066 and 61572094.

Author information

Correspondence to Zengyou He.

Additional information

Responsible editor: Evangelos Papalexakis.

About this article

Cite this article

He, Z., Liang, H., Chen, Z. et al. Computing exact P-values for community detection. Data Min Knowl Disc 34, 833–869 (2020). https://doi.org/10.1007/s10618-020-00681-0

  • DOI: https://doi.org/10.1007/s10618-020-00681-0
