Abstract
Community detection is a broad area of study in network science: correctly identifying communities reveals the groups in a network and the relationships among their nodes. Community detection algorithms use the available snapshot of a network to detect its underlying communities. If this snapshot is incomplete, however, the algorithms may fail to recover the correct communities. This work proposes a set of link prediction heuristics that use different network properties to estimate a more complete version of the network and thereby improve the results of community detection algorithms. Each heuristic returns the edges most likely to be observed in a future version of the network. We performed experiments on real-world and artificial networks with different insertion sizes and compared the results against two baselines: (i) no edge insertion and (ii) the EdgeBoost algorithm, which is based on node similarity measures. The experiments show that some of the proposed heuristics improve the results of traditional community detection algorithms, and that the improvement is more pronounced for networks with poorly defined structures.
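The general idea of enriching a network with predicted edges before running community detection can be illustrated with a minimal sketch. It is not one of the heuristics proposed in this work: as a stand-in, it ranks unconnected node pairs with the Adamic-Adar index (Adamic and Adar 2003), a node similarity measure, and inserts the k best-scoring pairs. The toy graph, the function names `adamic_adar_scores` and `insert_top_edges`, and the insertion size `k` are all assumptions made for illustration.

```python
from itertools import combinations
from math import log


def adamic_adar_scores(adj):
    """Score every non-adjacent node pair by the Adamic-Adar index."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected, nothing to predict
        common = adj[u] & adj[v]
        # A common neighbour has degree >= 2, so log(degree) > 0.
        # Low-degree common neighbours contribute more than hubs.
        score = sum(1.0 / log(len(adj[w])) for w in common)
        if score > 0:
            scores[(u, v)] = score
    return scores


def insert_top_edges(adj, k):
    """Return an enriched copy of the graph and the k edges that were added."""
    scores = adamic_adar_scores(adj)
    # Highest score first; ties are broken by node pair for determinism.
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
    enriched = {node: set(nbrs) for node, nbrs in adj.items()}
    for u, v in ranked[:k]:
        enriched[u].add(v)
        enriched[v].add(u)
    return enriched, ranked[:k]


# Hypothetical incomplete snapshot: two groups (0-3 and 4-7) joined by
# the bridge 3-4, each group missing one internal edge.
graph = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {1, 2, 4},
    4: {3, 5, 6},
    5: {4, 6, 7},
    6: {4, 5, 7},
    7: {5, 6},
}

enriched, inserted = insert_top_edges(graph, k=2)
print(inserted)  # [(0, 3), (4, 7)]
```

In this toy case the two inserted edges fall inside the two groups, which makes them denser relative to the bridge. A community detection algorithm would then be run on `enriched` instead of `graph`; the insertion size `k` corresponds loosely to the "insertion sizes" varied in the experiments.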
Notes
Source code available at https://github.com/EricCamacho/LINE.
A library of network analysis tools that can be used from Python, R, and C/C++ (Csardi and Nepusz 2006).
References
Adamic LA, Adar E (2003) Friends and neighbors on the web. Social Netw 25(3):211–230
Al Hasan M, Zaki MJ (2011) A survey of link prediction in social networks. In: Social network data analytics. Springer, pp 243–275
Ana L, Jain AK (2003) Robust data clustering. In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings, IEEE, vol 2, pp II–128
Atay Y, Koc I, Babaoglu I, Kodaz H (2017) Community detection from biological and social networks: a comparative analysis of metaheuristic algorithms. Appl Soft Comput 50:194–211
Ayoub J, Lotfi D, El Marraki M, Hammouch A (2020) Accurate link prediction method based on path length between a pair of unlinked nodes and their degree. Social Netw Anal Min 10(1):1–13
Bilenko M, Mooney R, Cohen W, Ravikumar P, Fienberg S (2003) Adaptive name matching in information integration. IEEE Intell Syst 18(5):16–23
Biswas A, Biswas B (2017) Community-based link prediction. Multimed Tools Appl 76(18):18619–18639
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008
Burgess M, Adar E, Cafarella M (2016) Link-prediction enhanced consensus clustering for complex networks. PLoS ONE 11(5):e0153384
Cheng HM, Ning YZ, Yin Z, Yan C, Liu X, Zhang ZY (2018) Community detection in complex networks using link prediction. Mod Phys Lett B 32(01):1850004
Choumane A, Awada A, Harkous A (2020) Core expansion: a new community detection algorithm based on neighborhood overlap. Social Netw Anal Min 10:1–11
Chunaev P (2020) Community detection in node-attributed social networks: a survey. Comput Sci Rev 37:100286
Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9
Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36–41
Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci 99(12):7821–7826
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Interdonato R, Tagarelli A, Ienco D, Sallaberry A, Poncelet P (2017) Local community detection in multilayer networks. Data Min Knowl Discov 31(5):1444–1479
Javed MA, Younis MS, Latif S, Qadir J, Baig A (2018) Community detection in networks: a multidisciplinary review. J Netw Comput Appl 108:87–111
Jonsson PF, Cavanna T, Zicha D, Bates PA (2006) Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis. BMC Bioinform 7(1):2
Kanawati R (2014) Yasca: an ensemble-based approach for community detection in complex networks. In: International computing and combinatorics conference, Springer, pp 657–666
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110
Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026120
Li W, Huang C, Wang M, Chen X (2017) Stepping community detection algorithm based on label propagation and similarity. Phys A Stat Mech Appl 472:145–155
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inform Sci Technol 58(7):1019–1031
Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol 54(4):396–405
Makris C, Pispirigos G, Rizos IO (2020) A distributed bagging ensemble methodology for community prediction in social networks. Information 11(4):199
Malhotra D, Goyal R (2020) Link prediction in complex networks using information-theoretic measures. J Complex Netw 8(4):cnaa035
McCune B, Grace JB, Urban DL (2002) Analysis of ecological communities, vol 28. MjM software design Gleneden Beach, OR
Nassar H, Benson AR, Gleich DF (2020) Neighborhood and pagerank methods for pairwise link prediction. Social Netw Anal Min 10(1):1–13
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Nicolini C, Bordier C, Bifone A (2017) Community detection in weighted brain connectivity networks beyond the resolution limit. Neuroimage 146:28–39
Ostilli M, Yoneki E, Leung IX, Mendes JF, Lió P, Crowcroft J (2010) Ising model of rumour spreading in interacting communities. Tech. rep., University of Cambridge, Computer Laboratory
Pachev B, Webb B (2018) Fast link prediction for large networks using spectral embedding. J Complex Netw 6(1):79–94
Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: International symposium on computer and information sciences, Springer, pp 284–293
Poulin V, Théberge F (2019) Ensemble clustering for graphs: comparisons and applications. Appl Netw Sci 4(1):1–13
Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and identifying communities in networks. Proc Natl Acad Sci 101(9):2658–2663
Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106
Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74(1):016110
Rossetti G, Cazabet R (2018) Community discovery in dynamic networks: a survey. ACM Comput Surv 51(2):1–37
Rosvall M, Bergstrom CT (2007) An information-theoretic framework for resolving community structure in complex networks. Proc Natl Acad Sci 104(18):7327–7331
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123
Salton G (1989) Automatic text processing: the transformation, analysis, and retrieval of, vol 169. Addison-Wesley, Reading
Sorensen TA (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biol Skar 5:1–34
Stegehuis C, van der Hofstad R, van Leeuwaarden JS (2016) Epidemic spreading on complex networks with community structures. Sci Rep 6:29748
Su Y, Wang B, Cheng F, Zhang L, Zhang X, Pan L (2017) An algorithm based on positive and negative links for community detection in signed networks. Sci Rep 7(1):1–12
Tagarelli A, Amelio A, Gullo F (2017) Ensemble-based community detection in multilayer networks. Data Min Knowl Discov 31(5):1506–1543
Taguchi H, Murata T, Liu X (2020) Bimlpa: community detection in bipartite networks by multi-label propagation. In: International conference on network science, Springer, pp 17–31
Valverde-Rebaza JC, de Andrade Lopes A (2012) Link prediction in complex networks based on cluster information. In: Brazilian symposium on artificial intelligence, Springer, pp 92–101
Yan B, Gregory S (2012a) Detecting community structure in networks using edge prediction methods. J Stat Mech Theory Exp 2012(09):P09008
Yan B, Gregory S (2012b) Finding missing edges in networks based on their community structure. Phys Rev E 85(5):056112
Yen TC, Larremore DB (2020) Community detection in bipartite networks with stochastic block models. Phys Rev E 102(3):032309
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zare H, Shooshtari P, Gupta A, Brinkman RR (2010) Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinform 11(1):403
Zhang X, Xia Z, Xu S, Wang J (2014) Ensemble method: community detection based on game theory. Int J Mod Phys B 28(30):1450211
Zhao X, Liang J, Wang J (2021) A community detection algorithm based on graph compression for large-scale social networks. Inf Sci 551:358–372
Funding
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
de Oliveira, É.T.C., de França, F.O. Enriching networks with edge insertion to improve community detection. Soc. Netw. Anal. Min. 11, 89 (2021). https://doi.org/10.1007/s13278-021-00803-6
DOI: https://doi.org/10.1007/s13278-021-00803-6