Summable and nonsummable data-driven models for community detection in feature-rich networks

Shalileh, Soroosh; Mirkin, Boris

doi:10.1007/s13278-021-00774-8

Original Article
Published: 28 July 2021

Volume 11, article number 67 (2021)

Social Network Analysis and Mining

Abstract

A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes that approximates both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other models the case in which different nodes represent different measurement systems, so that the link data are neither comparable nor summable across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our data recovery least-squares criterion. This allows us to address the equivalent problem of maximizing the complementary part of the decomposition, the contribution of a found partition to the combined data scatter. We follow a doubly greedy strategy in maximizing this contribution: first, communities are found one by one; second, entities are added one by one in the process of identifying a community. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own; it is also faster than the summability version. In our experiments, both versions appear to be competitive on generated synthetic data sets and six real-world data sets from the literature.
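The doubly greedy strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published algorithm. It assumes the summable mode only, a block-constant link model in which each community is summarized by the mean link value over all its node pairs (diagonal included), and a community's contribution to the combined data scatter taken as `rho * |S| * ||c||^2 + xi * |S|^2 * lam^2`. The function names `contribution`, `extract_community` and `detect_communities`, the weights `rho` and `xi`, the seeding rule and the stopping rule are all hypothetical choices made for illustration.

```python
def contribution(S, Y, P, rho=1.0, xi=1.0):
    """Assumed contribution of node set S to the combined data scatter.

    Feature part: |S| * ||c||^2, where c is the feature mean over S.
    Link part:    |S|^2 * lam^2, where lam is the mean link value within S.
    """
    n = len(S)
    c = [sum(Y[i][v] for i in S) / n for v in range(len(Y[0]))]
    lam = sum(P[i][j] for i in S for j in S) / (n * n)
    return rho * n * sum(x * x for x in c) + xi * n * n * lam * lam


def extract_community(free, Y, P, rho=1.0, xi=1.0):
    """Inner greedy loop: grow one community by adding entities one by one."""
    # Hypothetical seeding rule: the free node with the largest singleton contribution.
    seed = max(free, key=lambda i: contribution([i], Y, P, rho, xi))
    S = [seed]
    best = contribution(S, Y, P, rho, xi)
    while True:
        candidates = [i for i in free if i not in S]
        if not candidates:
            break
        i_best = max(candidates,
                     key=lambda i: contribution(S + [i], Y, P, rho, xi))
        value = contribution(S + [i_best], Y, P, rho, xi)
        if value <= best:
            break  # no remaining entity increases the contribution
        S.append(i_best)
        best = value
    return sorted(S)


def detect_communities(Y, P, rho=1.0, xi=1.0):
    """Outer greedy loop: extract communities one by one until no node is left."""
    free = list(range(len(Y)))
    communities = []
    while free:
        S = extract_community(free, Y, P, rho, xi)
        communities.append(S)
        free = [i for i in free if i not in S]
    return communities


# Toy data: two groups of three nodes, centered features, block-diagonal links.
Y = [[1.0, -1.0]] * 3 + [[-1.0, 1.0]] * 3
P = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
print(detect_communities(Y, P))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy example the number of communities is not given in advance; it follows from the stopping rule of the inner loop. Because the data fit the assumed model exactly, the two contributions (15 each) add up to the total scatter of `Y` and `P` (12 + 18 = 30), leaving a zero residual in the Pythagorean decomposition.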

Acknowledgements

This article was prepared within the framework of the HSE University Basic Research Program. The authors are indebted to the anonymous referees for their invaluable comments taken into account in the final draft.

Author information

Authors and Affiliations

Department of Data Analysis and Artificial Intelligence, HSE University, Pokrovsky Boulevard, 11, Moscow, Russian Federation
Soroosh Shalileh & Boris Mirkin
Laboratory of Methods for Big Data Analysis, HSE University, Pokrovsky Boulevard, 11, Moscow, Russian Federation
Soroosh Shalileh
Department of Computer Science and Information Systems, Birkbeck University of London, Malet Street, London, WC1E 7HX, UK
Boris Mirkin

Corresponding author

Correspondence to Boris Mirkin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shalileh, S., Mirkin, B. Summable and nonsummable data-driven models for community detection in feature-rich networks. Soc. Netw. Anal. Min. 11, 67 (2021). https://doi.org/10.1007/s13278-021-00774-8

Received: 25 January 2021
Revised: 10 July 2021
Accepted: 13 July 2021
Published: 28 July 2021
DOI: https://doi.org/10.1007/s13278-021-00774-8
