Abstract
A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes to approximate both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other assumption models the case in which different nodes represent different measurement systems so that the link data are neither comparable, nor summable, across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our data recovery least-squares criterion. We address an equivalent problem of maximizing its complementary part, the contribution of a found partition to the combined data scatter. We follow a doubly greedy strategy in maximizing that. First, communities are found one-by-one, and second, entities are added one-by-one in the process of identifying a community. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own; also, it is faster than the other version. In our experiments, they appear to be competitive over generated synthetic data sets and six real-world data sets from the literature.
Similar content being viewed by others
References
Akoglu L, Tong H, Meeder B, Faloutsos C (2012) PICS: parameter-free identification of cohesive subgroups in large attributed graphs. In: Proceedings of the 12th SIAM international conference on data mining, pp 439–450, SDM
Amorim RC, Mirkin B (2012) Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recognit 45(3):1061–1075
Bojchevski A, Günnemann S (2018) Bayesian robust attributed graph clustering: joint learning of partial anomalies and group structure. In: Proceedings of thirty-second AAAI conference on artificial intelligence, pp 2738–2745, https://aaai.org/Library/AAAI/aaai18contents.php, Accessed 13 Oct 2020
Binkiewicz N, Vogelstein JT, Rohe K (2017) Covariate-assisted spectral clustering. Biometrika 104(2):361–377
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 10:P10008
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Cao J, Wang H, Jin D, Dang J (2019) Combination of links and node contents for community discovery using a graph regularization approach. Future Gener Comput Syst 91:361–370
Cavallari S, Zheng VW, Cai H, Chang KCC, Cambria E (2017) Learning community embedding with community detection and node embedding on graphs. In: Proceedings of the 2017 ACM conference on information and knowledge management. ACM, pp 377–386
Citraro S, Rossetti G (2020) Identifying and exploiting homogeneous communities in labeled networks. Appl Netw Sci 5(1):1–20
Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS (2015) Heterogeneous network embedding via deep architectures. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 119–128
Chiang MMT, Mirkin B (2010) Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. J Classif 27(1):3–40
Chunaev P (2020) Community detection in node-attributed social networks: a survey. Comput Sci Rev 37:100286
Combe D, Largeron C, Géry M, Egyed-Zsigmond E (2015) I-louvain: an attributed graph clustering method. In: Fromont E, De Bie T, van Leeuwen M (eds) Advances in intelligent data analysis XIV. Springer International Publishing, Cham, pp 181–192
Cross RL, Parker A (2004) The hidden power of social networks: understanding how work really gets done in organizations. Harvard Business Press, Boston
Cover TM, Thomas JA (2012) Elements of information theory. John Wiley and Sons, New York
De Nooy W, Mrvar A, Batagelj V (2004) Exploratory social network analysis with Pajek. Cambridge University Press, Cambridge
Dang TA, Viennet E (2012) Community detection based on structural and attribute similarities. In: International conference on digital society (ICDS), pp 7–12
Doreian P, Batagelj V, Ferligoj A (2020) Advances in network clustering and blockmodeling. John Wiley and Sons, New York
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Now Publishers Inc., Norwell
GitHub Repository, Giulio Rossetti, Italian National Research Council https://github.com/GiulioRossetti/EVA
Hoffman M, Steinley D, Gates KM, Prinstein MJ, Brusco MJ (2018) Detecting clusters/communities in social networks. Multivar Behav Res 53(1):57–73
Holland PW, Laskey KB, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Networks 5(2):109–137
Hu Y, Li M, Zhang P, Fan Y, Di Z (2008) Community detection by signaling on complex networks. Phys Rev E 78(1):16115
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Interdonato R, Atzmueller M, Gaito S, Kanawati R, Largeron C, Sala A (2019) Feature-rich networks: going beyond complex network topologies. Appl Network Sci 4:1–13
Javed MA, Younis MS, Latif S, Qadir J, Baig A (2018) Community detection in networks: a multidisciplinary review. J Network Comput Appl 108:87–111
Jia C, Li Y, Carson MB, Wang X, Yu J (2017) Node attribute-enhanced community detection in complex networks. Sci Rep 7(1):2626
Jin H, Yu W, Li S (2018) A clustering algorithm for determining community structure in complex networks. Phys A Stat Mech Appl 492:980–993
Kovaleva EV, Mirkin B (2015) Bisecting K-means and 1D projection divisive clustering: a unified framework and experimental comparison. J Classif 32(3):414–442
Larremore DB, Clauset A, Buckee CO (2013) A network approach to analyzing highly recombinant malaria parasite genes. PLoS Comput Biol 9(10):e1003268
Lazega E (2001) The collegial phenomenon: the social mechanisms of cooperation among peers in a corporate law partnership. Oxford University Press, Oxford
Leskovec J, Sosič R (2016) SNAP: a general-purpose network analysis and graph-mining library, ACM transactions on intelligent systems and technology (TIST), vol 8-1, p 1, ACM, CESNA on Github: https://github.com/snap-stanford/snap/tree/master/examples/cesna
Mirkin B (1987) Additive clustering and qualitative factor analysis methods for similarity matrices. J Classif 4:7–31
Mirkin B (1990) A sequential fitting procedure for linear data analysis models. J Classif 7(2):167–195
Mirkin B (2008) The iterative extraction approach to clustering. In: Gorban A (ed) Principal manifolds for data visualization and dimension reduction. Springer, Heidelberg, pp 151–177
Mirkin B, Nascimento S (2012) Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Inf Sci 183(1):16–34
Mirkin B (2012) Clustering: a data recovery approach, CRC Press, 1st Edition, 2005; 2d Edition, 2012
Nascimento S, Casca S, Mirkin B (2015) A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images. Comput Geosci 85:74–85
Nature Communications, Mark J. Newman, “W DC University”, https://www.nature.com/articles/ncomms11863
Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577–8582
Newman ME, Clauset A (2016) Structure and inference in annotated networks. Nat Commun 7:11863
Neville J, Adler M, Jensen D (2003) Clustering relational data using attribute and link information. In: Proceedings of the text mining and link analysis workshop, 18th international joint conference on artificial intelligence (pp 9–15). San Francisco, CA, Morgan Kaufmann Publishers
Ng A (2011) Sparse autoencoder, CS294A lecture notes 72, pp 1–19
Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96(455):1077–1087
Peel L, Larremore DB, Clauset A (2017) The ground truth about metadata and community detection in networks. Sci Adv 3(5):e1602548
Robin G, Klopp O, Josse J, Moulines É, Tibshirani R (2019) Main effects and interactions in mixed and incomplete data frames. J Am Stat Assoc (Accepted), pp 1–31
Sánchez PI, Müller E, Korn UL, Böhm K, Kappes A, Hartmann T, Wagner D (2015) Efficient algorithms for a robust modularity-driven clustering of attributed graphs. In: Proceedings of the 2015 SIAM International conference on data mining, pp 100–108
Shalileh S, Mirkin B (2020) A data recovery method for community detection in feature-rich networks. In: Proceedings of the 2020 IEEE/ACM international conference on advances in social networks analysis and mining, pp 99–104
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Stanley N, Bonacci T, Kwitt R, Niethammer M, Mucha PJ (2019) Stochastic block models with multiple continuous attributes. Appl Network Sci 4(1):1–22
Snijders T. The Siena webpage. https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res, pp 583–617
Sun H, He F, Huang J, Sun Y, Li Y, Wang C, He L, Sun Z, Jia X (2020) Network embedding for community detection in attributed networks. ACM Trans Knowl Discov Data (TKDD) 14(3):1–25
Vichi M (2008) Fitting semiparametric clustering models to dissimilarity data. Adv Data Anal Classif 2(2):121–161
Wang D, Zhao Y (2019) Network community detection from the perspective of time series. Phys A Stat Mech Appl 522:205–214
Wang X, Jin D, Cao X, Yang L, Zhang W (2016) Semantic community identification in large attribute networks. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16, pp 265–271. AAAI Press
Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, pp 505–516. ACM
Ye W, Zhou L, Sun X, Plant C, Böhm C (2017) Attributed graph clustering with unimodal normalized cut. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases. Springer International Publishing, Cham, pp 601–616
Yang J, McAuley J, Leskovec J (2013) Community detection in networks with node attributes. In: 2013 IEEE 13th international conference on data mining (pp 1151–1156). IEEE, arXiv:1401.7267, Accessed 22 Nov 2019
Zanghi H, Volant S, Ambroise C (2010) Clustering based on random graph model embedding vertex features. Pattern Recognit Lett 31(9):830–836
Zhang Y, Levina E, Zhu J (2015) Community detection in networks with node features. arXiv preprint arXiv:1509.01173
Acknowledgements
This article was prepared within the framework of the HSE University Basic Research Program. The authors are indebted to the anonymous referees for their invaluable comments taken into account in the final draft.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Shalileh, S., Mirkin, B. Summable and nonsummable data-driven models for community detection in feature-rich networks. Soc. Netw. Anal. Min. 11, 67 (2021). https://doi.org/10.1007/s13278-021-00774-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-021-00774-8