
Summable and nonsummable data-driven models for community detection in feature-rich networks

  • Original Article
Social Network Analysis and Mining

Abstract

A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes that approximates both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that link values are comparable and summable across the whole network (summability); the other models the case in which different nodes represent different measurement systems, so that link data are neither comparable nor summable across nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter that involves our data recovery least-squares criterion. Minimizing this criterion is equivalent to maximizing the complementary part of the decomposition, the contribution of a found partition to the combined data scatter. We maximize this contribution with a doubly greedy strategy: first, communities are found one by one; second, within the search for each community, entities are added one by one. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own and is also faster than the summability version. In our experiments, both versions appear competitive on generated synthetic data sets and on six real-world data sets from the literature.
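The two central ideas of the abstract, a Pythagorean split of the data scatter and a doubly greedy search, can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm. It uses the familiar K-Means-style cluster contribution n_k * ||c_k||^2 of a single combined matrix in place of the paper's combined criterion, whose exact link-data term is not stated here. The helper combine() and its preprocessing are hypothetical assumptions about how the two modes might differ: one common scale for all link values (summable) versus a separate z-score for each node's link row (nonsummable). The seed choice (largest row norm) is likewise an assumption.

```python
import numpy as np


def combine(features, links, nonsummable=True):
    """Hypothetical preprocessing, not the paper's: z-score feature columns;
    summable mode centers all link values on one common scale, nonsummable
    mode z-scores each node's link row on its own scale."""
    F = np.asarray(features, dtype=float)
    P = np.asarray(links, dtype=float)
    sd_f = F.std(axis=0)
    F = (F - F.mean(axis=0)) / np.where(sd_f > 0, sd_f, 1.0)
    if nonsummable:
        sd_p = P.std(axis=1, keepdims=True)
        P = (P - P.mean(axis=1, keepdims=True)) / np.where(sd_p > 0, sd_p, 1.0)
    else:
        P = P - P.mean()
    return np.hstack([F, P])


def data_scatter(Y):
    """Total data scatter T(Y): the sum of all squared entries."""
    return float(np.sum(Y ** 2))


def partition_contribution(Y, labels):
    """Explained part of T(Y): sum over clusters of n_k * ||c_k||^2,
    where c_k is the mean (centroid) of the rows in cluster k."""
    total = 0.0
    for k in np.unique(labels):
        rows = Y[labels == k]
        c = rows.mean(axis=0)
        total += len(rows) * float(c @ c)
    return total


def within_residual(Y, labels):
    """Unexplained part of T(Y): squared distances of rows to their centroids."""
    total = 0.0
    for k in np.unique(labels):
        rows = Y[labels == k]
        total += float(np.sum((rows - rows.mean(axis=0)) ** 2))
    return total


def grow_cluster(Y, candidates):
    """Inner greedy step: start from the candidate row with the largest norm and
    add rows one by one while ||sum of member rows||^2 / n keeps increasing."""
    candidates = list(candidates)
    seed = max(candidates, key=lambda i: float(Y[i] @ Y[i]))
    members = [seed]
    s = Y[seed].copy()
    gain = float(s @ s)
    remaining = [i for i in candidates if i != seed]
    while remaining:
        best, best_gain = None, gain
        for i in remaining:
            t = s + Y[i]
            g = float(t @ t) / (len(members) + 1)
            if g > best_gain:
                best, best_gain = i, g
        if best is None:  # no addition increases the contribution
            break
        members.append(best)
        s += Y[best]
        gain = best_gain
        remaining.remove(best)
    return members


def extract_partition(Y):
    """Outer greedy step: extract clusters one by one until every row is
    assigned, so the number of clusters is not fixed in advance."""
    labels = np.full(len(Y), -1)
    k = 0
    while np.any(labels < 0):
        members = grow_cluster(Y, np.flatnonzero(labels < 0))
        labels[members] = k
        k += 1
    return labels
```

For any partition, data_scatter(Y) equals partition_contribution(Y, labels) + within_residual(Y, labels), because the deviations of a cluster's rows from their mean sum to zero. This is why minimizing the least-squares residual and maximizing the contribution are the same problem, and why growing a cluster by the criterion ||sum||^2 / n directly increases the explained part.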


Fig. 1
Fig. 2
Fig. 3



Acknowledgements

This article was prepared within the framework of the HSE University Basic Research Program. The authors are indebted to the anonymous referees for their invaluable comments taken into account in the final draft.

Author information


Corresponding author

Correspondence to Boris Mirkin.


About this article


Cite this article

Shalileh, S., Mirkin, B. Summable and nonsummable data-driven models for community detection in feature-rich networks. Soc. Netw. Anal. Min. 11, 67 (2021). https://doi.org/10.1007/s13278-021-00774-8


  • DOI: https://doi.org/10.1007/s13278-021-00774-8
