Abstract
Although there is considerable disagreement about the details, community detection in social networks requires finding groups of nodes that are similar to one another and different from the nodes in other groups. The notion of similarity is therefore key. Some techniques use attribute similarity: two nodes are similar when they share similar attribute values. Others use structural similarity: two nodes are similar when they are well connected, directly or indirectly. Recent work has tried to use both attribute and structural similarity, but the obvious challenge is how to merge and weight these two qualitatively different types of similarity. We design a community detection technique that not only uses attributes and structure, but also separates qualitatively different kinds of attributes and treats similarity differently for each. Attributes and structure are then combined into a single graph in a principled way, and a spectral embedding is used to place the nodes in a geometry where conventional clustering algorithms can be applied. We apply our community detection technique to real-world data from the Instagram social network, which we crawled to extract data for a large set of users. We compute attribute similarity from users' post content, hashtags, image content, and followership, treating these as qualitatively different modes of similarity. Our technique outperforms a range of popular community detection techniques across many metrics, providing evidence that different attribute modalities are important for discovering communities. We also validate our technique by computing the topics associated with each community and showing that these are plausibly coherent. This highlights a potential application of community detection in social networks: finding groups of users with specific interests who could be the targets of focused marketing.
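The pipeline summarized above (one similarity graph per attribute modality, merged with the structural graph into a single typed graph, then spectrally embedded and clustered) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on assumptions of its own: a layered construction in which each node has one copy per edge type and the copies are joined by "vertical" edges of weight `vertical_weight`, a cosine-similarity threshold for building attribute layers, the random-walk Laplacian for the embedding, averaging each node's copies before clustering, and a small farthest-point k-means. All function names and the toy follow graph and hashtag vectors are hypothetical.

```python
import numpy as np


def cosine_similarity_graph(features, threshold=0.0):
    """Weighted adjacency from pairwise cosine similarity of attribute vectors.

    Hypothetical helper: similarities at or below `threshold` are dropped and
    self-loops are removed.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)  # guard all-zero rows
    sim = unit @ unit.T
    sim[sim <= threshold] = 0.0
    np.fill_diagonal(sim, 0.0)
    return sim


def typed_graph(layers, vertical_weight=1.0):
    """Combine one n-by-n adjacency matrix per edge type into one layered graph.

    Assumption: every node gets one copy per layer, and all copies of the same
    node are joined by 'vertical' edges of weight `vertical_weight`.
    """
    n, t = layers[0].shape[0], len(layers)
    big = np.zeros((n * t, n * t))
    for i, adj in enumerate(layers):
        big[i * n:(i + 1) * n, i * n:(i + 1) * n] = adj  # edges within a layer
    for i in range(t):
        for j in range(i + 1, t):
            block = vertical_weight * np.eye(n)  # node v in layer i <-> node v in layer j
            big[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
            big[j * n:(j + 1) * n, i * n:(i + 1) * n] = block
    return big


def spectral_embedding(adj, dim):
    """Embed vertices using the `dim` smallest nontrivial eigenvectors of I - D^-1 A."""
    deg = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    lap_sym = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap_sym)  # eigenvalues in ascending order
    vecs = d_inv_sqrt[:, None] * vecs  # random-walk eigenvectors; the first is constant
    return vecs[:, 1:dim + 1]          # drop the trivial constant eigenvector


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def detect_communities(layers, k, vertical_weight=1.0):
    """Typed graph -> spectral embedding -> average node copies -> k-means."""
    n, t = layers[0].shape[0], len(layers)
    emb = spectral_embedding(typed_graph(layers, vertical_weight), dim=k - 1)
    node_emb = emb.reshape(t, n, -1).mean(axis=0)  # one point per original node
    return kmeans(node_emb, k)


# Toy usage: a follow graph (structure) plus hashtag vectors (one attribute mode).
follow = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    follow[a, b] = follow[b, a] = 1.0
hashtags = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
                     [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
labels = detect_communities([follow, cosine_similarity_graph(hashtags, 0.5)], k=2)
print(labels)
```

On this toy input, users 0 to 2 and users 3 to 5 are dense in both the follow layer and the hashtag layer and are linked only by the single follow edge between users 2 and 3, so the first nontrivial eigenvector separates them and the two groups receive different labels. With real data one would add a layer per modality (post text, hashtags, image content, followership) and tune `vertical_weight`, since it controls how strongly the layers are forced to agree on each node's position.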
Author information
Contributions
MA carried out the research and contributed to the writing. DBS wrote the main manuscript text. Both authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alfaqeeh, M., Skillicorn, D.B. Community detection in social networks by spectral embedding of typed graphs. Soc. Netw. Anal. Min. 14, 12 (2024). https://doi.org/10.1007/s13278-023-01172-y
DOI: https://doi.org/10.1007/s13278-023-01172-y