Abstract
In this paper, we focus on the problem of community detection on Sina Weibo, the most popular microblogging system in China. By characterizing the structure and content of microgroups (communities) on Sina Weibo in detail, we observe that, unlike ordinary social networks, most microgroups have negative degree assortativity coefficients. In addition, we find that users in the same microgroup tend to share common attributes (e.g., followers, tags) and interests extracted from their published posts. Inspired by these findings, we propose a unified method that remodels the network for microgroup detection while preserving the information in both the link structure and the user content. First, link direction is taken into account by assigning greater weights to more surprising links, while content similarity is measured by the Jaccard coefficient of common features together with an interest similarity based on the Latent Dirichlet Allocation (LDA) model. Then, the link direction and content similarity between two users are uniformly converted into the edge weight of a new remodeled network, which is undirected and weighted. Finally, any of several frequently used community detection algorithms that support weighted networks can be applied. Extensive experiments on real-world social networks show that link structure and user content play almost equally important roles in microgroup detection on Sina Weibo. Our method outperforms traditional methods with an average accuracy improvement of up to 39%, and it reduces the number of unrecognized users by about 75%.
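The remodeling step summarized above can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not define the "surprise" weighting, how the two content scores are combined, or how structure and content are mixed. The `surprise_weight` formula (inverse log of the followed user's follower count), the use of 1 minus the Jensen-Shannon divergence as the LDA interest similarity, the equal averaging of attribute and interest similarity, the mixing parameter `alpha`, and the choice to weight only pairs that already share a link are all hypothetical. Per-user topic distributions are assumed to come from an already-trained LDA model.

```python
import math
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so it lies in [0, 1]) of two distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 because y = m.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def interest_similarity(p, q):
    # ASSUMPTION: interest similarity is taken as 1 - JSD of the users' LDA topic mixtures.
    return 1.0 - js_divergence(p, q)


def surprise_weight(target, in_degree):
    # HYPOTHETICAL: following a user with few followers counts as a more "surprising" link.
    return 1.0 / math.log2(2 + in_degree[target])


def remodel(edges, features, topics, alpha=0.5):
    """Convert a directed follow graph into an undirected weighted graph.

    edges    -- iterable of directed links (u, v), meaning u follows v
    features -- user -> set of attributes (e.g. followers, tags)
    topics   -- user -> LDA topic distribution (list of floats summing to 1)
    alpha    -- HYPOTHETICAL weight of link structure versus content
    """
    edges = list(edges)  # iterated twice below
    in_degree = defaultdict(int)
    for _, v in edges:
        in_degree[v] += 1

    # Fold both directions of a pair into one undirected key; a mutual
    # follow accumulates the surprise weight of each direction.
    structure = defaultdict(float)
    for u, v in edges:
        if u != v:  # ignore self-loops
            structure[frozenset((u, v))] += surprise_weight(v, in_degree)

    weights = {}
    for pair, s in structure.items():
        u, v = tuple(pair)
        # HYPOTHETICAL: attribute and interest similarity are averaged equally.
        content = 0.5 * jaccard(features[u], features[v]) + 0.5 * interest_similarity(
            topics[u], topics[v]
        )
        weights[pair] = alpha * s + (1 - alpha) * content
    return weights


if __name__ == "__main__":
    follows = [("a", "b"), ("b", "a"), ("c", "b")]
    feats = {"a": {"x", "y"}, "b": {"y"}, "c": {"z"}}
    lda = {"a": [0.5, 0.5], "b": [0.5, 0.5], "c": [1.0, 0.0]}
    for pair, w in remodel(follows, feats, lda).items():
        print(sorted(pair), round(w, 4))
```

The returned map from unordered user pairs to weights is an undirected weighted graph, so it can be passed to any community detection algorithm that accepts edge weights. In this toy example, the mutual follow between `a` and `b`, combined with their shared tag and identical topic mixtures, yields a heavier edge than the one-way link from `c` to `b`.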
Acknowledgments
We thank the anonymous reviewers for their useful comments and suggestions. This work was partially supported by an open project fund from the State Key Lab of Software Development Environment, China (No. SKLSDE-2011KF-06), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA011005), and the State Key Laboratory of Mathematical Engineering and Advanced Computing, China. Part of this research was done while the first author was visiting the State Key Lab of Software Development Environment, Beihang University, China. We would like to thank Dr. Jichang Zhao, Dr. Xu Feng, and Dr. Xiao Liang for their encouragement and support.
Cite this article
Xiong, X., Zhou, G., Niu, X. et al. Remodeling the network for microgroup detection on microblog. Knowl Inf Syst 39, 643–665 (2014). https://doi.org/10.1007/s10115-013-0626-x