
Remodeling the network for microgroup detection on microblog

  • Regular Paper
  • Knowledge and Information Systems


Abstract

In this paper, we focus on the problem of community detection on Sina Weibo, the most popular microblogging system in China. By characterizing the structure and content of microgroups (communities) on Sina Weibo in detail, we observe that, unlike ordinary social networks, most microgroups have negative degree assortativity coefficients. In addition, we find that users from the same microgroup tend to share common attributes (e.g., followers, tags) and interests extracted from their published posts. Inspired by these findings, we propose a unified method that remodels the network for microgroup detection while preserving the information in both link structure and user content. First, link direction is taken into account by assigning greater weights to more surprising links, while content similarity is measured by the Jaccard coefficient of common features and by an interest similarity based on the Latent Dirichlet Allocation model. Then, the link direction and content similarity between two users are jointly converted into the edge weight of a new remodeled network, which is undirected and weighted. Finally, any of several widely used community detection algorithms that support weighted networks can be applied to the remodeled network. Extensive experiments on real-world social networks show that link structure and user content play almost equally important roles in microgroup detection on Sina Weibo. Our method outperforms traditional methods, improving average accuracy by up to 39% and reducing the number of unrecognized users by about 75%.
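
The remodeling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the surprise score (a link counts for more when its target has few followers), the use of Jensen-Shannon divergence to compare topic distributions, the equal split between the two content measures, and the mixing parameter alpha are all assumptions made for illustration. The function names jaccard, js_similarity, and remodel are hypothetical, and the topic distributions are assumed to come from a separately trained Latent Dirichlet Allocation model.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two feature collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def js_similarity(p, q):
    """1 minus the base-2 Jensen-Shannon divergence of two topic distributions.

    Assumption: the abstract only says interest similarity is LDA-based;
    Jensen-Shannon divergence is one common way to compare such distributions.
    """
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


def remodel(directed_edges, features, topics, alpha=0.5):
    """Convert a directed follow graph into undirected weighted edges.

    directed_edges: iterable of (follower, followee) pairs
    features: dict user -> set of attributes (e.g. tags)
    topics: dict user -> topic distribution (list of probabilities)
    alpha: hypothetical mixing weight between structure and content
    """
    directed_edges = list(directed_edges)
    in_degree = Counter(v for _, v in directed_edges)

    # Structure part: sum a hypothetical surprise score over both directions.
    structure = Counter()
    for u, v in directed_edges:
        if u == v:
            continue  # ignore self-loops
        # Assumed surprise score: following a little-followed user counts more.
        structure[frozenset((u, v))] += 1.0 / math.log2(2 + in_degree[v])

    # Content part: average of attribute overlap and topic similarity.
    weights = {}
    for pair, s in structure.items():
        u, v = sorted(pair)
        content = 0.5 * jaccard(features[u], features[v]) \
            + 0.5 * js_similarity(topics[u], topics[v])
        weights[(u, v)] = alpha * s + (1 - alpha) * content
    return weights
```

The returned dictionary maps each undirected user pair to a single weight, which could then be passed to any community detection algorithm that accepts weighted edges. Setting alpha to 0.5 mirrors the reported finding that structure and content matter about equally, but the value used in the paper is not stated in the abstract.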





Acknowledgments

We thank the anonymous reviewers for their useful comments and suggestions. This work was partially supported by the open project fund of the State Key Lab of Software Development Environment, China (No. SKLSDE-2011KF-06), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA011005), and the State Key Laboratory of Mathematical Engineering and Advanced Computing, China. Part of this research was done while the first author was visiting the State Key Lab of Software Development Environment, Beihang University, China. We would like to thank Dr. Jichang Zhao, Dr. Xu Feng, and Dr. Xiao Liang for their encouragement and support.

Author information


Correspondence to Xiaobing Xiong or Ke Xu.


About this article

Cite this article

Xiong, X., Zhou, G., Niu, X. et al. Remodeling the network for microgroup detection on microblog. Knowl Inf Syst 39, 643–665 (2014). https://doi.org/10.1007/s10115-013-0626-x


  • DOI: https://doi.org/10.1007/s10115-013-0626-x
