Community detection for emerging social networks

World Wide Web

Abstract

Many well-known online social networks, e.g., Facebook and Twitter, have achieved great success in the last several years. Users in these online social networks can establish various connections via both social links and shared attribute information. The community detection problem is defined as discovering groups of users who are strongly connected internally. Community detection is very important for online social networks and has extensive applications in various social services. Meanwhile, besides these popular social networks, a large number of new social networks offering specific services have also sprung up in recent years. Community detection can be even more important for new networks, as high-quality community detection results enable new networks to provide better services, which can help them attract more users. In this paper, we study the community detection problem for new networks, which is formally defined as the “New Network Community Detection” problem. This problem is very challenging because information in new networks can be too sparse to calculate effective similarity scores among users, and such scores are crucial for community detection. However, we notice that users nowadays usually join multiple social networks simultaneously, and those who are involved in a new network may have been using other well-developed social networks for a long time. Taking the differences between networks fully into account, we propose to propagate useful information from other well-established networks to the new network with efficient information propagation models, thereby overcoming the shortage of information. An effective and efficient method, Cat (Cold stArT community detector), is proposed in this paper to detect communities for new networks using information from multiple heterogeneous social networks simultaneously.
Extensive experiments conducted on real-world heterogeneous online social networks demonstrate that Cat can address the new network community detection problem effectively.
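The central idea above is that sparse information in a new network can be supplemented with information about the same users in a well-developed network. The sketch below illustrates only that general idea; it is not the Cat method, whose propagation models are not reproduced here. Everything in it is our own assumption: the inputs `new_neigh`, `old_neigh`, and the `anchor` mapping between accounts of the same user, the use of Jaccard similarity over neighbor sets, the mixing weight `alpha`, and the toy data.

```python
# Hypothetical illustration only: this is NOT the Cat algorithm from the paper.
# It shows the general idea of supplementing sparse similarity in a new network
# with similarity computed in a well-developed network, via anchor links that
# connect accounts owned by the same user.

def jaccard(neigh, u, v):
    """Jaccard similarity between the neighbor sets of users u and v."""
    a, b = neigh.get(u, set()), neigh.get(v, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def blended_similarity(new_neigh, old_neigh, anchor, u, v, alpha=0.5):
    """Weighted mix of new-network and old-network similarity for (u, v).

    new_neigh, old_neigh: dict mapping each user to a set of neighbors.
    anchor: dict mapping a new-network user to the same person's account in
        the old network (assumed to be known here).
    alpha: weight on the new network; a single weight is only a crude
        stand-in for handling differences between the two networks.
    """
    s_new = jaccard(new_neigh, u, v)
    if u not in anchor or v not in anchor:
        return s_new  # no counterpart accounts: nothing to transfer
    s_old = jaccard(old_neigh, anchor[u], anchor[v])
    return alpha * s_new + (1 - alpha) * s_old


# Toy data (made up): in the new network, a and b share no neighbors yet.
new_neigh = {"a": {"c"}, "b": set(), "c": {"a"}}
# In the old network, their accounts A and B have identical neighborhoods.
old_neigh = {"A": {"C", "D"}, "B": {"C", "D"}, "C": {"A", "B"}, "D": {"A", "B"}}
anchor = {"a": "A", "b": "B", "c": "C"}

print(jaccard(new_neigh, "a", "b"))                                # 0.0
print(blended_similarity(new_neigh, old_neigh, anchor, "a", "b"))  # 0.5
```

In the toy data, users a and b look unrelated in the new network (similarity 0), but their anchored accounts A and B have identical neighborhoods in the old network, so the blended score rises to 0.5.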

Notes

  1. http://expandedramblings.com

  2. http://www.spotify.com

References

  1. Banfield, J., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics (1993)

  2. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

  3. Chakrabarti, D.: AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. PKDD (2004)

  4. Cimiano, P., Hotho, A., Staab, S.: Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text. ECAI (2004)

  5. Gács, P., Lovász, L.: Complexity of algorithms. Lecture Notes (1999)

  6. Hastie, T., Tibshirani, R., Friedman, J.: Hierarchical clustering. In: The Elements of Statistical Learning (2009)

  7. Hopner, F., Hoppner, F., Klawonn, F.: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition (1999)

  8. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery (1998)

  9. Jin, S., Zhang, J., Yu, P., Yang, S., Li, A.: Synergistic partitioning in multiple large scale social networks. IEEE BigData (2014)

  10. Khorasgani, R.R., Chen, J., Zaïane, O.R.: Top Leaders Community Detection Approach in Information Networks. 4th SNA-KDD Workshop on Social Network Mining and Analysis, Washington DC (2010)

  11. King, B.: Step-wise clustering procedures. Journal of the American Statistical Association (1967)

  12. Kong, X., Zhang, J., Yu, P.: Inferring Anchor Links across Multiple Heterogeneous Social Networks. CIKM (2013)

  13. Kumpula, J.M., Kivelä, M., Kaski, K., Saramäki, J.: Sequential algorithm for fast clique percolation. Phys. Rev. E 78(2), 026109 (2008)

  14. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? WWW (2010)

  15. Leskovec, J., Lang, K., Dasgupta, A., Mahoney, M.: Statistical Properties of Community Structure in Large Social and Information Networks. WWW (2008)

  16. Lin, C., Cho, Y., Hwang, W., Pei, P., Zhang, A.: Clustering Methods in a Protein-Protein Interaction Network (2007)

  17. Lin, W., Kong, X., Yu, P., Wu, Q., Jia, Y., Li, C.: Community detection in incomplete information networks. WWW (2012)

  18. Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of Internal Clustering Validation Measures. ICDM (2010)

  19. Luxburg, U.V.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  20. Meila, M., Shi, J.: A Random Walks View of Spectral Segmentation. AISTATS (2001)

  21. Mishra, N., Schreiber, R., Stanton, I., Tarjan, R.: Clustering Social Networks. WAW (2007)

  22. Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Proces. Syst. 2, 849–856 (2002)

  23. Noulas, A., Scellato, S., Mascolo, C., Pontil, M.: An Empirical Study of Geographic User Activity Patterns in Foursquare. ICWSM (2011)

  24. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)

  25. Pan, S., Yang, Q.: A survey on transfer learning. TKDE (2010)

  26. Panigrahy, R., Najork, M., Xie, Y.: How User Behavior is Related to Social Affinity. WSDM (2012)

  27. Petersen, P.: Linear algebra (2012)

  28. Prat-Pérez, A., Dominguez-Sal, D., Brunat, J.M., Larriba-Pey, J.-L.: Shaping Communities out of Triangles. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1677–1681 (2012)

  29. Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.-L.: High Quality, Scalable and Parallel Community Detection for Large Real Graphs. Proceedings of the 23rd International Conference on World Wide Web, pp. 225–236 (2014)

  30. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)

  31. Richardson, M., Domingos, P.: Mining Knowledge-Sharing Sites for Viral Marketing. KDD (2002)

  32. Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI (2000)

  33. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

  34. Sneath, P., Sokal, R.: Numerical taxonomy: the principles and practice of numerical classification (1973)

  35. Sun, Y., Aggarwal, C., Han, J.: Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. VLDB (2012)

  36. Tang, J., Gao, H., Hu, X., Liu, H.: Exploiting Homophily Effect for Trust Prediction. WSDM (2013)

  37. Tobias, J., Planqué, R., Cram, D., Seddon, N.: Species interactions and the structure of complex communication networks. PNAS (2014)

  38. Trusov, M., Bodapati, A., Bucklin, R.: Determining influential users in internet social networks. Journal of Marketing Research (2010)

  39. van Wijk, B., Stam, C., Daffertshofer, A.: Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE (2010)

  40. Wang, L., Lou, T., Tang, J., Hopcroft, J.: Detecting Community Kernels in Large Social Networks. ICDM (2011)

  41. Wang, M., Wang, C., Jeffrey, X.Y., Zhang, J.: Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proc. VLDB Endowment 8(10), 998–1009 (2015)

  42. Wang, S., Zhang, Z., Li, J.: A scalable CUR matrix decomposition algorithm: Lower time complexity and tighter bound. CoRR (2012)

  43. Ward, J.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association (1963)

  44. Watts, D., Strogatz, S.: Collective dynamics of ’small-world’ networks. Nature (1998)

  45. Zhan, Q., Zhang, S., Wang, J., Yu, P., Xie, J.: Influence Maximization across Partially Aligned Heterogeneous Social Networks. PAKDD (2015)

  46. Zhang, J., Kong, X., Yu, P.: Predicting social links for new users across aligned heterogeneous social networks. ICDM (2013)

  47. Zhang, J., Kong, X., Yu, P.: Transferring Heterogeneous Links across Location-Based Social Networks. WSDM (2014)

  48. Zhang, J., Shao, W., Wang, S., Kong, X., Yu, P.: PNA: Partial network alignment with generic stable matching. IEEE IRI (2015)

  49. Zhang, J., Yu, P.: Community Detection for Emerging Networks. SDM (2015)

  50. Zhang, J., Yu, P.: Integrated anchor and social link predictions across partially aligned social networks. IJCAI (2015)

  51. Zhang, J., Yu, P.: MCD: Mutual Clustering across Multiple Heterogeneous Networks. IEEE BigData Congress (2015)

  52. Zhang, J., Yu, P., Zhou, Z.: Meta-Path Based Multi-Network Collective Link Prediction. KDD (2014)

  53. Zhou, Y., Cheng, H., Yu, J.: Graph clustering based on structural/attribute similarities. VLDB (2009)

  54. Zhou, Y., Liu, L.: Social Influence Based Clustering of Heterogeneous Information Networks. KDD (2013)

Acknowledgments

This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, and by NSFC 61672313. It is also supported by the National Key R&D Program of China (Grant No. 2016YFB1001102), NSFC (Grant Nos. 61375069, 61403156, 61502227), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University.

Author information

Corresponding author

Correspondence to Qianyi Zhan.

Appendix

1.1 A Proof of Lemma 2

Proof

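The lemma can be proved by induction on $k$ [26] as follows.

Base case: When $k = 1$, let $\mathbf{p}_i$ and $\lambda_i$ be the $i$-th eigenvector and eigenvalue of matrix $\mathbf{Q}$, respectively, so that $\mathbf{Q}\mathbf{p}_i = \lambda_i\mathbf{p}_i$. Collecting the eigenvectors as the columns of matrix $\mathbf{P}$ and the eigenvalues on the diagonal of matrix $\mathbf{\Lambda}$, we have

$$\mathbf{Q}^1\mathbf{P} = \mathbf{P}\mathbf{\Lambda}^1. $$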

Inductive hypothesis: Assume the lemma holds for $k = m$, where $m \geq 1$. In other words, the following equation holds:

$$\mathbf{Q}^m\mathbf{P} = \mathbf{P}\mathbf{\Lambda}^m. $$

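Inductive step: For $k = m + 1$,

$$\begin{aligned}\mathbf{Q}^{m + 1}\mathbf{P} &= \mathbf{Q}\mathbf{Q}^m\mathbf{P} = \mathbf{Q}\mathbf{P}\mathbf{\Lambda}^m \\ &= \mathbf{P}\mathbf{\Lambda}\mathbf{\Lambda}^m = \mathbf{P}\mathbf{\Lambda}^{m + 1}. \end{aligned}$$

The second equality applies the inductive hypothesis, and the third applies the base case $\mathbf{Q}\mathbf{P} = \mathbf{P}\mathbf{\Lambda}$.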

Therefore, by induction, the lemma holds for all $k \geq 1$. □
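The identity $\mathbf{Q}^k\mathbf{P} = \mathbf{P}\mathbf{\Lambda}^k$ can also be checked numerically on a small case. The pure-Python sketch below uses a 2×2 symmetric matrix $\mathbf{Q}$ chosen by us for illustration (it does not come from the paper), and the helper names `matmul`, `matpow`, and `diag_power` are likewise our own. Integer entries keep every comparison exact.

```python
# Numerical check of Lemma 2: Q^k P = P Lambda^k, where the columns of P are
# eigenvectors of Q and Lambda is the diagonal matrix of matching eigenvalues.
# Illustrative example only; integer arithmetic makes comparisons exact.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matpow(A, k):
    """A^k for k >= 1, built as A * A^(k-1), mirroring the inductive step."""
    result = A
    for _ in range(k - 1):
        result = matmul(A, result)
    return result


def diag_power(values, k):
    """Lambda^k for a diagonal Lambda: raise each diagonal entry to power k."""
    n = len(values)
    return [[values[i] ** k if i == j else 0 for j in range(n)] for i in range(n)]


# Q has eigenvalue 3 with eigenvector (1, 1) and eigenvalue 1 with (1, -1).
Q = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]   # eigenvectors stored as columns
eigenvalues = [3, 1]

for k in range(1, 6):
    assert matmul(matpow(Q, k), P) == matmul(P, diag_power(eigenvalues, k))
print("Q^k P == P Lambda^k for k = 1..5")
```

Because $\mathbf{\Lambda}$ is diagonal, $\mathbf{\Lambda}^k$ only requires raising each eigenvalue to the $k$-th power; when $\mathbf{P}$ is invertible, the lemma also gives $\mathbf{Q}^k = \mathbf{P}\mathbf{\Lambda}^k\mathbf{P}^{-1}$.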

About this article

Cite this article

Zhan, Q., Zhang, J., Yu, P. et al. Community detection for emerging social networks. World Wide Web 20, 1409–1441 (2017). https://doi.org/10.1007/s11280-017-0441-5

  • DOI: https://doi.org/10.1007/s11280-017-0441-5
