Abstract
Many well-known online social networks, e.g., Facebook and Twitter, have achieved great success in the last several years. Users in these online social networks can establish various connections via both social links and shared attribute information. Discovering groups of users who are strongly connected internally is defined as the community detection problem. The community detection problem is important for online social networks and has extensive applications in various social services. Meanwhile, besides these popular social networks, a large number of new social networks offering specific services have also sprung up in recent years. Community detection can be even more important for new networks, as high-quality community detection results enable new networks to provide better services, which can help attract more users. In this paper, we study the community detection problem for new networks, which is formally defined as the “New Network Community Detection” problem. The new network community detection problem is challenging because information in new networks can be too sparse to calculate effective similarity scores among users, and such scores are crucial in community detection. However, we notice that users nowadays usually join multiple social networks simultaneously, and those who are involved in a new network may have been using other well-developed social networks for a long time. Taking full account of the differences between networks, we propose to propagate useful information from other well-established networks to the new network with efficient information propagation models, so as to overcome the information shortage problem. An effective and efficient method, Cat (Cold stArT community detector), is proposed in this paper to detect communities for new networks using information from multiple heterogeneous social networks simultaneously. Extensive experiments conducted on real-world heterogeneous online social networks demonstrate that Cat can address the new network community detection problem effectively.
References
Banfield, J., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics (1993)
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Chakrabarti, D.: Autopart: Parameter-Free Graph Partitioning and Outlier Detection. PKDD (2004)
Cimiano, P., Hotho, A., Staab, S.: Comparing Conceptual, Divisive and Agglomerative Clustering for Learning Taxonomies from Text. ECAI (2004)
Gács, P., Lovász, L.: Complexity of algorithms. Lecture Notes (1999)
Hastie, T., Tibshirani, R., Friedman, J.: Hierarchical Clustering. The Elements of Statistical Learning (2009)
Hopner, F., Hoppner, F., Klawonn, F.: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition (1999)
Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery (1998)
Jin, S., Zhang, J., Yu, P., Yang, S., Li, A.: Synergistic partitioning in multiple large scale social networks. IEEE BigData (2014)
Khorasgani, R.R., Chen, J., Zaïane, O.R.: Top Leaders Community Detection Approach in Information Networks. 4th SNA-KDD Workshop on Social Network Mining and Analysis, Washington DC. Citeseer (2010)
King, B.: Step-Wise Clustering Procedures. Journal of the American Statistical Association (1967)
Kong, X., Zhang, J., Yu, P.: Inferring Anchor Links across Multiple Heterogeneous Social Networks. CIKM (2013)
Kumpula, J.M., Kivelä, M., Kaski, K., Saramäki, J.: Sequential algorithm for fast clique percolation. Phys. Rev. E 78(2), 026109 (2008)
Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? WWW (2010)
Leskovec, J., Lang, K., Dasgupta, A., Mahoney, M.: Statistical Properties of Community Structure in Large Social and Information Networks. WWW (2008)
Lin, C., Cho, Y., Hwang, W., Pei, P., Zhang, A.: Clustering Methods in a Protein-Protein Interaction Network (2007)
Lin, W., Kong, X., Yu, P., Wu, Q., Jia, Y., Li, C.: Community detection in incomplete information networks. WWW (2012)
Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of Internal Clustering Validation Measures. ICDM (2010)
Luxburg, U.V.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Meila, M., Shi, J.: A Random Walks View of Spectral Segmentation. AISTATS (2001)
Mishra, N., Schreiber, R., Stanton, I., Tarjan, R.: Clustering Social Networks. WAW (2007)
Ng, A.Y., Jordan, M.I., Weiss, Y., et al.: On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Proces. Syst. 2, 849–856 (2002)
Noulas, A., Scellato, S., Mascolo, C., Pontil, M.: An Empirical Study of Geographic User Activity Patterns in Foursquare. ICWSM (2011)
Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)
Pan, S., Yang, Q.: A survey on transfer learning. TKDE (2010)
Panigrahy, R., Najork, M., Xie, Y.: How User Behavior is Related to Social Affinity. WSDM (2012)
Petersen, P.: Linear algebra (2012)
Prat-Pérez, A., Dominguez-Sal, D., Brunat, J.M., Larriba-Pey, J.-L.: Shaping Communities out of Triangles. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1677–1681 (2012)
Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.-L.: High Quality, Scalable and Parallel Community Detection for Large Real Graphs. Proceedings of the 23rd International Conference on World Wide Web, pp. 225–236 (2014)
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
Richardson, M., Domingos, P.: Mining Knowledge-Sharing Sites for Viral Marketing. KDD (2002)
Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI (2000)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Sneath, P., Sokal, R.: Numerical taxonomy: the principles and practice of numerical classification (1973)
Sun, Y., Aggarwal, C., Han, J.: Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. VLDB (2012)
Tang, J., Gao, H., Hu, X., Liu, H.: Exploiting Homophily Effect for Trust Prediction. WSDM (2013)
Tobias, J., Planqué, R., Cram, D., Seddon, N.: Species interactions and the structure of complex communication networks. PNAS (2014)
Trusov, M., Bodapati, A., Bucklin, R.: Determining influential users in internet social networks. Journal of Marketing Research (2010)
van Wijk, B., Stam, C., Daffertshofer, A.: Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE (2010)
Wang, L., Lou, T., Tang, J., Hopcroft, J.: Detecting Community Kernels in Large Social Networks. ICDM (2011)
Wang, M., Wang, C., Jeffrey, X.Y., Zhang, J.: Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proc. VLDB Endowment 8(10), 998–1009 (2015)
Wang, S., Zhang, Z., Li, J.: A scalable CUR matrix decomposition algorithm: Lower time complexity and tighter bound. CoRR (2012)
Ward, J.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association (1963)
Watts, D., Strogatz, S.: Collective dynamics of ‘small-world’ networks. Nature (1998)
Zhan, Q., Zhang, S., Wang, J., Yu, P., Xie, J.: Influence Maximization across Partially Aligned Heterogenous Social Networks. PAKDD (2015)
Zhang, J., Kong, X., Yu, P.: Predicting social links for new users across aligned heterogeneous social networks. ICDM (2013)
Zhang, J., Kong, X., Yu, P.: Transferring Heterogeneous Links across Location-Based Social Networks. WSDM (2014)
Zhang, J., Shao, W., Wang, S., Kong, X., Yu, P.: PNA: Partial network alignment with generic stable matching. IEEE IRI (2015)
Zhang, J., Yu, P.: Community Detection for Emerging Networks. SDM (2015)
Zhang, J., Yu, P.: Integrated anchor and social link predictions across partially aligned social networks. IJCAI (2015)
Zhang, J., Yu, P.: MCD: Mutual Clustering across Multiple Heterogeneous Networks. IEEE BigData Congress (2015)
Zhang, J., Yu, P., Zhou, Z.: Meta-Path Based Multi-Network Collective Link Prediction. KDD (2014)
Zhou, Y., Cheng, H., Yu, J.: Graph clustering based on structural/attribute similarities. VLDB (2009)
Zhou, Y., Liu, L.: Social Influence Based Clustering of Heterogeneous Information Networks. KDD (2013)
Acknowledgments
This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, and by NSFC 61672313. It is also supported by the National Key R&D Program of China (Grant No. 2016YFB1001102), NSFC (Grant Nos. 61375069, 61403156, 61502227), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University.
Appendix
1.1 A Proof of Lemma 2
Proof
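The lemma can be proved by induction on \(k\) [26] as follows.
Base Case: When \(k = 1\), let \(p_i\) and \(\lambda_i\) be the \(i\)th eigenvector and eigenvalue of matrix \(Q\), respectively, where \(Q p_i = \lambda_i p_i\). Organizing all the eigenvectors and eigenvalues of \(Q\) into the matrices \(P\) and \(\Lambda\), we have \(Q^1 P = P \Lambda^1\).
Inductive Assumption: Assume the lemma holds when \(k = m\), \(m \ge 1\). In other words, the following equation holds:
\[Q^m P = P \Lambda^m.\]
Induction: When \(k = m+1\), \(m \ge 1\),
\[Q^{m+1} P = Q \, Q^m P = Q P \Lambda^m = P \Lambda \Lambda^m = P \Lambda^{m+1},\]
where the second equality uses the inductive assumption and the third uses the base case \(Q P = P \Lambda\).
In sum, the lemma holds for \(k \ge 1\). □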
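As a quick sanity check of the identity the induction establishes, the sketch below verifies \(Q^k P = P \Lambda^k\) for a few values of \(k\). The statement of Lemma 2 is not reproduced in this appendix, so the form of the identity is inferred from the base case; the 2×2 matrix and the helper names `matmul`, `matpow`, and `identity_holds` are illustrative assumptions, not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matpow(a, k):
    """Return a**k for a square matrix a and integer k >= 1."""
    result = a
    for _ in range(k - 1):
        result = matmul(result, a)
    return result


# Hypothetical example matrix (not from the paper).
Q = [[2, 1],
     [1, 2]]
# Columns of P are eigenvectors of Q:
# (1, 1) for eigenvalue 3 and (1, -1) for eigenvalue 1.
P = [[1, 1],
     [1, -1]]
# Diagonal matrix of the matching eigenvalues.
LAMBDA = [[3, 0],
          [0, 1]]


def identity_holds(k):
    """Check Q^k P == P Lambda^k exactly, using integer arithmetic."""
    return matmul(matpow(Q, k), P) == matmul(P, matpow(LAMBDA, k))


# Check the identity for k = 1, ..., 5.
results = {k: identity_holds(k) for k in range(1, 6)}
```

Integer entries keep the comparison exact, so no floating-point tolerance is needed. A numerical check on one matrix only illustrates the identity and does not replace the inductive argument.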
Cite this article
Zhan, Q., Zhang, J., Yu, P. et al. Community detection for emerging social networks. World Wide Web 20, 1409–1441 (2017). https://doi.org/10.1007/s11280-017-0441-5
DOI: https://doi.org/10.1007/s11280-017-0441-5