
Efficient community discovery with user engagement and similarity

The VLDB Journal

Abstract

In this paper, we investigate the (k,r)-core problem, which aims to find cohesive subgraphs in social networks from the perspectives of both user engagement and user similarity. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph): each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we consider the pairwise similarity among users based on their attributes. We propose efficient algorithms to enumerate all maximal (k,r)-cores and to find the maximum (k,r)-core; both problems are shown to be NP-hard. Effective pruning techniques substantially reduce the search space of both algorithms. A novel (k,k')-core-based upper bound on the (k,r)-core size improves the performance of the maximum (k,r)-core computation. We also devise effective search orders for the two algorithms, which use different search priorities for vertices. In addition, we study the diversified (k,r)-core search problem, which finds l maximal (k,r)-cores that together cover the most vertices. These maximal (k,r)-cores are distinctive and informationally rich. We propose an efficient algorithm with a guaranteed approximation ratio, design a tight upper bound to prune unpromising partial (k,r)-cores, and introduce a new search order to speed up the search. Large initial candidates are generated to further enhance the pruning power. Comprehensive experiments on real-life data demonstrate that the maximal (k,r)-cores enable us to find interesting cohesive subgraphs, and that the proposed techniques effectively improve the performance of all three mining algorithms.
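To make the two constraints in the abstract concrete, the following is a minimal Python sketch, not the algorithms proposed in the paper. It assumes an undirected graph stored as an adjacency dictionary and a caller-supplied `similar(u, v)` predicate; the names `k_core`, `satisfies_kr_constraints`, `adj` and `similar` are hypothetical. The first function computes a k-core by the standard peeling procedure, and the second only checks the degree and pairwise-similarity conditions on a given vertex set. It does not check connectivity or maximality, and it does not enumerate (k,r)-cores.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Return the vertices of the k-core by peeling vertices of degree < k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[Hashable] = set()
    queue = deque(v for v, d in degree.items() if d < k)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue  # a vertex may be queued more than once
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return set(adj) - removed


def satisfies_kr_constraints(
    adj: Graph,
    members: Iterable[Hashable],
    k: int,
    similar: Callable[[Hashable, Hashable], bool],
) -> bool:
    """Check the engagement (degree >= k) and pairwise-similarity constraints."""
    group = set(members)
    # Engagement: every member has at least k neighbours inside the group.
    if any(len(adj[v] & group) < k for v in group):
        return False
    # Similarity: every pair of members must be similar.
    return all(similar(u, v) for u, v in combinations(group, 2))


# Toy example (assumed data): a triangle a-b-c with a pendant vertex d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
attr = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 9.0}
core = k_core(adj, 2)
ok_loose = satisfies_kr_constraints(adj, core, 2, lambda u, v: abs(attr[u] - attr[v]) <= 2)
ok_tight = satisfies_kr_constraints(adj, core, 2, lambda u, v: abs(attr[u] - attr[v]) <= 1)
```

In the toy graph, peeling with k = 2 removes the pendant vertex, and the remaining triangle passes the check only when the similarity threshold is loose enough to cover the pair a and c.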



Notes

  1. Following convention, when a distance metric (e.g., Euclidean distance) is employed, we say two vertices are similar if their distance is not larger than the given distance threshold.

  2. To reduce noise, in the case study we require at least three co-authored papers between two connected authors.

  3. To keep the figure clear, in this case study an edge indicates that the two corresponding authors have at least three co-authored papers.
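The conventions in these notes can be illustrated with a short Python sketch, under stated assumptions: vertex attributes are coordinate tuples compared by Euclidean distance, and the co-authorship data is a list of author lists, one per paper. The function names `is_similar` and `coauthor_edges` and the sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple


def is_similar(p: Sequence[float], q: Sequence[float], r: float) -> bool:
    """Two vertices are similar if their distance is not larger than r."""
    return math.dist(p, q) <= r


def coauthor_edges(
    papers: Iterable[Sequence[str]], min_papers: int = 3
) -> Set[Tuple[str, str]]:
    """Keep an edge only if the two authors share at least min_papers papers."""
    counts: Counter = Counter()
    for authors in papers:
        # Sort so each unordered pair is always counted under the same key.
        for u, v in combinations(sorted(set(authors)), 2):
            counts[(u, v)] += 1
    return {pair for pair, c in counts.items() if c >= min_papers}


# Assumed toy data: A and B share three papers, A and C two, B and C one.
papers = [["A", "B"], ["A", "B", "C"], ["B", "A"], ["A", "C"]]
edges = coauthor_edges(papers)
close = is_similar((0.0, 0.0), (3.0, 4.0), 5.0)
far = is_similar((0.0, 0.0), (3.0, 4.0), 4.9)
```

With a similarity metric (larger means more alike) rather than a distance metric, the comparison in `is_similar` would presumably be reversed.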


Acknowledgements

Xuemin Lin is supported by 2018YFB1003504, NSFC61232006, ARC DP180103096 and DP170101628. Ying Zhang is supported by ARC DP180103096 and FT170100128. Lu Qin is supported by ARC DP160101513. Wenjie Zhang is supported by ARC DP180103096.

Author information

Corresponding author

Correspondence to Fan Zhang.


About this article


Cite this article

Zhang, F., Lin, X., Zhang, Y. et al. Efficient community discovery with user engagement and similarity. The VLDB Journal 28, 987–1012 (2019). https://doi.org/10.1007/s00778-019-00579-4

  • DOI: https://doi.org/10.1007/s00778-019-00579-4
