Extracting representative user subset of social networks towards user characteristics and topological features

World Wide Web

Abstract

Extracting a subset of representative users from the full user set of a social network plays a critical role in social network analysis. Among existing studies, some focus on preserving users' characteristics when sampling representative users, while others focus on preserving the network topology. However, both users' characteristics and the network topology carry abundant information about users, so it is critical to preserve both when extracting the representative user subset. To this end, we formulate the task as the RUS (Representative User Subset) problem and prove that it is NP-hard. To solve the RUS problem, we propose two approaches, KS (K-Selected) and an optimized method, ACS. Both consist of a clustering algorithm and a sampling model, and the sampling model is solved by a greedy heuristic algorithm. In addition, we propose a pruning strategy that takes advantage of a max-heap structure. To validate the performance of the proposed approaches, we conduct extensive experiments on two real-world datasets. The results demonstrate that our methods outperform state-of-the-art approaches.
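The abstract does not spell out the sampling objective or the pruning rule, so the following is only a minimal sketch of the general idea of greedy selection pruned with a max-heap. It is not the paper's KS or ACS method. It assumes a hypothetical pairwise similarity matrix `sim` and a coverage-style objective (each user is scored by the most similar selected representative), and it uses the standard lazy-greedy trick: stale gains kept in the heap are upper bounds, so most candidates never need to be re-evaluated.

```python
import heapq


def greedy_representatives(sim, k):
    """Greedily pick k users maximizing sum_i max_{j in S} sim[i][j].

    `sim` is a hypothetical n x n similarity matrix (an assumption of this
    sketch, not taken from the paper). Returns the chosen indices in order.
    """
    n = len(sim)
    best = [0.0] * n  # best[i]: how well user i is covered so far

    def gain(j):
        # Marginal improvement in total coverage if user j were added.
        return sum(max(0.0, sim[i][j] - best[i]) for i in range(n))

    # heapq is a min-heap, so gains are negated to emulate a max-heap.
    # Each entry records the round in which its gain was computed.
    heap = [(-gain(j), j, 0) for j in range(n)]
    heapq.heapify(heap)

    selected = []
    while heap and len(selected) < k:
        _, j, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Gain is up to date and still the largest: select j.
            selected.append(j)
            for i in range(n):
                best[i] = max(best[i], sim[i][j])
        else:
            # Gain is stale: recompute it and push the candidate back.
            heapq.heappush(heap, (-gain(j), j, len(selected)))
    return selected


# Toy data: users 0-2 form one group, users 3-4 another.
sim = [
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0],
    [0.0, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
print(greedy_representatives(sim, 2))  # [1, 3]
```

On the toy matrix the sketch picks one user from each group. The pruning is safe here only because the assumed objective has diminishing marginal gains; whether the paper's objective shares that property is not stated in the abstract.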

Notes

  1. d1: the number of the user's followers, d2: the number of the user's friends, d3: the user's influence score, d4: the user's activity score, d5: the number of tweets, d6: the number of times the user's tweets were "liked", d7: the number of times the user's tweets were "retweeted", d8: address, d9: words and phrases

  2. http://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes

  3. http://scikit-learn.org/stable/modules/ensemble.html#random-forests

  4. http://networkx.github.io/

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61572335, 61572336, 61902270), the Major Program of Natural Science Foundation, Educational Commission of Jiangsu Province, China (Grant No. 19KJA610002), the Natural Science Foundation, Educational Commission of Jiangsu Province, China (Grant Nos. 19KJB520052, 19KJB520050), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China.

Author information

Corresponding authors

Correspondence to Wei Chen or Lei Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering 2018

Guest Editors: Hakim Hacid, Wojciech Cellary, Hua Wang and Yanchun Zhang

About this article

Cite this article

Zhou, Y., Han, Y., Liu, A. et al. Extracting representative user subset of social networks towards user characteristics and topological features. World Wide Web 23, 2903–2931 (2020). https://doi.org/10.1007/s11280-020-00828-5

  • DOI: https://doi.org/10.1007/s11280-020-00828-5
