Complex Network Hierarchical Sampling Method Combining Node Neighborhood Clustering Coefficient with Random Walk

New Generation Computing

Abstract

To address the over-sampling of high-degree and low-degree nodes in current sampling algorithms, a node Neighborhood Clustering coefficient Hierarchical Random Walk (NCHRW) sampling method is proposed. First, the network is divided into layers according to its degree distribution, with the k-means clustering algorithm used to determine the number of layers. Second, the degree distribution is used to determine the boundary value between adjacent layers. Third, within each layer, sampling takes into account the degree of the current node, the number of common neighbors between the current node and its neighbors, and the clustering coefficient of those neighbors. Finally, on eight real networks and one synthetic network, NCHRW is compared with existing algorithms in six respects: degree distribution, density, average degree, average clustering coefficient, transitivity, and visualization of the sampled network. The results show that NCHRW is significantly better than nine traditional sampling algorithms in terms of degree distribution, density, and average degree, and that it preserves the topological properties of the network well.
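The three steps named in the abstract (degree-based layering with k-means, per-layer sampling, and a walk that looks at common neighbors and neighbor clustering coefficients) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the transition weight in `walk_weight`, the fixed number of layers `k`, the per-layer budget, and the random jump probability are all assumptions made here for illustration, and every function name is hypothetical.

```python
import random
from statistics import mean

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd k-means on scalars; returns the sorted centroids."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in values:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # An empty group keeps its previous centroid.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

def degree_layers(adj, k):
    """Assign each node to the layer whose degree centroid is nearest."""
    deg = {v: len(adj[v]) for v in adj}
    centers = kmeans_1d(list(deg.values()), k)
    return {v: min(range(k), key=lambda i: abs(deg[v] - centers[i]))
            for v in adj}

def walk_weight(adj, u, v):
    """ASSUMED weight for stepping u -> v (not the paper's formula):
    higher for neighbors with a larger clustering coefficient, lower
    when u and v share many neighbors. The degree of u enters only
    through the normalisation over u's neighbors."""
    common = len(adj[u] & adj[v])
    return (1.0 + local_clustering(adj, v)) / (1.0 + common)

def sample_layer(adj, members, budget, rng, jump=0.15):
    """Weighted walk restricted to one layer, with occasional jumps."""
    members = sorted(members)          # assumes sortable node ids
    in_layer = set(members)
    budget = min(budget, len(members))
    current = rng.choice(members)
    sampled = {current}
    for _ in range(100 * len(members) + 100):   # guard against stalls
        if len(sampled) >= budget:
            break
        cand = sorted(adj[current] & in_layer)
        if not cand or rng.random() < jump:
            current = rng.choice(members)       # restart inside the layer
        else:
            weights = [walk_weight(adj, current, v) for v in cand]
            current = rng.choices(cand, weights=weights, k=1)[0]
        sampled.add(current)
    return sampled

def hierarchical_sample(adj, k=3, fraction=0.3, seed=0):
    """Sample roughly `fraction` of the nodes of every degree layer."""
    rng = random.Random(seed)
    layer_of = degree_layers(adj, k)
    sample = set()
    for layer in range(k):
        members = [v for v in adj if layer_of[v] == layer]
        if members:
            budget = max(1, round(fraction * len(members)))
            sample |= sample_layer(adj, members, budget, rng)
    return sample

# Toy graph: a 20-node ring plus a hub (node 20) linked to even nodes.
toy = {i: set() for i in range(21)}
for i in range(20):
    toy[i].add((i + 1) % 20)
    toy[(i + 1) % 20].add(i)
    if i % 2 == 0:
        toy[i].add(20)
        toy[20].add(i)

picked = hierarchical_sample(toy, k=3, fraction=0.3, seed=1)
print(len(picked), "of", len(toy), "nodes sampled")
```

Sampling each layer separately is what keeps the hub and the low-degree ring nodes from crowding each other out of the sample; the specific weight and budget rules above are placeholders that a reader would replace with the definitions given in the full paper.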

Acknowledgements

This work is supported in part by the Science and Technology Research Project of Chongqing Municipal Education Commission (KJZD-K202001101), the Chongqing Ba-nan District Science and Technology Bureau Science and Technology Talents Special Project (2020.58), the General Project of Chongqing Natural Science Foundation (cstc2021jcyj-msxmX0162), the 2021 National Education Examination Research Project (GJK2021028), the 2020 Chongqing Municipal Human Resources and Social Security Bureau of Innovation Project for Returned Overseas Person (cx2020031), and the 2020 National Statistical Science Research Project (2020412).

Author information

Contributions

XL: conceptualization, software, methodology, validation, data curation, writing-review & editing. MZ: methodology, formal analysis, writing-original draft, data curation, writing-review & editing. GF: methodology, writing-review & editing. PDM: methodology, writing-review & editing.

Corresponding author

Correspondence to Xiaoyang Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, X., Zhang, M., Fiumara, G. et al. Complex Network Hierarchical Sampling Method Combining Node Neighborhood Clustering Coefficient with Random Walk. New Gener. Comput. 40, 765–807 (2022). https://doi.org/10.1007/s00354-022-00179-x

  • DOI: https://doi.org/10.1007/s00354-022-00179-x
