Skip to main content
Log in

Deterministic graph exploration for efficient graph sampling

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

Graph sampling is a widely used procedure in social network analysis, has attracted great interest in the scientific community and is considered as a very powerful and useful tool in several domains of network analysis. Apart from initial research in this area, which has proposed simple processes such as the classic Random Walk algorithm, Random Node and Random Edge sampling, during the last decade, more advanced graph sampling approaches have been emerged. In this paper, we extensively study the properties of a newly proposed method, the Rank Degree method, which leads to representative graph subgraphs. The Rank Degree is a novel graph exploration method which significantly differs from other existing methods in the literature. The novelty of the Rank Degree lies on the fact that its core methodology corresponds to a deterministic graph exploration; one specific variation corresponds to a number of parallel deterministic traverses that explore the graph. We perform extensive experiments on twelve real-world datasets of a different type, using a variety of measures and comparing our method with Forest Fire, Metropolis Hastings Random Walk and Metropolis Hastings. We provide strong evidence that our approach leads to highly efficient graph sampling; the generated samples preserve several graph properties, to a large extent.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Notes

  1. https://drive.google.com/file/d/0BxezFIUuAvc3a29vY0Fqd3J5RUk.

References

  • Ahmed NK, Neville J, Kompella RR (2012) Network sampling: from static to streaming graphs. CoRR abs/1211.3412

  • Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24:1–24:74

    Article  MathSciNet  MATH  Google Scholar 

  • Çem E, Tozal ME, Saraç K (2013) Impact of sampling design in estimation of graph characteristics. In: IEEE 32nd international performance computing and communications conference, IPCCC 2013, San Diego, CA, USA, December 6–8, 2013, pp 1–10

  • Chen D, Lu L, Shang MS, Zhang YC, Zhou T (2012) Identifying influential nodes in complex networks. Phys A Stat Mech Appl 391(4):1777–1787

    Article  Google Scholar 

  • Chiericetti F, Dasgupta A, Kumar R, Lattanzi S, Sarlós T (2016) On sampling nodes in a network. In: Proceedings of the 25th international conference on world wide web, Republic and Canton of Geneva, Switzerland, WWW ’16, pp 471–481

  • Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’11, pp 1082–1090

  • Choudhury MD, Lin Y, Sundaram H, Candan KS, Xie L, Kelliher A (2010) How does the data sampling strategy impact the discovery of information diffusion in social media? In: Proceedings of the fourth international conference on weblogs and social media, ICWSM 2010, Washington, DC, USA, May 23–26, 2010

  • Csardi G, Nepusz T (2006) The igraph software package for complex network research. Inter J Complex Syst 1695. http://igraph.org

  • Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239

    Article  Google Scholar 

  • Gabielkov M, Rao A, Legout A (2014) Sampling online social networks: an experimental study of twitter. In: ACM SIGCOMM 2014 conference, SIGCOMM’14, Chicago, IL, USA, August 17–22, 2014, pp 127–128

  • Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of OSNs. In: IEEE Proceedings on INFOCOM, 2010, pp 1–9

  • Gjoka M, Kurant M, Butts CT, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892

    Article  Google Scholar 

  • Gu Y, McCallum A, Towsley D (2005) Detecting anomalies in network traffic using maximum entropy estimation. In: Proceedings of the 5th ACM SIGCOMM conference on internet measurement, USENIX Association, Berkeley, CA, USA, IMC ’05, pp 32–32

  • Haveliwala TH (2003) Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans Knowl Data Eng 15(4):784–796

    Article  Google Scholar 

  • Hu P, Lau WC (2013) A survey and taxonomy of graph sampling. CoRR abs/1308.5865

  • Hubler C, peter Kriegel H, Borgwardt K, Ghahramani Z (2008) Metropolis algorithms for representative subgraph sampling. In: Eighth IEEE international conference on data mining, 2008. ICDM08, pp 283–292

  • Kitsak M, Gallos LK, Havlin S, Liljerosand F, Muchnik L, Stanley HE, Makse HA (2010) Identification of influential spreaders in complex networks. Nat Phys 6:888–893. doi:10.1038/nphys1746

    Article  Google Scholar 

  • Krivitsky PN, Kolaczyk ED (2015) On the question of effective sample size in network modeling: an asymptotic inquiry. Stat Sci 30(2):184–198

    Article  MathSciNet  MATH  Google Scholar 

  • Kurant M, Markopoulou A, Thiran P (2011) Towards unbiased BFS sampling. IEEE J Sel Areas Commun 29(9):1799–1809

    Article  Google Scholar 

  • Lee C, Xu X, Eun DY (2012) Beyond random walk and Metropolis–Hastings samplers: why you should not backtrack for unbiased graph sampling. In: ACM SIGMETRICS/PERFORMANCE joint international conference on measurement and modeling of computer systems, SIGMETRICS ’12, London, UK, June 11–15, 2012, pp 319–330

  • Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, Philadelphia, PA, USA, August 20–23, 2006, pp 631–636

  • Leskovec J, Krevl A (2014) SNAP datasets: stanford large network dataset collection. http://snap.stanford.edu/data

  • Leskovec J, Kleinberg JM, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, IL, USA, August 21–24, 2005, pp 177–187

  • Leskovec J, Kleinberg JM, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. TKDD 1(1):1–40

    Article  Google Scholar 

  • Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123

    Article  MathSciNet  MATH  Google Scholar 

  • Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, NY, USA, CHI ’10, pp 1361–1370

  • Levin DA, Peres Y, Wilmer EL (2008) Markov chains and mixing times. American Mathematical Society (AMS), Providence.

  • Li R, Yu JX, Qin L, Mao R, Jin T (2015) On random walk based graph sampling. In: 31st IEEE international conference on data engineering, ICDE 2015, Seoul, South Korea, April 13–17, 2015, pp 927–938

  • Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21–24, 2011, pp 105–113

  • McAuley JJ, Leskovec J (2012) Learning to discover social circles in ego networks. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in Neural Information Processing Systems 25 (NIPS 2012). Curran Associates, Inc., pp 539–547

  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. doi:10.1063/1.1699114

    Article  Google Scholar 

  • Potamias M, Bonchi F, Castillo C, Gionis A (2009) Fast shortest path distance estimation in large networks. In: Proceedings of the 18th ACM conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’09, pp 867–876

  • Ribeiro BF, Towsley DF (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM internet measurement conference, IMC 2010, Melbourne, Australia, November 1–3, 2010, pp 390–403

  • Richardson M, Agrawal R, Domingos P (2003) Trust management for the semantic web. Springer, Berlin

    Book  Google Scholar 

  • Ripeanu M, Foster IT, Iamnitchi A (2002) Mapping the gnutella network: properties of large-scale peer-to-peer systems and implications for system design. CoRR cs.DC/0209028

  • Robles-Granda P, Moreno S, Neville J (2016) Sampling of attributed networks from hierarchical generative models. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13–17, 2016, pp 1155–1164

  • Salamanos N, Voudigari E, Yannakoudakis EJ (2016) Identifying influential spreaders by graph sampling. In: Proceedings of the 5th international workshop on complex networks and their applications, Milan, Italy, November 30–December 02, 2016

  • Stutzbach D, Rejaie R, Duffield N, Sen S, Willinger W (2009) On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans Netw 17(2):377–390

    Article  Google Scholar 

  • Vattani A, Chakrabarti D, Gurevich M (2011) Preserving personalized pagerank in subgraphs. In: Proceedings of the 28th international conference on machine learning, ICML 2011, Bellevue, Washington, USA, June 28–July 2, 2011, pp 793–800

  • Voudigari E, Salamanos N, Papageorgiou T, Yannakoudakis EJ (2016) Rank degree: an efficient algorithm for graph sampling. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2016, San Francisco, CA, USA, August 18–21, 2016, pp 120–129

  • Xu X, Lee C, Eun DY (2014) A general framework of hybrid graph sampling for complex network analysis. In: 2014 IEEE conference on computer communications, INFOCOM 2014, Toronto, Canada, April 27–May 2, 2014, pp 2795–2803

  • Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, ACM, New York, NY, USA, MDS ’12, pp 3:1–3:8

  • Zafarani R, Liu H (2009) Social computing data repository at ASU. http://socialcomputing.asu.edu

  • Zeng A, Zhang CJ (2013) Ranking spreaders by decomposing complex networks. Phys Lett A 377(14):1031–1035

    Article  Google Scholar 

Download references

Acknowledgements

We thank Kyriaki Chryssaki for her helpful comments on the final manuscript. We also wish to acknowledge the Athens University of Economics and Business for the technical equipment provision.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nikos Salamanos.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Salamanos, N., Voudigari, E. & Yannakoudakis, E.J. Deterministic graph exploration for efficient graph sampling. Soc. Netw. Anal. Min. 7, 24 (2017). https://doi.org/10.1007/s13278-017-0441-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13278-017-0441-6

Keywords

Navigation