Estimating network parameters using random walks

Cooper, Colin; Radzik, Tomasz; Siantos, Yiannis

doi:10.1007/s13278-014-0168-6

Original Article
Published: 22 February 2014

Volume 4, article number 168, (2014)

Social Network Analysis and Mining

Abstract

Sampling from large graphs is an area of great interest, especially since the emergence of huge structures such as Online Social Networks and the World Wide Web (WWW). The large-scale properties of a network can be summarized in terms of parameters of the underlying graph, such as the total number of vertices, edges and triangles. However, the large size of these networks makes it computationally expensive to obtain such structural properties by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer such a problem. In this paper we present a general framework to estimate network properties using random walks. These methods work under the assumption that, at each step of the random walk, we can obtain local characteristics of the current vertex, for example the number of its neighbours and their labels. As examples of this approach, we present practical methods to estimate the total number of edges/links m, the number of vertices/nodes n and the number of connected triads of vertices (triangles) t in graphs. We also give a general method to count any type of small connected subgraph, of which vertices, edges and triangles are specific examples. In addition, we present experimental estimates of n, m and t obtained using our methods on real and synthetic networks. The synthetic networks were random graphs with power-law degree distributions, designed to have a large number of triangles. We used these graphs as they tend to correspond to the structure of large online networks. The real networks were samples of the WWW and social networks obtained from the SNAP database. To test that the methods are indeed practical, the total number of steps made by the walk was limited to at most the size n of the network. In fact, the estimates appear to converge to the correct value in fewer steps, indicating that our proposed methods are feasible in practice.
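
The abstract does not spell out the estimators, so the following is only a minimal sketch of how a random walk that sees nothing but the current vertex's neighbour list could estimate m and n. It relies on two standard facts about a simple random walk on a connected undirected graph: the expected first return time to a vertex v is 2m/d(v), and the stationary probability of v is d(v)/2m, so the long-run average of 1/d(X_t) tends to n/2m. The function name, the toy graph and the step budget are assumptions made for illustration, and this should not be read as the authors' algorithm.

```python
import random


def random_walk_estimates(adj, start, steps, seed=0):
    """Estimate (m, n) of a connected undirected graph from one random walk.

    Assumption: adj maps each vertex to a list of its neighbours, and the
    walk only ever inspects the neighbour list of the vertex it stands on.
    """
    rng = random.Random(seed)
    deg_start = len(adj[start])
    current = start
    returns = 0        # number of completed returns to `start`
    last_return = 0    # step index of the most recent return
    inv_deg_sum = 0.0  # running sum of 1/d(X_t) over visited vertices

    for t in range(1, steps + 1):
        current = rng.choice(adj[current])      # move to a uniform neighbour
        inv_deg_sum += 1.0 / len(adj[current])
        if current == start:
            returns += 1
            last_return = t

    if returns == 0:
        raise ValueError("walk never returned to start; use more steps")

    # Mean return time approximates 2m / d(start), so solve for m.
    mean_return_time = last_return / returns
    m_hat = deg_start * mean_return_time / 2.0

    # Average of 1/d(X_t) approximates n / (2m), so solve for n.
    n_hat = 2.0 * m_hat * inv_deg_sum / steps
    return m_hat, n_hat


# Hypothetical toy graph with n = 6 vertices, m = 8 edges and 2 triangles.
TOY_ADJ = {
    0: [1, 2],
    1: [0, 2, 4],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5, 1],
    5: [4, 3],
}

m_hat, n_hat = random_walk_estimates(TOY_ADJ, start=0, steps=200_000)
print(f"estimated m = {m_hat:.2f}, estimated n = {n_hat:.2f}")
```

The toy run uses far more steps than the graph has vertices, purely so that the estimates settle on a six-vertex example; the experiments described in the abstract instead cap the walk at n steps on much larger networks.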

Author information

Authors and Affiliations

Department of Informatics, King’s College, London, UK
Colin Cooper, Tomasz Radzik & Yiannis Siantos

Corresponding author

Correspondence to Yiannis Siantos.

About this article

Cite this article

Cooper, C., Radzik, T. & Siantos, Y. Estimating network parameters using random walks. Soc. Netw. Anal. Min. 4, 168 (2014). https://doi.org/10.1007/s13278-014-0168-6

Received: 08 May 2013
Revised: 15 November 2013
Accepted: 23 November 2013
Published: 22 February 2014
DOI: https://doi.org/10.1007/s13278-014-0168-6
