Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution

  • Original Article
Social Network Analysis and Mining

Abstract

Sampling the content of an Online Social Network (OSN) is a major application area, driven by growing interest in collecting social information such as email, location, age and number of friends. Large-scale social networks such as Facebook can be difficult to sample because of the volume of data and the privacy settings the company imposes. Sampling techniques therefore require reliable algorithms that can cope with an unknown environment. Our main purpose in this manuscript is to examine whether the normal proposal distribution of the Metropolis–Hastings random walk (MHRW) can be replaced by a spiral-based proposal as an alternative and reliable distribution. We propose a sampling algorithm, the Alternative Metropolis–Hastings random walk (AMHRW), and study its effect on collecting digital profiles from two different datasets. We examine the soundness and robustness of the proposed algorithm through independent walks on two different representative samples of Facebook. We observe that the performance of the normal distribution can be approximated by using an Illusion spiral. We also provide a formal convergence analysis to evaluate the performance of our independent walks and to assess whether the sample of draws has attained an equilibrium state. Finally, our preliminary results provide experimental evidence that collecting data with the AMHRW algorithm can be as effective as the MHRW algorithm on large-scale networks.
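
For readers unfamiliar with the baseline, the following is a minimal sketch of a Metropolis–Hastings random walk on a graph, in the degree-corrected form commonly used for uniform node sampling (as in Gjoka et al. 2010). It is not the authors' AMHRW: the abstract does not describe how the spiral proposal is constructed, so none is attempted here. The function name `mhrw`, the adjacency-list layout and the toy graph are assumptions made purely for illustration.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng):
    """Metropolis-Hastings random walk targeting a uniform distribution over nodes.

    graph: dict mapping each node to a list of its neighbours (undirected, connected).
    Illustrative sketch only; not the spiral-based AMHRW from the article.
    """
    current = start
    samples = []
    for _ in range(steps):
        # Proposal: pick one neighbour of the current node uniformly at random.
        candidate = rng.choice(graph[current])
        # Acceptance ratio for a uniform target: deg(current) / deg(candidate).
        # This corrects the bias of a simple random walk towards high-degree nodes.
        accept_prob = min(1.0, len(graph[current]) / len(graph[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        # A rejected proposal repeats the current node in the sample.
        samples.append(current)
    return samples


# Hypothetical toy graph: a hub (node 0) joined to two linked pairs of nodes.
toy_graph = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3],
}

if __name__ == "__main__":
    walk = mhrw(toy_graph, start=0, steps=20000, rng=random.Random(0))
    visits = Counter(walk)
    for node in sorted(toy_graph):
        print(node, round(visits[node] / len(walk), 3))
```

On the toy graph, a simple random walk would visit the hub about twice as often as any other node; with the acceptance step, the visit frequencies should approach one fifth each as the walk lengthens. Replacing the proposal line with a different rule is the point at which an alternative proposal, such as the spiral one the article studies, would enter.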

Notes

  1. When we sample with replacement, the sample values are independent.

  2. A full analysis of the whole structure of Facebook can be found in (Ugander et al. 2011).

References

  • Akkermans H (2012) Web dynamics as a random walk: how and why power laws occur

  • Allison P (2010) Survival analysis using SAS: a practical guide. Sas Institute, Cary

  • API (2013) Graph API getting started guide. https://developers.facebook.com/docs/reference/api/search/. Accessed on 6 June 2013

  • Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the 4th ACM international conference on Web search and data mining, ACM, pp 635–644

  • Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24

  • Beskos A, Stuart A (2009) Computational complexity of Metropolis–Hastings methods in high dimensions. In: Monte Carlo and Quasi-Monte Carlo methods 2008, Springer, Berlin, pp 61–71

  • Best N, Cowles M, Vines K (1995) CODA: convergence diagnosis and output analysis software for Gibbs sampling output, version 0.30. MRC Biostatistics Unit, Cambridge

  • Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Soc Netw Anal Min 1(3):143–158

  • Caci B, Cardaci M, Tabacchi ME (2011) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min, pp 1–5

  • Catanese S, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling Facebook for social network analysis purposes. arXiv preprint arXiv:1105.6307

  • Chapra S, Canale R (2010) Numerical methods for engineers, vol 2. McGraw-Hill, New York

  • Codling E, Bearon R, Thorn G (2010) Diffusion about the mean drift location in a biased random walk. Ecology 91(10):3106–3113

  • Cook T (1979) The curves of life. Dover Publications, New York

  • Cutillo L, Molva R, Onen M (2011) Analysis of privacy in online social networks from the graph theory perspective. In: Proceedings of the global telecommunications conference (GLOBECOM 2011), IEEE, pp 1–5

  • Davis P, Gautschi W, Iserles A (1993) Spirals: from Theodorus to chaos. AK Peters, Wellesley

  • Dudewicz E (1976) Introduction to statistics and probability. Holt, Rinehart and Winston

  • Ferri F, Grifoni P, Guzzo T (2012) New forms of social and professional digital relationships: the case of facebook. Soc Netw Anal Min 2(2):121–137

  • Geweke J et al (1991) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Research Department, Federal Reserve Bank of Minneapolis

  • Gjoka M, Kurant M, Butts C, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892

  • Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM ’10. San Diego, CA

  • Hargittai I (1992) Spiral symmetry. World Scientific Publishing Company Incorporated, Singapore

  • Karatzas I, Shreve S (1991) Brownian motion and stochastic calculus, vol 113, Springer, Berlin

  • Katzir L, Liberty E, Somekh O (2011) Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th international conference on World wide web, ACM, pp 597–606

  • Kurant M, Gjoka M, Butts CT, Markopoulou A (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of ACM SIGMETRICS ’11. San Jose, CA

  • Lafore R, Waite M (2003) Data structures and algorithms in Java. Sams Publishing

  • LeSage J (1999) Applied econometrics using matlab. Manuscript, Department of Economics, University of Toronto

  • Martinez W, Martinez A (2001) Computational statistics handbook with MATLAB, vol 2. Chapman and Hall/CRC

  • Mislove A, Gummadi KP, Druschel P (2006) Exploiting social networks for internet search. In: Proceedings of the 5th workshop on hot topics in networks (HotNets06). Citeseer, p 79

  • Papagelis M, Das G, Koudas N (2011) Sampling online social networks

  • Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th annual conference on Internet measurement, ACM, pp 390–403

  • Robert C, Casella G (2009) Introducing Monte Carlo methods with R. Springer, Berlin

  • Scott J (2011) Social network analysis: developments, advances, and prospects. Soc Netw Anal Min 1(1):21–26

  • Stuckman J, Purtilo J (2011) Analyzing the wikisphere: methodology and data to support quantitative wiki research. J Am Soc Inf Sci Technol 62(8):1564–1576

  • Tang J, Musolesi M, Mascolo C, Latora V (2009) Temporal distance metrics for social network analysis. In: Proceedings of the 2nd ACM workshop on Online social networks, ACM, pp 31–36

  • Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503

  • Viswanathan G, Buldyrev S, Havlin S, Da Luz M, Raposo E, Stanley H (1999) Optimizing the success of random searches. Nature 401(6756):911–914

  • Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw (TOMS) 11(1):37–57

Acknowledgments

The authors thank the reviewers of this paper for their useful comments. Mr. Piña-García has been partially supported by the Mexican National Council of Science and Technology (CONACYT), through the program “Becas para estudios de posgrado en el extranjero” (no. 213550).

Author information

Corresponding author

Correspondence to C. A. Piña-García.

About this article

Cite this article

Piña-García, C.A., Gu, D. Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution. Soc. Netw. Anal. Min. 3, 1403–1415 (2013). https://doi.org/10.1007/s13278-013-0126-8

  • DOI: https://doi.org/10.1007/s13278-013-0126-8
