Abstract
Sampling the content of an Online Social Network (OSN) is a major application area, driven by the growing interest in collecting social information such as email, location, age and number of friends. Large-scale social networks such as Facebook can be difficult to sample because of the amount of data and the privacy settings imposed by the company. Sampling techniques therefore require reliable algorithms that can cope with an unknown environment. Our main purpose in this manuscript is to examine whether the normal proposal distribution of the Metropolis–Hastings random walk (MHRW) can be replaced by a spiral-based proposal as a reliable alternative. We propose a sampling algorithm, the Alternative Metropolis–Hastings random walk (AMHRW), and study its effect on collecting digital profiles from two different datasets. We examine the soundness and robustness of the proposed algorithm through independent walks on two different representative samples of Facebook. We observe that the performance of the normal distribution can be approximated using an Illusion spiral. We also provide a formal convergence analysis to evaluate the performance of our independent walks and to assess whether the sample of draws has attained an equilibrium state. Finally, our preliminary results provide experimental evidence that collecting data with the AMHRW algorithm can be as effective as collecting it with the MHRW algorithm on large-scale networks.
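For readers unfamiliar with the baseline, the following is a minimal sketch of a standard Metropolis–Hastings random walk on a graph that targets the uniform distribution over nodes. It is not the AMHRW algorithm proposed here: the spiral proposal is not specified in this abstract, so the sketch assumes the usual choice of a uniformly random neighbor as the proposal. The function name `mhrw`, the toy `graph` and the seed are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng):
    """Standard MHRW over an undirected graph given as an adjacency dict.

    Assumption: the proposal is a uniformly random neighbor and the target
    distribution is uniform over nodes (not the spiral proposal of AMHRW).
    """
    current = start
    visited = [current]
    for _ in range(steps):
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # this corrects the bias of a simple random walk toward hubs.
        if rng.random() <= len(graph[current]) / len(graph[candidate]):
            current = candidate
        # A rejected proposal repeats the current node in the sample.
        visited.append(current)
    return visited


# Hypothetical toy graph (undirected, connected); not data from the paper.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c", "e"],
    "e": ["d"],
}

walk = mhrw(graph, "a", 20000, random.Random(0))
counts = Counter(walk)
for node in sorted(graph):
    print(node, round(counts[node] / len(walk), 3))
```

On a long enough walk the visit frequencies should approach 1/5 for every node regardless of degree, which is the property that makes MHRW attractive for drawing unbiased samples of user profiles.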







Notes
When we sample with replacement, the sample values are independent.
A full analysis of the whole structure of Facebook can be found in Ugander et al. (2011).
References
Akkermans H (2012) Web dynamics as a random walk: how and why power laws occur
Allison P (2010) Survival analysis using SAS: a practical guide. Sas Institute, Cary
API (2013) Graph api getting started guide. https://developers.facebook.com/docs/reference/api/search/. Accessed on 6 June 2013
Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the 4th ACM international conference on Web search and data mining, ACM, pp 635–644
Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24
Beskos A, Stuart A (2009) Computational complexity of Metropolis–Hastings methods in high dimensions. In: Monte Carlo and Quasi-Monte Carlo methods 2008, Springer, Berlin, pp 61–71
Best N, Cowles M, Vines K (1995) CODA: convergence diagnosis and output analysis software for Gibbs sampling output, version 0.30. MRC Biostatistics Unit, Cambridge
Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Soc Netw Anal Min 1(3):143–158
Caci B, Cardaci M, Tabacchi ME (2011) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min, pp 1–5
Catanese S, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling facebook for social network analysis purposes. Arxiv preprint arXiv:1105.6307
Chapra S, Canale R (2010) Numerical methods for engineers, vol 2. McGraw-Hill, New York
Codling E, Bearon R, Thorn G (2010) Diffusion about the mean drift location in a biased random walk. Ecology 91(10):3106–3113
Cook T (1979) The curves of life. Dover Publications, New York
Cutillo L, Molva R, Onen M (2011) Analysis of privacy in online social networks from the graph theory perspective. In: Proceedings of the global telecommunications conference (GLOBECOM 2011), IEEE, pp 1–5
Davis P, Gautschi W, Iserles A (1993) Spirals: from Theodorus to chaos. AK Peters, Wellesley
Dudewicz E (1976) Introduction to statistics and probability. Holt, Rinehart and Winston
Ferri F, Grifoni P, Guzzo T (2012) New forms of social and professional digital relationships: the case of facebook. Soc Netw Anal Min 2(2):121–137
Geweke J et al (1991) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Research Department, Federal Reserve Bank of Minneapolis
Gjoka M, Kurant M, Butts C, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM ’10. San Diego, CA
Hargittai I (1992) Spiral symmetry. World Scientific Publishing Company Incorporated, Singapore
Karatzas I, Shreve S (1991) Brownian motion and stochastic calculus, vol 113, Springer, Berlin
Katzir L, Liberty E, Somekh O (2011) Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th international conference on World wide web, ACM, pp 597–606
Kurant M, Gjoka M, Butts CT, Markopoulou A (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of ACM SIGMETRICS ’11. San Jose, CA
Lafore R, Waite M (2003) Data structures and algorithms in Java. Sams Publishing
LeSage J (1999) Applied econometrics using matlab. Manuscript, Department of Economics, University of Toronto
Martinez W, Martinez A (2001) Computational statistics handbook with MATLAB, vol 2. Chapman and Hall/CRC
Mislove A, Gummadi KP, Druschel P (2006) Exploiting social networks for internet search. In: Proceedings of the 5th workshop on hot topics in networks (HotNets06). Citeseer, p 79
Papagelis M, Das G, Koudas N (2011) Sampling online social networks
Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th annual conference on Internet measurement, ACM, pp 390–403
Robert C, Casella G (2009) Introducing Monte Carlo methods with R. Springer, Berlin
Scott J (2011) Social network analysis: developments, advances, and prospects. Soc Netw Anal Min 1(1):21–26
Stuckman J, Purtilo J (2011) Analyzing the wikisphere: methodology and data to support quantitative wiki research. J Am Soc Inf Sci Technol 62(8):1564–1576
Tang J, Musolesi M, Mascolo C, Latora V (2009) Temporal distance metrics for social network analysis. In: Proceedings of the 2nd ACM workshop on Online social networks, ACM, pp 31–36
Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the facebook social graph. Arxiv preprint arXiv:1111.4503
Viswanathan G, Buldyrev S, Havlin S, Da Luz M, Raposo E, Stanley H (1999) Optimizing the success of random searches. Nature 401(6756):911–914
Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw (TOMS) 11(1):37–57
Acknowledgments
The authors thank the reviewers of this paper for their useful comments. Mr. Piña-García has been partially supported by the Mexican National Council of Science and Technology (CONACYT), through the program “Becas para estudios de posgrado en el extranjero” (no. 213550).
Cite this article
Piña-García, C.A., Gu, D. Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution. Soc. Netw. Anal. Min. 3, 1403–1415 (2013). https://doi.org/10.1007/s13278-013-0126-8
DOI: https://doi.org/10.1007/s13278-013-0126-8