Subsampling for Chain-Referral Methods

  • Conference paper
Analytical and Stochastic Modelling Techniques and Applications (ASMTA 2016)

Abstract

We study chain-referral methods for sampling in social networks. These methods rely on the subjects of a study recruiting other participants from among their own connections. This makes sampling possible when other methods, which require knowledge of the whole network or of its global characteristics, fail. For online social networks, chain-referral methods can be implemented with random walks or crawling. However, estimates computed from the collected samples can have high variance, especially when the sample size is small. Another drawback is the potential bias introduced by the way the samples are collected. We propose and analyze a subsampling technique in which some users are asked only to recruit other users and do not themselves participate in the study. Assuming that a referral costs less than actual participation, this technique explores a larger variety of the population and thereby significantly decreases the variance of the estimator. We test the method on real social networks and on synthetic ones. As a by-product, we propose a Gibbs-like method for generating synthetic networks with desired properties.
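The variance argument in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper: the network is a hypothetical ring of 20 nodes, the attribute is binary and clustered on one half of the ring, the referral chain is a simple random walk, and only every k-th referred node participates. Because the ring is regular, the plain sample mean needs no degree correction here; on a general graph an estimator would have to account for degree bias. All function names and parameter values are illustrative.

```python
import random
from statistics import mean, pvariance


def ring_graph(n):
    """Adjacency lists of a cycle on n nodes (every node has degree 2)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def referral_chain(adj, steps, rng):
    """Simple random walk: each subject refers one uniformly chosen neighbour."""
    node = rng.choice(list(adj))  # uniform start, stationary on a regular graph
    chain = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        chain.append(node)
    return chain


def estimate(adj, values, n_participants, k, rng):
    """Sample mean over participants; only every k-th referred node participates."""
    chain = referral_chain(adj, n_participants * k, rng)
    participants = chain[k - 1::k]  # the other nodes only pass the referral on
    return mean(values[v] for v in participants)


def estimator_variance(adj, values, n_participants, k, trials, seed):
    """Empirical variance of the estimate over independent referral chains."""
    rng = random.Random(seed)
    runs = [estimate(adj, values, n_participants, k, rng) for _ in range(trials)]
    return pvariance(runs)


# Hypothetical setup: attribute equal to 1.0 on one half of the ring, 0.0 on the other.
adj = ring_graph(20)
values = {i: 1.0 if i < 10 else 0.0 for i in adj}

var_plain = estimator_variance(adj, values, n_participants=30, k=1, trials=1000, seed=0)
var_sub = estimator_variance(adj, values, n_participants=30, k=10, trials=1000, seed=0)
print(f"variance without subsampling (k=1):  {var_plain:.4f}")
print(f"variance with subsampling    (k=10): {var_sub:.4f}")
```

Both runs use the same number of participants (30); the subsampled run simply spends more referrals between them, so consecutive participants are less correlated. This matches the cost assumption in the abstract that a referral is cheaper than participation.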

Notes

  1. We are ignoring here the effect of resampling.

  2. It could be adopted to model the case where nodes are on a line and social influences are homogeneous.

  3. Note that \(Y_i = g_j\) if the random walk is on node \(j\) at the \(i\)-th step.

  4. Matrix \(P^*\) is always diagonalizable for a random walk (RW) on an undirected graph.
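
The diagonalizability claim in the last note can be checked with a standard argument. The definition of \(P^*\) is not shown on this page, so the sketch below assumes the simple random walk with transition matrix \(P = D^{-1}A\), where \(A\) is the symmetric adjacency matrix and \(D\) is the diagonal degree matrix (both symbols are introduced here for illustration), and that no node is isolated:

\[
D^{1/2} P D^{-1/2} = D^{-1/2} A D^{-1/2} =: S, \qquad S = S^{\top} \;\Longrightarrow\; S = U \Lambda U^{\top} \;\Longrightarrow\; P = \left(D^{-1/2} U\right) \Lambda \left(D^{-1/2} U\right)^{-1}.
\]

So \(P\) is similar to a real symmetric matrix, and therefore it is diagonalizable with real eigenvalues.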

Acknowledgements

This work was supported by CEFIPRA grant no. 5100-IT1 “Monte Carlo and Learning Schemes for Network Analytics,” Inria Nokia Bell Labs ADR “Network Science,” and Inria Brazilian-French research team Thanes.

Author information

Corresponding author

Correspondence to Alina Tuholukova.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Avrachenkov, K., Neglia, G., Tuholukova, A. (2016). Subsampling for Chain-Referral Methods. In: Wittevrongel, S., Phung-Duc, T. (eds) Analytical and Stochastic Modelling Techniques and Applications. ASMTA 2016. Lecture Notes in Computer Science, vol 9845. Springer, Cham. https://doi.org/10.1007/978-3-319-43904-4_2

  • DOI: https://doi.org/10.1007/978-3-319-43904-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43903-7

  • Online ISBN: 978-3-319-43904-4

  • eBook Packages: Computer Science (R0)
