Validating Social Media Monitoring: Statistical Pitfalls and Opportunities from Public Opinion

  • Conference paper
  • In: Social, Cultural, and Behavioral Modeling (SBP-BRiMS 2020)

Abstract

Social media are a promising new data source for real-world behavioral monitoring. Despite clear advantages, analyses of social media data face some challenges. In this paper, we seek to elucidate some of these challenges and draw relevant lessons from more traditional survey techniques. Beyond standard machine learning approaches, we make the case that studies that conduct statistical analyses of social media data should carefully consider elements of study design, providing behavioral examples throughout. Specifically, we focus on issues surrounding the validity of statistical conclusions that may be drawn from social media data. We discuss common pitfalls and techniques to avoid these pitfalls, so researchers may mitigate potential problems of design.

Notes

  1. See [10] for discussion of the remaining types of validity in a social media context.

  2. See [31] for an extensive list of challenges with social data beyond those discussed here.

  3. These population-level correlations, while a limited success, are a far cry from the individual-level responses that are considered the gold standard [1, 2].

References

  1. Baker, R., et al.: Summary report of the AAPOR task force on non-probability sampling. J. Surv. Stat. Methodol. 1(2), 90–143 (2013)

  2. Baker, R., et al.: Evaluating Survey Quality in Today’s Complex Environment - AAPOR, May 2016

  3. Beauchamp, N.: Predicting and interpolating state-level polls using Twitter textual data. Am. J. Polit. Sci. 61, 490–503 (2016)

  4. Beskow, D.M., Carley, K.M.: Bot conversations are different: leveraging network metrics for bot detection in Twitter. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 825–832. IEEE (2018)

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

  6. Bonevski, B., et al.: Reaching the hard-to-reach: a systematic review of strategies for improving health and medical research with socially disadvantaged groups. BMC Med. Res. Methodol. 14, 42 (2014)

  7. Broniatowski, D.A., Hilyard, K.M., Dredze, M.: Effective vaccine communication during the Disneyland measles outbreak. Vaccine 34(28), 3225–3228 (2016)

  8. Broniatowski, D.A., et al.: Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. Am. J. Public Health 108(10), 1378–1384 (2018)

  9. Broniatowski, D.A., Paul, M.J., Dredze, M.: National and local influenza surveillance through Twitter: an analysis of the 2012–2013 influenza epidemic. PLoS ONE 8(12), e83672 (2013)

  10. Broniatowski, D.A., Tucker, C.: Assessing causal claims about complex engineered systems with quantitative data: internal, external, and construct validity. Syst. Eng. 20(6), 483–496 (2017)

  11. Campbell, D.T., Stanley, J.C.: Experimental and Quasi-Experimental Designs for Research, 2nd Print edn. Houghton Mifflin Comp, Boston (1967). OCLC: 247359300

  12. Culotta, A., Ravi, N., Cutler, J.: Predicting Twitter user demographics using distant supervision from website traffic data. J. Artif. Intell. Res. 55, 389–408 (2016)

  13. Cunha, E., Magno, G., Comarela, G., Almeida, V., Gonçalves, M.A., Benevenuto, F.: Analyzing the dynamic evolution of hashtags on Twitter: a language-based approach. In: Proceedings of the Workshop on Languages in Social Media, pp. 58–65. Association for Computational Linguistics (2011)

  14. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. arXiv:1602.00975 [cs], pp. 273–274 (2016)

  15. Dredze, M., Broniatowski, D.A., Smith, M.C., Hilyard, K.M.: Understanding vaccine refusal: why we need social media now. Am. J. Prev. Med. 50(4), 550 (2016)

  16. Duggan, M., Brenner, J.: The Demographics of Social Media Users – 2012, February 2013

  17. Fitzgerald, R., Fuller, L.: I hear you knocking but you can’t come in: the effects of reluctant respondents and refusers on sample survey estimates. Sociol. Methods Res. 11(1), 3–32 (1982)

  18. Getis, A., Ord, J.K.: The analysis of spatial association by use of distance statistics. Geograph. Anal. 24(3), 189–206 (1992)

  19. Groves, R.M.: Three eras of survey research. Public Opin. Q. 75(5), 861–871 (2011)

  20. Huang, X., et al.: Examining patterns of influenza vaccination in social media. In: AAAI Joint Workshop on Health Intelligence (W3PHIAI) (2017)

  21. Kata, A.: Anti-vaccine activists, Web 2.0, and the postmodern paradigm – an overview of tactics and tropes used online by the anti-vaccination movement. Vaccine 30(25), 3778–3789 (2012)

  22. Knowles, R., Carroll, J., Dredze, M.: Demographer: extremely simple name demographics. In: NLP+CSS 2016, p. 108 (2016)

  23. Krumpal, I.: Determinants of social desirability bias in sensitive surveys: a literature review. Qual. Quant. 47(4), 2025–2047 (2013)

  24. Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)

  25. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 591–600. ACM, New York (2010)

  26. Lazer, D., Kennedy, R., King, G., Vespignani, A.: The parable of Google Flu: traps in big data analysis. Science 343(6176), 1203–1205 (2014)

  27. Liao, Q.V., Fu, W.T., Strohmaier, M.: #Snowden: understanding biases introduced by behavioral differences of opinion groups on social media. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI 2016, pp. 3352–3363. ACM, New York (2016)

  28. Lu, Y., Hu, X., Wang, F., Kumar, S., Liu, H., Maciejewski, R.: Visualizing social media sentiment in disaster scenarios. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015 Companion, pp. 1211–1215. ACM, New York (2015)

  29. Murphy, J., et al.: Social Media in Public Opinion Research - AAPOR, May 2014

  30. Nakov, P., et al.: Developing a successful SemEval task in sentiment analysis of Twitter and other social media texts. Lang. Resour. Eval. 50(1), 35–65 (2016)

  31. Olteanu, A., Castillo, C., Diaz, F., Kiciman, E.: Social data: biases, methodological pitfalls, and ethical boundaries. SSRN Scholarly Paper ID 2886526, Social Science Research Network, Rochester, December 2016

  32. Paul, M.J., Dredze, M., Broniatowski, D.: Twitter improves influenza forecasting. PLoS Currents 6 (2014)

  33. Quinn, S.C., Jamison, A., An, J., Freimuth, V.S., Hancock, G.R., Musa, D.: Breaking down the monolith: understanding flu vaccine uptake among African Americans. SSM - Popul. Health 4, 25–36 (2018)

  34. Quinn, S.C., Jamison, A., Freimuth, V.S., An, J., Hancock, G.R., Musa, D.: Exploring racial influences on flu vaccine attitudes and behavior: results of a national survey of White and African American adults. Vaccine 35(8), 1167–1174 (2017)

  35. Schober, M.F., Pasek, J., Guggenheim, L., Lampe, C., Conrad, F.G.: Social media analyses for social measurement. Public Opin. Q. 80(1), 180–211 (2016)

  36. Schwartz, H.A., et al.: Toward personality insights from language exploration in social media. In: 2013 AAAI Spring Symposium Series (2013)

  37. Shadish, W., Cook, T.D., Campbell, D.T.: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Wadsworth Cengage Learning (2002)

  38. Särndal, C., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer, Heidelberg (1992)

  39. Tourangeau, R., Rips, L.J., Rasinski, K.: The Psychology of Survey Response. Cambridge University Press, March 2000. Google-Books-ID: bjVYdyXXT3oC

  40. Volkova, S., Bachrach, Y.: On predicting sociodemographic traits and emotions from communications in social networks and their implications to online self-disclosure. Cyberpsychol. Behav. Soc. Netw. 18(12), 726–736 (2015)

  41. Wood-Doughty, Z., Mahajan, P., Dredze, M.: Johns Hopkins or Johnny-Hopkins: classifying individuals versus organizations on Twitter. In: Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pp. 56–61 (2018)

  42. Wood-Doughty, Z., Smith, M., Broniatowski, D., Dredze, M.: How does Twitter user behavior vary across demographic groups? In: Proceedings of the Second Workshop on NLP and Computational Social Science, pp. 83–89 (2017)

  43. Yeager, D.S., et al.: Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples. Public Opin. Q. 75(4), 709–747 (2011)

Author information

Corresponding author

Correspondence to Michael C. Smith.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Smith, M.C., Mazzuchi, T.A., Broniatowski, D.A. (2020). Validating Social Media Monitoring: Statistical Pitfalls and Opportunities from Public Opinion. In: Thomson, R., Bisgin, H., Dancy, C., Hyder, A., Hussain, M. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2020. Lecture Notes in Computer Science, vol 12268. Springer, Cham. https://doi.org/10.1007/978-3-030-61255-9_7

  • DOI: https://doi.org/10.1007/978-3-030-61255-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61254-2

  • Online ISBN: 978-3-030-61255-9

  • eBook Packages: Computer Science (R0)
