Abstract
The digital trails of activity on social media are valuable for public health because they can reveal risky health behavior, but considerable methodological issues remain in using social media data. One particular source of bias is the presence of automated accounts, or social bots, whose activity may compromise predictive tasks based on social media data. In this work, we collect a corpus of public tweets about vaping and combine it with data from the CDC to predict the incidence of lung injuries by state. We show that the relative volume of tweets about vaping predicts injuries only when likely bot accounts are removed; otherwise, the correlation disappears. We compare the predictive power of these data against survey-based predictions and show that our models achieve the lowest generalization error. These results highlight the importance of bot detection as a data cleaning step and the potential value of social media data in the context of public health.
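To make the bot-removal step concrete, the following is a minimal sketch of the general idea, not the authors' pipeline. It assumes each tweet already carries a state label and a bot score in [0, 1]; the field names `state` and `bot_score`, the 0.5 cutoff, and all numbers are hypothetical, synthetic values chosen only for illustration. The sketch computes the per-state relative volume of vaping tweets with and without likely bots and correlates each version with a made-up injury rate.

```python
from collections import Counter
from math import sqrt

BOT_THRESHOLD = 0.5  # hypothetical cutoff; the abstract does not state one


def relative_volume(tweets, baseline, threshold=None):
    """Vaping tweets per state divided by that state's baseline tweet count.

    If `threshold` is given, tweets whose bot score is at or above it are
    dropped before counting (the data cleaning step).
    """
    counts = Counter(
        t["state"]
        for t in tweets
        if threshold is None or t["bot_score"] < threshold
    )
    return {state: counts.get(state, 0) / total for state, total in baseline.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic data: three made-up states, with bot activity concentrated in "A".
tweets = (
    [{"state": "A", "bot_score": 0.1}] * 1
    + [{"state": "A", "bot_score": 0.9}] * 5
    + [{"state": "B", "bot_score": 0.2}] * 2
    + [{"state": "C", "bot_score": 0.1}] * 3
)
baseline = {"A": 100, "B": 100, "C": 100}   # all tweets observed per state
injury_rate = {"A": 1.0, "B": 2.0, "C": 3.0}  # invented outcome values

states = sorted(baseline)
raw = relative_volume(tweets, baseline)
clean = relative_volume(tweets, baseline, threshold=BOT_THRESHOLD)

r_raw = pearson([raw[s] for s in states], [injury_rate[s] for s in states])
r_clean = pearson([clean[s] for s in states], [injury_rate[s] for s in states])
print(f"with bots: r = {r_raw:.2f}; bots removed: r = {r_clean:.2f}")
```

In this toy setup the bot-heavy state inflates the raw signal, so the correlation with the outcome only emerges after filtering. A regularized regression with held-out evaluation would be needed to estimate generalization error, which this sketch does not attempt.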
Change history
13 December 2022
In an older version of this paper, there was an error in Figure 2. This has been corrected.
Acknowledgement
The authors would like to thank Filippo Menczer and Kai-Cheng Yang for providing access to the BotometerLite API, and Hunter Morera for help with data collection and coding during the initial part of this project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gera, P., Ciampaglia, G.L. (2022). Chasing the Wrong Cloud: Mapping the 2019 Vaping Epidemic Using Data from Social Media. In: Thomson, R., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2022. Lecture Notes in Computer Science, vol 13558. Springer, Cham. https://doi.org/10.1007/978-3-031-17114-7_1
DOI: https://doi.org/10.1007/978-3-031-17114-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17113-0
Online ISBN: 978-3-031-17114-7
eBook Packages: Computer Science, Computer Science (R0)