Abstract
Twitter data has been shown to be broadly applicable to public health surveillance. Previous public health studies based on Twitter data have largely relied on keyword matching or topic models to cluster relevant tweets. However, both methods suffer from the short length of the texts and the unpredictable noise that naturally occurs in user-generated content. In response, we introduce a deep learning approach that uses hashtags as a form of supervision and learns tweet embeddings for extracting informative textual features. In this case study, we address the specific task of estimating state-level obesity from diet-related textual features. Our approach yields estimates that correlate strongly with government data and outperform the keyword-matching baseline. The results also demonstrate the potential of discovering risk factors using the textual features. This method is general-purpose and can be applied to a wide range of Twitter-based public health studies.
This work was conducted during the first author’s research internship at the NYU Center for Data Science.
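To make the evaluation idea in the abstract concrete, the following is a minimal sketch of a keyword-matching baseline scored by Pearson correlation against per-state reference values. It is not the authors' implementation: the keyword set, the function names `keyword_rate_by_state` and `pearson`, the example tweets, and the reference numbers are all hypothetical placeholders chosen for illustration, not data or results from the paper.

```python
import math
import re
from collections import defaultdict

# Hypothetical keyword list; the lexicon actually used in the paper is not given here.
FOOD_KEYWORDS = {"pizza", "burger", "salad"}


def keyword_rate_by_state(tweets):
    """Return, per state, the fraction of tweets mentioning at least one keyword.

    `tweets` is an iterable of (state, text) pairs.
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for state, text in tweets:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        total[state] += 1
        if tokens & FOOD_KEYWORDS:
            matched[state] += 1
    return {state: matched[state] / total[state] for state in total}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy tweets, invented for illustration only.
toy_tweets = [
    ("NY", "Pizza for lunch again"),
    ("NY", "Morning run in the park"),
    ("TX", "burger and fries tonight"),
    ("TX", "Best burger in town!"),
    ("CA", "traffic is terrible"),
    ("CA", "reading a book"),
]
# Placeholder reference values, NOT real government statistics.
toy_reference = {"NY": 0.2, "TX": 0.3, "CA": 0.1}

rates = keyword_rate_by_state(toy_tweets)
states = sorted(rates)
r = pearson([rates[s] for s in states], [toy_reference[s] for s in states])
print(f"Pearson r between keyword rate and reference values: {r:.3f}")
```

In a setup like the one the abstract describes, the per-state keyword rate would presumably be replaced by features derived from the learned tweet embeddings, with the same correlation step used to compare both against government data.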
Notes
- 1.
- 2. App Spring Inc. List Challenges: Food, https://www.listchallenges.com/lists/food.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, X., Athanasiadou, R., Razavian, N. (2021). Tracing State-Level Obesity Prevalence from Sentence Embeddings of Tweets: A Feasibility Study. In: Gadepally, V., et al. (eds.) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2020, Poly 2020. Lecture Notes in Computer Science, vol 12633. Springer, Cham. https://doi.org/10.1007/978-3-030-71055-2_12
DOI: https://doi.org/10.1007/978-3-030-71055-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71054-5
Online ISBN: 978-3-030-71055-2
eBook Packages: Computer Science, Computer Science (R0)