Tracing State-Level Obesity Prevalence from Sentence Embeddings of Tweets: A Feasibility Study

  • Conference paper
Heterogeneous Data Management, Polystores, and Analytics for Healthcare (DMAH 2020, Poly 2020)

Abstract

Twitter data has been shown to be broadly applicable to public health surveillance. Previous public health studies based on Twitter data have largely relied on keyword matching or topic models to cluster relevant tweets. However, both methods suffer from the short length of texts and the unpredictable noise that naturally occurs in user-generated content. In response, we introduce a deep learning approach that uses hashtags as a form of supervision and learns tweet embeddings for extracting informative textual features. In this case study, we address the specific task of estimating state-level obesity from diet-related textual features. Our approach yields estimates, derived from the textual features, that correlate strongly with government data and outperform the keyword-matching baseline. The results also demonstrate the potential for discovering risk factors using the textual features. This method is general-purpose and can be applied to a wide range of Twitter-based public health studies.
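
The evaluation idea described in the abstract, comparing a per-state textual feature against government prevalence figures, can be sketched for the keyword-matching baseline as follows. This is a minimal illustration and not the authors' pipeline: the keyword set, the `keyword_rate_by_state` and `pearson` helpers, the placeholder state labels, and every number are hypothetical, and Pearson correlation is an assumed choice of measure.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical keyword list; the paper's actual food vocabulary is not given here.
FOOD_KEYWORDS = {"pizza", "burger", "salad"}


def keyword_rate_by_state(tweets, keywords):
    """Return, per state, the fraction of tweets containing at least one keyword.

    `tweets` is an iterable of (state, text) pairs. Matching is a plain
    whitespace split on lowercased text, so hashtags and punctuation-attached
    words are missed; this is part of why keyword matching is noisy on tweets.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for state, text in tweets:
        tokens = set(text.lower().split())
        totals[state] += 1
        if tokens & keywords:
            hits[state] += 1
    return {state: hits[state] / totals[state] for state in totals}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Synthetic tweets and synthetic prevalence values, for illustration only.
    toy_tweets = [
        ("A", "pizza night again"), ("A", "burger and fries"), ("A", "long day"),
        ("B", "salad for lunch"), ("B", "traffic is awful"), ("B", "new shoes"),
        ("C", "rainy morning"), ("C", "pizza time"), ("C", "burger run"),
    ]
    toy_prevalence = {"A": 0.30, "B": 0.25, "C": 0.35}

    rates = keyword_rate_by_state(toy_tweets, FOOD_KEYWORDS)
    states = sorted(rates)
    r = pearson([rates[s] for s in states], [toy_prevalence[s] for s in states])
    print(rates, r)
```

The embedding-based approach would replace the keyword rate with features derived from learned tweet embeddings; the comparison against reference data would keep the same shape.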

This work was conducted during the first author’s research internship at the NYU Center for Data Science.

Notes

  1. https://developer.twitter.com/.

  2. App Spring Inc. List Challenges: Food, https://www.listchallenges.com/lists/food.

Corresponding author

Correspondence to Narges Razavian.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X., Athanasiadou, R., Razavian, N. (2021). Tracing State-Level Obesity Prevalence from Sentence Embeddings of Tweets: A Feasibility Study. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2020, Poly 2020. Lecture Notes in Computer Science, vol 12633. Springer, Cham. https://doi.org/10.1007/978-3-030-71055-2_12

  • DOI: https://doi.org/10.1007/978-3-030-71055-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71054-5

  • Online ISBN: 978-3-030-71055-2

  • eBook Packages: Computer Science, Computer Science (R0)
