Abstract
Population health analytics is fundamental to developing responsive public health promotion programs. The traditional way to interpret health statistics at the population level is to analyze data aggregated from individuals, typically collected through telephone surveys. Recent studies have found that social media can serve as an alternative population health surveillance system, providing quality and timely data at virtually no cost. In this paper, we further investigate the use of social media for population health estimation, based on a graph neural network approach. Specifically, we first introduce a graph modeling method that represents each county as a graph of interactions between health-related features in the community. We then adopt a graph neural network model, followed by a regression layer, to learn the population health representation and estimate the health indices. We validate the proposed method with large-scale experiments on Twitter data for the task of predicting health indices of US counties. Empirical results show a significant correlation with the reported health statistics, with a Spearman correlation coefficient (\(\rho\)) of up to 0.69, and show that our graph-based approach outperforms existing methods. These promising results also suggest that graph-based models could be applied to a range of societal-level analytics tasks through social media.
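The pipeline outlined in the abstract (a per-county feature graph, a graph neural network, a regression layer, and evaluation by Spearman correlation) can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: a single graph-convolution layer with symmetric adjacency normalization, a mean-pool readout, a linear regression head, and the hypothetical names `normalize_adjacency`, `county_score`, `rank`, and `spearman_rho`. The weights are random and untrained, and the data is synthetic.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed choice)."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # > 0 for non-negative edge weights
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def county_score(adj, feats, w_gcn, w_out):
    """One graph-convolution layer, mean-pool readout, linear regression head.

    adj:   (n, n) interaction weights between health-related features
    feats: (n, d) one feature vector per node
    """
    hidden = np.maximum(normalize_adjacency(adj) @ feats @ w_gcn, 0.0)  # ReLU
    graph_vec = hidden.mean(axis=0)      # one vector per county graph
    return float(graph_vec @ w_out)      # scalar health-index estimate


def rank(values):
    """Ranks 1..n in ascending order; assumes there are no ties."""
    values = np.asarray(values)
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of ranks (no tie handling)."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_nodes, n_feats, n_hidden = 5, 3, 4
    w_gcn = rng.normal(size=(n_feats, n_hidden))   # untrained, illustrative
    w_out = rng.normal(size=n_hidden)

    predictions, reported = [], []
    for _ in range(10):                            # ten synthetic "counties"
        adj = rng.random((n_nodes, n_nodes))
        adj = (adj + adj.T) / 2                    # make the graph undirected
        np.fill_diagonal(adj, 0.0)
        feats = rng.normal(size=(n_nodes, n_feats))
        predictions.append(county_score(adj, feats, w_gcn, w_out))
        reported.append(rng.normal())              # stand-in for survey data

    print("Spearman rho:", spearman_rho(predictions, reported))
```

With untrained weights and random targets the printed correlation carries no meaning; the sketch only shows how per-county graph predictions would be compared against reported statistics by rank.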
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, H., Nguyen, D.T., Nguyen, T. (2019). Estimating County Health Indices Using Graph Neural Networks. In: Le, T., et al. Data Mining. AusDM 2019. Communications in Computer and Information Science, vol 1127. Springer, Singapore. https://doi.org/10.1007/978-981-15-1699-3_6
DOI: https://doi.org/10.1007/978-981-15-1699-3_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1698-6
Online ISBN: 978-981-15-1699-3
eBook Packages: Computer Science, Computer Science (R0)