DOI: 10.1145/2433396.2433476
research-article

Modeling the impact of lifestyle on health at scale

Published: 04 February 2013

Abstract

Research in computational epidemiology has so far concentrated on estimating summary statistics of populations and on simulating disease outbreak scenarios. Detailed studies have been limited to small domains, because scaling the methods involved poses considerable challenges. By contrast, we model the associations between a large collection of social and environmental factors and the health of particular individuals. Instead of relying on surveys, we apply scalable machine learning techniques to noisy data mined from online social media and automatically infer the health state of any given person. We show that the learned patterns can subsequently be leveraged in both descriptive and predictive fine-grained models of human health. Using a unified statistical model, we quantify the impact of social status, exposure to pollution, interpersonal interactions, and other important lifestyle factors on a person's health. Our model explains more than 54% of the variance in people's health (as estimated from their online communication) and predicts the future health status of individuals with 91% accuracy. Our methods complement traditional studies in the life sciences, as they enable large-scale and timely measurement, inference, and prediction of previously elusive factors that affect our everyday lives.
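The two headline figures above are most naturally read as standard metrics: "explains more than 54% of the variance" as a coefficient of determination, R² = 1 − SS_res / SS_tot, and "91% accuracy" as the fraction of correct binary predictions. The following is a minimal sketch that makes both concrete on toy data. It assumes a single hypothetical lifestyle factor and a one-variable least-squares fit; the paper's actual unified model, features, and data are not described in this abstract, and every name and number below is illustrative only.

```python
from statistics import fmean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys: list[float], preds: list[float]) -> float:
    """Share of the variance in ys explained by preds: 1 - SS_res / SS_tot."""
    my = fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def accuracy(labels: list[bool], predicted: list[bool]) -> float:
    """Fraction of binary predictions (e.g. 'sick next week') that match the labels."""
    return sum(t == p for t, p in zip(labels, predicted)) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: one lifestyle factor per person (e.g. a pollution
    # exposure index) and a health score (e.g. share of messages flagged as sick).
    exposure = [1.0, 2.0, 3.0, 4.0]
    health = [0.1, 0.3, 0.2, 0.4]
    a, b = fit_line(exposure, health)
    r2 = r_squared(health, [a + b * x for x in exposure])
    acc = accuracy([True, False, True, True], [True, False, False, True])
    print(f"R^2 = {r2:.2f}, accuracy = {acc:.2f}")  # prints: R^2 = 0.64, accuracy = 0.75
```

On this toy data the single factor explains 64% of the variance. A model combining many factors would replace `fit_line` with a multivariate regression, but R² and accuracy would be computed the same way from its predictions.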

    Published In

    WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining
    February 2013
    816 pages
    ISBN:9781450318693
    DOI:10.1145/2433396
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. computational epidemiology
    2. geo-temporal modeling
    3. machine learning
    4. online social networks
    5. ubiquitous computing

    Qualifiers

    • Research-article

    Conference

    WSDM 2013

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 19
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 01 Mar 2025

    Cited By

    • (2023) Bayesian joint spatial modelling of anemia and malaria in Guinea. AIMS Mathematics 8(2):2763-2782. DOI: 10.3934/math.2023145. Online publication date: 2023.
    • (2023) Crime, inequality and public health: a survey of emerging trends in urban data science. Frontiers in Big Data 6. DOI: 10.3389/fdata.2023.1124526. Online publication date: 25-May-2023.
    • (2023) Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics. EPJ Data Science 12(1). DOI: 10.1140/epjds/s13688-023-00390-w. Online publication date: 18-May-2023.
    • (2022) Natural Language Processing for Social Media. Online publication date: 24-Mar-2022.
    • (2022) Natural Language Processing for Social Media. Online publication date: 21-Mar-2022.
    • (2021) Measuring spatio-textual affinities in twitter between two urban metropolises. Journal of Computational Social Science 5(1):227-252. DOI: 10.1007/s42001-021-00129-5. Online publication date: 2-Jun-2021.
    • (2020) SmokingOpp. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4(1):1-26. DOI: 10.1145/3380987. Online publication date: 14-Sep-2020.
    • (2019) Instagram Usage of Hospitals in Turkey: Case Study of Medical Park, Acıbadem and Memorial Health Groups. Erciyes İletişim Dergisi 6(2):1309-1324. DOI: 10.17680/erciyesiletisim.495513. Online publication date: 22-Jul-2019.
    • (2018) Using Twitter for Public Health Surveillance from Monitoring and Prediction to Public Response. Data 4(1):6. DOI: 10.3390/data4010006. Online publication date: 29-Dec-2018.
    • (2018) Dynamics and Prediction of Clicks on News from Twitter. Proceedings of the 29th on Hypertext and Social Media, pages 210-214. DOI: 10.1145/3209542.3209568. Online publication date: 3-Jul-2018.
