Estimating Disease Burden Using Google Trends and Wikipedia Data

  • Conference paper
Advances in Artificial Intelligence: From Theory to Practice (IEA/AIE 2017)

Abstract

Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.
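The two analyses named in the abstract, correlating Internet usage signals with burden measures and fitting a LASSO model, can be illustrated with a minimal sketch. The abstract does not show the authors' implementation, so everything below is an assumption made for illustration: the function names (`pearson`, `soft_threshold`, `lasso_cd`), the use of Pearson correlation specifically, the coordinate-descent solver, and all numbers, which are toy values rather than data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold(z, g):
    """Soft-thresholding operator: shrink z toward zero by g."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes the columns of X and the target y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j] if col_sq[j] > 0 else 0.0
    return beta


# Hypothetical yearly values for one disease (not from the paper).
toy_search = [12.0, 15.0, 19.0, 24.0, 30.0]    # relative search volume
toy_prevalence = [1.1, 1.3, 1.8, 2.2, 2.9]     # prevalence, arbitrary units
print("correlation:", pearson(toy_search, toy_prevalence))

# Hypothetical centered design: column 0 drives the target, column 1 is irrelevant.
toy_X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
toy_y = [3.0, -3.0, 3.0, -3.0]
print("lasso coefficients:", lasso_cd(toy_X, toy_y, lam=0.5))
```

In the second toy example the penalty shrinks the informative coefficient and sets the irrelevant one exactly to zero, which is the variable-selection behavior that motivates LASSO when several candidate predictors (search volume, page views, and other variables) are available.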

Author information

Corresponding author

Correspondence to Lixia Yao.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Qiu, R., Hadzikadic, M., Yao, L. (2017). Estimating Disease Burden Using Google Trends and Wikipedia Data. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_39

  • DOI: https://doi.org/10.1007/978-3-319-60045-1_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60044-4

  • Online ISBN: 978-3-319-60045-1

  • eBook Packages: Computer Science, Computer Science (R0)
