Abstract
Data on disease burden is often used for assessing population health, evaluating the effectiveness of interventions, formulating health policies, and planning future resource allocation. We investigated whether Internet usage data, particularly the search volume on Google and page view counts on Wikipedia, are correlated with the disease burden, measured by prevalence and treatment cost, for 1,633 diseases over an 11-year period. We also applied the method of least absolute shrinkage and selection operator (LASSO) to predict the burden of diseases, using those Internet data together with three other variables we quantified previously. We found a relatively strong correlation for 39 of 1,633 diseases, including viral hepatitis, diabetes mellitus, other headache syndromes, multiple sclerosis, sleep apnea, hemorrhoids, and disaccharidase deficiency. However, an accurate analysis must consider each condition’s characteristics, including acute/chronic nature, severity, familiarity to the public, and presence of stigma.
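The two analysis steps named in the abstract, correlating Internet usage with burden and fitting a LASSO model, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes Pearson correlation (the abstract does not say which coefficient was used), uses a plain coordinate-descent LASSO written from scratch, and runs on synthetic data. The feature layout (search volume, page views, and three other variables) and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    Assumes the columns of X and the vector y are already centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b, with b initially zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def centre(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def make_synthetic(n=60, seed=0):
    """Hypothetical data: burden depends only on search volume and page views."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
    y = [2.0 * cols[0][i] + 1.0 * cols[1][i] + rng.gauss(0, 0.1) for i in range(n)]
    cols = [centre(c) for c in cols]
    X = [[cols[j][i] for j in range(5)] for i in range(n)]
    return X, centre(y)


if __name__ == "__main__":
    X, y = make_synthetic()
    search_volume = [row[0] for row in X]
    print("correlation(search volume, burden):", pearson(search_volume, y))
    print("LASSO coefficients:", lasso_cd(X, y, lam=0.1))
```

The L1 penalty is what lets the model discard uninformative predictors: any coefficient whose covariance with the residual stays below `lam` is set exactly to zero, so only the variables that actually track burden remain in the fitted model.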
© 2017 Springer International Publishing AG
Cite this paper
Qiu, R., Hadzikadic, M., Yao, L. (2017). Estimating Disease Burden Using Google Trends and Wikipedia Data. In: Benferhat, S., Tabia, K., Ali, M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Lecture Notes in Computer Science, vol 10351. Springer, Cham. https://doi.org/10.1007/978-3-319-60045-1_39
DOI: https://doi.org/10.1007/978-3-319-60045-1_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60044-4
Online ISBN: 978-3-319-60045-1
eBook Packages: Computer Science, Computer Science (R0)