Abstract
Machine learning algorithms play a vital role in predicting disease outbreaks based on climate change. Dengue outbreaks are caused by improper maintenance of water storage, lack of urbanization, deforestation, and a lack of vaccination and awareness. Moreover, the number of dengue cases varies with the climate season, so there is a need for a prediction model that relates dengue outbreaks to climate change. In this paper, a Gaussian process regression (GPR) model is applied to model dengue outbreaks using seasonal averages of several climate parameters: maximum temperature, minimum temperature, precipitation, wind, relative humidity and solar radiation. Dengue case counts and climate data for each block of Tamil Nadu, India, are collected from the Integrated Disease Surveillance Project and Global Weather Data for SWAT Inc, respectively. Local Moran’s I spatial autocorrelation is used to identify hotspot regions, and the dengue outbreak and its hotspot regions are visualized geographically with ArcGIS 10.1. The daily climate data, which is large in volume, is collected and stored in a Hadoop cluster computing environment, where the MapReduce framework reduces it to seasonal averages for winter, summer and monsoon. The seasonal climate data and dengue incidence counts (health data) are then integrated by geo-location (latitude and longitude), and GPR is trained on the integrated climate and health data to predict dengue incidence. The proposed Gaussian process based prediction model is compared with other machine learning approaches, namely multiple regression, support vector machines and random forests. Experimental results demonstrate the effectiveness of our Gaussian process based prediction framework.
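The abstract uses Local Moran’s I to find dengue hotspot regions and computes it in ArcGIS 10.1. Purely as an illustration, the plain-Python sketch below computes the statistic in the form introduced by Anselin in 1995, I_i = (z_i / m2) Σ_j w_ij z_j, where z_i is a block's deviation from the mean and m2 = Σ z_i² / n, using row-standardised contiguity weights. The block adjacency lists and case counts are hypothetical, and the permutation-based significance test that hotspot maps normally rely on is omitted, so "HH" here marks only a hotspot candidate.

```python
def _deviations_and_lags(values, neighbours):
    """Return deviations from the mean and their row-standardised spatial lags."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    # With row-standardised weights w_ij = 1 / |N(i)|, the lag is the neighbours' mean deviation.
    lags = [sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0 for nbrs in neighbours]
    return dev, lags


def local_morans_i(values, neighbours):
    """Local Moran's I for each area: I_i = (z_i / m2) * sum_j w_ij z_j."""
    dev, lags = _deviations_and_lags(values, neighbours)
    m2 = sum(d * d for d in dev) / len(values)
    if m2 == 0:
        return [0.0] * len(values)  # no variation, so no spatial association
    return [d / m2 * lag for d, lag in zip(dev, lags)]


def lisa_quadrants(values, neighbours):
    """Label each area HH, LL, HL or LH by the signs of its deviation and spatial lag."""
    dev, lags = _deviations_and_lags(values, neighbours)
    labels = []
    for d, lag in zip(dev, lags):
        if d > 0 and lag > 0:
            labels.append("HH")  # high count surrounded by high counts: hotspot candidate
        elif d < 0 and lag < 0:
            labels.append("LL")
        elif d > 0 and lag < 0:
            labels.append("HL")
        elif d < 0 and lag > 0:
            labels.append("LH")
        else:
            labels.append("none")
    return labels


# Hypothetical example: five blocks along a chain, each adjacent to the next.
cases = [10, 12, 11, 1, 2]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
scores = local_morans_i(cases, adjacency)
quadrants = lisa_quadrants(cases, adjacency)
print(quadrants)  # ['HH', 'HH', 'HL', 'LL', 'LL']
```

Positive I_i values in the HH quadrant are the ones a hotspot map would highlight once their significance has been checked.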
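The MapReduce step summarized in the abstract, which turns daily climate records into seasonal averages, can be sketched in plain Python by simulating the map, shuffle and reduce phases in one process. This is not Hadoop code: the month-to-season boundaries, the record layout `(lat, lon, day, readings)` and the variable names are assumptions, and averages are pooled across years rather than computed per year.

```python
from collections import defaultdict
from datetime import date

# Assumed season boundaries; the abstract names the seasons but not their months.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "summer", 4: "summer", 5: "summer",
                   6: "monsoon", 7: "monsoon", 8: "monsoon",
                   9: "monsoon", 10: "monsoon", 11: "monsoon"}


def map_record(record):
    """Map phase: emit ((lat, lon, season, variable), (value, 1)) for one daily record."""
    lat, lon, day, readings = record
    season = SEASON_OF_MONTH[day.month]
    for variable, value in readings.items():
        yield (lat, lon, season, variable), (value, 1)


def reduce_values(pairs):
    """Reduce phase: add up partial (sum, count) pairs and return the mean."""
    total, count = 0.0, 0
    for partial_sum, partial_count in pairs:
        total += partial_sum
        count += partial_count
    return total / count


def seasonal_averages(records):
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return {key: reduce_values(values) for key, values in groups.items()}


# Hypothetical daily readings for one grid point.
records = [
    (12.9, 79.1, date(2015, 7, 1), {"max_temp": 36.0, "precip": 4.0}),
    (12.9, 79.1, date(2015, 7, 2), {"max_temp": 34.0, "precip": 10.0}),
    (12.9, 79.1, date(2015, 4, 1), {"max_temp": 39.0, "precip": 0.0}),
]
averages = seasonal_averages(records)
print(averages[(12.9, 79.1, "monsoon", "max_temp")])  # 35.0
```

Emitting `(value, 1)` pairs rather than bare values means partial sums and counts can be pre-aggregated by a combiner on each node before the reduce phase without changing the final mean.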
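The abstract integrates the seasonal climate data with dengue counts by latitude and longitude but does not say how locations are matched. One plausible approach, assumed here rather than taken from the paper, is to attach each block to its nearest weather station by great-circle (haversine) distance. The block names, coordinates and feature names below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def integrate(blocks, stations):
    """Attach the nearest station's seasonal features to each block's dengue count.

    blocks:   {block_name: (lat, lon, cases)}
    stations: {(lat, lon): {feature_name: value}}
    """
    rows = []
    for name, (lat, lon, cases) in blocks.items():
        nearest = min(stations, key=lambda s: haversine_km(lat, lon, s[0], s[1]))
        rows.append({"block": name, "cases": cases, **stations[nearest]})
    return rows


# Hypothetical stations and blocks.
stations = {(13.0, 80.0): {"monsoon_precip": 7.0}, (11.0, 78.0): {"monsoon_precip": 3.0}}
blocks = {"block_a": (12.9, 79.9, 120), "block_b": (11.1, 78.2, 40)}
rows = integrate(blocks, stations)
for row in rows:
    print(row)
```

A linear scan over stations is fine for a few hundred blocks; a spatial index would be the usual choice at larger scale.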
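To make the GPR step concrete, here is a minimal Gaussian process regression in plain Python using the standard Cholesky-based predictive equations. The squared-exponential kernel, the fixed hyperparameters and the one-dimensional toy feature are assumptions; the abstract does not give the paper's kernel or how its hyperparameters were chosen. Targets are centred on their mean, so far from the training data the prediction reverts to the average case count and the variance to the prior signal variance.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(m):
    """Lower-triangular L with L L^T = m for a symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution, reading L^T from L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, X_new, noise_var=0.1, **kernel_args):
    """Posterior mean and latent-function variance at each point of X_new."""
    y_mean = sum(y) / len(y)
    y_centred = [v - y_mean for v in y]  # zero-mean GP prior on the centred targets
    K = [[rbf_kernel(a, b, **kernel_args) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_centred))  # alpha = K^{-1} (y - y_mean)
    means, variances = [], []
    for x in X_new:
        k_star = [rbf_kernel(x, xi, **kernel_args) for xi in X]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)
        variances.append(rbf_kernel(x, x, **kernel_args) - sum(t * t for t in v))
    return means, variances


# Hypothetical data: one standardised climate feature per block and its dengue count.
X = [[0.0], [1.0], [2.0]]
y = [10.0, 20.0, 15.0]
means, variances = gp_predict(X, y, [[0.0], [1.5], [50.0]], noise_var=1e-6)
print(means, variances)
```

A library implementation such as scikit-learn's `GaussianProcessRegressor` would additionally fit the kernel hyperparameters by maximising the log marginal likelihood, which this sketch does not do.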
Cite this article
Manogaran, G., Lopez, D. A Gaussian process based big data processing framework in cluster computing environment. Cluster Comput 21, 189–204 (2018). https://doi.org/10.1007/s10586-017-0982-5