Predicting dengue outbreaks in Brazil with manifold learning on climate data

https://doi.org/10.1016/j.eswa.2021.116324

Highlights

  • Dengue outbreak prediction using climate data.

  • Signal processing and manifold learning for prediction problems.

  • Improved earlier prediction accuracy scores from 0.72 to 0.80.

Abstract

Tropical countries face urgent public health challenges in the epidemic control of dengue. Since effective vector-control efforts depend on the timing of public policies, there is an enormous demand for accurate prediction tools. In this work, we improve upon a recent approach for coarsely predicting outbreaks in Brazilian urban centers based solely on their yearly climate data. Our methodological advancements encompass a judicious choice of data pre-processing steps and the use of modern computational techniques from signal processing and manifold learning. Altogether, our results improve earlier prediction accuracy scores from 0.72 to 0.80, solidifying manifold learning on climate data alone as a viable way to make (coarse) dengue outbreak predictions in large urban centers. Ultimately, this approach has the potential to radically simplify the data required for outbreak analysis, as municipalities with limited public health funds may not monitor the large number of features needed for more extensive machine learning approaches.

Introduction

Dengue fever is a mosquito-borne viral infection that affects more than 100 countries every year (WHO et al., 2014). Forty percent of the world’s population (about 3 billion people) currently lives in areas of dengue risk (NCEZID, 2019). The dengue virus (DENV) is primarily transmitted to humans through the bite of the female Aedes aegypti mosquito (Gubler, 1998), which also transmits Zika, chikungunya, and yellow fever. While most infections result in mild symptoms, severe dengue (or hemorrhagic fever) cases can lead to circulatory collapse (shock), internal bleeding, and death (WHO et al., 1997).

Dengue outbreaks in major cities may result from a complex interplay of factors. The circulation of the four different viral strains (DENV 1–4) has historically accounted for the disease’s re-emergence in many countries (Teixeira, Costa, Barreto, & Barreto, 2009). Although infection with any serotype induces immunity to that particular serotype, there is no long-term cross-immunity protection, and the risk of severe symptoms increases in a secondary infection (Kalayanarooj et al., 2007, Pancharoen et al., 2001, Pawitan, 2011). Human mobility also affects dengue spread within a city: the mosquito’s lifetime movement range is typically less than one kilometer, and mosquitoes can themselves become infected upon biting infected humans (Adams and Kapan, 2009, Enduri and Jolad, 2018, Harrington et al., 2005, Stolerman et al., 2015). Finally, the abundance and vector efficiency of Aedes aegypti mosquitoes are crucial for dengue spread and highly dependent on sanitary conditions (Caprara et al., 2009, Torres and Castro, 2007) and climate factors, such as temperature and precipitation (Hopp and Foley, 2001, Lana et al., 2018, Macoris et al., 1997).

The intricacies of dengue epidemiology have given rise to numerous types of predictions, whose scope varies considerably from work to work. See Section 2 for a short survey of the recent literature regarding different methodologies, training features, and specific tools. Stolerman, Maia, and Kutz (2019) recently addressed the problem with a provocative approach, coarsely predicting dengue outbreaks in selected capitals of Brazil based solely on their yearly climate data. Their approach has the potential to radically simplify the data required for outbreak analysis, as developing countries with limited public health funds may not monitor the large number of features needed for more extensive machine learning approaches. This paper aims to improve upon their original methodology by leveraging modern signal processing and manifold learning techniques, and to provide better interpretability of each pre-processing step.

The first opportunity identified in this work for improving the computational pipeline from Stolerman et al. (2019) was the addition of a feature generation (expanding transformation) step in the pre-processing phase. Instead of using only two features (mean temperature and precipitation rate), we applied a commonly used signal processing technique known as the Laplacian Pyramid (Burt & Adelson, 1983) to decompose the signals into different bands. The applications of this technique range from image compression (Burt & Adelson, 1983) to deep generative image models (Denton, Chintala, Szlam, & Fergus, 2015), leveraging its ability to reconstruct a decomposed signal with high computational efficiency. In practice, a single time series gave rise to multiple features, each capturing a different timescale of the original signal. In what follows, we will show that a judicious choice of these bands leads to better dengue-outbreak predictions than using the entirety of the original signal.
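
As a rough illustration of this band decomposition, the following Python sketch builds a one-dimensional Laplacian pyramid with NumPy. The binomial smoothing kernel, the number of levels, the helper names, and the synthetic temperature series are all assumptions made for the example; this excerpt does not state the kernel or the number of levels used in the actual pipeline.

```python
import numpy as np

# 5-tap binomial smoothing kernel (an assumption; other low-pass kernels work too).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(x):
    """Low-pass filter a 1-D signal, keeping its length (edge padding)."""
    padded = np.pad(x, 2, mode="edge")
    return np.convolve(padded, KERNEL, mode="valid")


def reduce_level(x):
    """Blur, then keep every second sample (coarser timescale)."""
    return blur(x)[::2]


def expand_level(x, size):
    """Upsample to `size` samples by zero insertion, then blur."""
    up = np.zeros(size)
    up[::2] = x
    return 2.0 * blur(up)


def laplacian_pyramid(signal, levels):
    """Return `levels` band-pass signals plus the final low-pass residual."""
    current = np.asarray(signal, dtype=float)
    bands = []
    for _ in range(levels):
        coarse = reduce_level(current)
        bands.append(current - expand_level(coarse, current.size))
        current = coarse
    bands.append(current)
    return bands


def reconstruct(bands):
    """Invert the decomposition by adding the bands back, coarse to fine."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = band + expand_level(current, band.size)
    return current


# Synthetic daily mean temperature for one year (hypothetical data).
rng = np.random.default_rng(0)
days = np.arange(365)
signal = 25.0 + 5.0 * np.sin(2.0 * np.pi * days / 365.0) + rng.normal(0.0, 1.0, 365)

bands = laplacian_pyramid(signal, levels=4)
print([band.size for band in bands])
```

Each entry of `bands` isolates a different timescale of the series, and `reconstruct(bands)` recovers the input up to floating-point error. A subset of the bands can then be kept as features instead of the full signal.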

To avoid the curse of dimensionality and potential overfitting caused by the new features, we applied a manifold learning technique that has been little explored in dengue prediction. Diffusion maps (Coifman & Lafon, 2006a) is a nonlinear dimensionality reduction technique that aims to learn the underlying manifold from which the data have been sampled. Using it to improve the features’ low-dimensional embedding made the original (and more challenging) classification problem significantly more tractable. After a substantial empirical effort testing different combinations of pre-processing steps, dimensionality reduction techniques, and classification methods available in the Scikit-learn machine learning library, we managed to improve the average accuracy of Stolerman et al. (2019) from 0.72 to 0.80. This is an important proof-of-concept step towards establishing manifold learning on climate data alone as a viable way to make (coarse) dengue outbreak predictions in large urban centers.
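
A minimal sketch of a diffusion-map embedding may help make this step concrete. It assumes a Gaussian kernel with bandwidth ε and the basic random-walk normalization. The function names, the median heuristic for ε, and the random feature matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed the rows of X using the leading non-trivial diffusion coordinates."""
    X = np.asarray(X, dtype=float)
    K = np.exp(-pairwise_sq_dists(X) / epsilon)   # Gaussian affinity matrix
    d = K.sum(axis=1)                             # degree of each sample
    # Symmetric matrix with the same spectrum as the Markov matrix D^{-1} K.
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Convert to right eigenvectors of the Markov matrix.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    keep = slice(1, n_components + 1)
    return (eigvals[keep] ** t) * psi[:, keep], eigvals


# Hypothetical feature matrix: 40 samples (e.g. city-years) with 6 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))

# Median heuristic for the kernel bandwidth (an assumption for this sketch).
epsilon = np.median(pairwise_sq_dists(X))
embedding, eigvals = diffusion_map(X, epsilon, n_components=2)
print(embedding.shape)
```

The rows of `embedding` could then be passed to any classifier. The result is sensitive to `epsilon`, so in practice the bandwidth would be tuned rather than fixed by a heuristic.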

Our motivation for this work is that dengue is a recurrent problem in tropical countries, especially less developed ones, including Brazil. Consequently, tools and applications that use cheap data to make coarse-scale predictions well in advance are much more likely to be adopted by public authorities. In Section 2 (Related Work), we elaborate on the fact that most works make predictions on a finer scale or use more costly data, and have shorter lead times. These points make most of them harder to implement in a developing country or to use for long-term countermeasures.

In the following sections, we survey recent related work and present our methods. Our results are presented, validated, and discussed in Sections 4 (Results) and 5 (Discussion).

Section snippets

Related work

In this section, we survey twenty-nine articles that use machine learning tools to predict the occurrence of dengue epidemics and dengue-related variables in tropical areas of the world. Table 1 summarizes the related works; see http://www.impa.br/~vitorgr/dengue/ for an interactive map with additional information about the studies. Although dengue is essentially a tropical disease, according to Salami, Sousa, Martins, and Capinha (2020), there are imports to other regions of the

Materials and methods

This section describes our dataset acquisition, followed by an overview of our pipeline and design choices. Subsequently, we present a concise formulation of our key steps, focusing on their application.

Results

In this section, we present our results and explain the rationale behind our key methodological choices regarding: (i) augmenting the training set with noisy data, (ii) properly choosing an ε value for the diffusion map dimensionality reduction step, and (iii) finding an optimal time window for dengue outbreak prediction. For our results, we use the first 11 years for training and validation; the remaining years (usually 4–5, depending on the city) are used for out-of-sample testing. The

Discussion

Tropical countries such as Brazil face urgent public health challenges regarding epidemic control of dengue. Effective vector-control efforts are crucially dependent on the timing of public policies. Consequently, there is an enormous demand for accurate prediction tools for dengue outbreaks to mitigate the societal-level burdens they cause. The complexity of dengue epidemics, however, poses an enormous challenge to the broader scientific community.

In Section 2, we reviewed the

Conclusion

In this work, we aim to perform yearly classification from time series. This setting implies that our datasets are constrained by the small number of years available. Moreover, the daily measurements may be inaccurate or have a large number of missing entries. Another critical point is that we use an “indirect” factor to make predictions from cheap data. While the weather is essential, the direct factors in dengue are the mosquito’s biological characteristics, which are costly to measure. We intrinsically

CRediT authorship contribution statement

Caio Souza: Conceived and planned the overall framework, Planned and carried out the simulations, Sample preparation, Writing the manuscript, Provided critical feedback and helped shape the research, analysis and manuscript. Pedro Maia: Conceived and planned the overall framework, Writing the manuscript, Provided critical feedback and helped shape the research, analysis and manuscript. Lucas M. Stolerman: Conceived and planned the overall framework, Provided critical feedback and helped shape

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

Caio Souza was supported by Capes with PICME scholarship.

References (57)

  • Epidemiological report: dengue fever (January to June, 2008). Technical report (2008)
  • Brazilian Ministry of Health. Promotion of national mobilization effort against Aedes aegypti for 2017 (2016)
  • Buczak, A.L. et al. Prediction of high incidence of dengue in the Philippines. PLoS Neglected Tropical Diseases (2014)
  • Buczak, A.L. et al. Ensemble method for dengue prediction. PLoS One (2018)
  • Buczak, A.L. et al. A data-driven epidemiological prediction method for dengue outbreaks using local and remote sensing data. BMC Medical Informatics and Decision Making (2012)
  • Burt, P. et al. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications (1983)
  • Caprara, A. et al. Irregular water supply, household usage and dengue: a bio-social study in the Brazilian northeast. Cadernos de Saude Publica (2009)
  • Denton, E.L. et al. Deep generative image models using a Laplacian pyramid of adversarial networks
  • Gluskin, R.T. et al. Evaluation of internet-based dengue query data: Google dengue trends. PLoS Neglected Tropical Diseases (2014)
  • Gubler, D.J. Dengue and dengue hemorrhagic fever. Clinical Microbiology Reviews (1998)
  • Guo, P. et al. Developing a dengue forecast model using machine learning: A case study in China. PLoS Neglected Tropical Diseases (2017)
  • Harrington, L.C. et al. Dispersal of the dengue vector Aedes aegypti within and between rural communities. The American Journal of Tropical Medicine and Hygiene (2005)
  • Hii, Y.L. et al. Forecast of dengue incidence using temperature and rainfall. PLoS Neglected Tropical Diseases (2012)
  • Hopp, M.J. et al. Global-scale relationships between climate and the dengue fever vector, Aedes aegypti. Climatic Change (2001)
  • Brazilian National Institute of Meteorology. Temperature and precipitation time series (2020)
  • Jain, R. et al. Prediction of dengue outbreaks based on disease surveillance, meteorological and socio-economic data. BMC Infectious Diseases (2019)
  • Johansson, M.A. et al. Evaluating the performance of infectious disease forecasts: A comparison of climate-driven and seasonal dengue forecasts for Mexico. Scientific Reports (2016)
  • Kalayanarooj, S. et al. Blood group AB is associated with increased risk for severe dengue disease in secondary infections. The Journal of Infectious Diseases (2007)
  • The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
