Abstract
Diarrhoea (DH) poses a significant threat to national morbidity and mortality in Vietnam, especially among children. As a climate-sensitive disease, it is strongly linked to meteorological factors such as rainfall and temperature. With global climate change, the risk of diarrhoea has been increasing gradually, while Vietnam is already a global diarrhoea hotspot. An effective early warning system is therefore urgently needed. However, the problem has received little attention: the few existing studies mainly focus on quantifying the relationships between climate factors and diarrhoea incidence. Exploring more sophisticated machine learning techniques is thus a promising step towards more efficient and effective warning systems. This paper makes two main contributions. First, many state-of-the-art prediction models, from traditional to recent advanced methods, e.g., SARIMA, SARIMAX, LSTM, CNN, XGBoost, SVM, LightGBM, CatBoost, N-HiTS, BlockRNN, TCN, TFT, and Transformer, are studied for predicting DH rates across a large number of locations (55 provinces) with different climatic, geographic, and socio-economic conditions. This provides an overview of how different ML models perform on the prediction task, which is useful for researchers developing early warning systems for DH in other places. Second, we introduce a novel ensemble prediction model, called dynamic weighted ensemble (DWE), to further improve DH prediction performance. DWE is a two-layer ensemble approach. The first layer generates different meta models based on four base component models. The second layer employs a novel approach to predict the performance of all selected meta models and uses these predictions to dynamically combine the models in a weighted scheme to produce the final results. This differs from traditional ensemble approaches, which rely only on fixed combinations of their components. To the best of our knowledge, DWE is also the first ensemble approach for diarrhoea prediction. Extensive experiments are conducted over all 55 provinces of Vietnam to demonstrate the performance of DWE and to reveal its important characteristics.
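The idea of weighting models by their predicted performance can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state how predicted performance is turned into weights, so the inverse-error rule below is an assumption, and the function names, forecasts, and error values are hypothetical.

```python
from typing import List, Sequence


def dynamic_weights(predicted_errors: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Convert predicted per-model errors into weights that sum to 1.

    Assumption (not from the paper): a model's weight is inversely
    proportional to its predicted error, so models expected to do well
    at this time step contribute more.
    """
    inverse = [1.0 / (err + eps) for err in predicted_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(forecasts: Sequence[float], predicted_errors: Sequence[float]) -> float:
    """Weighted combination of meta-model forecasts for one time step."""
    if len(forecasts) != len(predicted_errors):
        raise ValueError("forecasts and predicted_errors must have the same length")
    weights = dynamic_weights(predicted_errors)
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical DH-rate forecasts from three meta models for one month,
# and the error each model is predicted to make for that month.
forecasts = [10.0, 14.0, 12.0]
predicted_errors = [1.0, 2.0, 4.0]

weights = dynamic_weights(predicted_errors)
final = combine(forecasts, predicted_errors)
print(weights, final)
```

Because the predicted errors are re-estimated at every time step, the weights change over time, which is the contrast with a fixed-weight ensemble drawn in the abstract.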
Data availability
Available upon request.
Code availability
Available upon request.
Notes
The World Bank, Country Climate and Development Report for Vietnam. https://www.worldbank.org/en/country/vietnam/brief/key-highlights-country-climate-and-development-report-for-vietnam.
Germanwatch, Global Climate Risk Index 2020. https://www.germanwatch.org/en/17307.
United States Agency for International Development (USAID), Climate risk profile: Vietnam. https://www.climatelinks.org/countries/vietnam.
Acknowledgements
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under Grant Number DS2022-26-03.
Author information
Contributions
TDD, TDN, VCT, and STM developed the core algorithms and performed experiments. THTT, DP, and DTA performed data collection and preprocessing and ran experiments on some traditional models. TDN and STM supervised the project. All authors participated in writing the paper and in project discussions.
Ethics declarations
Conflicts of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
TDD, TDN, VCT, STM, THTT, DP and DTA agree to participate.
Consent for publication
TDD, TDN, VCT, STM, THTT, DP and DTA agree to the publication of their individual data and images.
Additional information
Editors: Dino Ienco, Robert Interdonato, Pascal Poncelet.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Figure 17 shows the monthly averaged DH rate and climate factors (from January to December) for all provinces. Across the whole country, DH rates peak between March and September, when both rainfall and temperature are higher.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Do, T.D., Nguyen, T.D., Ta, V.C. et al. Dynamic weighted ensemble for diarrhoea incidence predictions. Mach Learn 113, 2129–2152 (2024). https://doi.org/10.1007/s10994-023-06465-z
DOI: https://doi.org/10.1007/s10994-023-06465-z