Predicting the concentration of sulfate using machine learning methods

  • Research Article
Earth Science Informatics

Abstract

Continuous water monitoring is expensive and time consuming because it requires sampling throughout all 12 months of the year, which restricts water management studies as well as the calibration and validation of water quality models. To overcome this obstacle to better water quality management, improving water quality models is a necessary step. Various modelling strategies have been developed in recent years to improve the accuracy of predictions of major water parameters. In this work, five machine learning models were considered for the prediction of sulfate in raw water: artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR), decision tree (DT) and ensemble tree (ET). Moreover, the DT model was used to determine the influence of the other physicochemical parameters (inputs) on the sulfate, and the ET model was used to improve the DT result and confirm that influence. The experimental results indicate that all models were effective in predicting sulfate levels, given their very high correlation coefficients (close to 1) and very low statistical errors (close to 0); however, the most suitable models were GPR and ANN, whose coefficients and statistical indicators differ only slightly. Indeed, the coefficients and statistical indicators of the GPR model were R = 0.9991, R2 = 0.9982, R2adj = 0.9978, RMSE = 0.0182, MSE = 0.0003, MAE = 0.0073 and EPM = 1.5386, while those of the ANN model were R = 0.9989, R2 = 0.9978, R2adj = 0.9972, RMSE = 0.0124, MSE = 0.0001, MAE = 0.0083 and EPM = 2.0639. The only difference that favored the GPR model over the ANN was the number of parameters: the GPR model used 70 parameters with a very low loss (3.3404e-04), whereas the ANN model was run with 190 parameters.
The model tests (interpolation) confirmed this result, with a correlation coefficient of R = 0.99834 and a coefficient of determination of R2 = 0.9966, together with the statistical indicators RMSE = 0.0309, MSE = 9.5219e-04, EPM = 3.0267 and MAE = 0.0122. In light of these results, it can be concluded that the GPR model is the most efficient for predicting sulfate in raw water. Additionally, its ability to deal with missing values and outliers and its capacity to be updated show its relevance, which should be kept in mind in the future. This efficiency seems to be due to the fact that the sulfate concentration in raw water is linked to the physicochemical characteristics of the environment by non-linear relationships. This is confirmed by the decision tree and ensemble tree models, which provided information on how sulfate is related to the other physicochemical characteristics.
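The indicators quoted in the abstract can be reproduced from a vector of measured values and a vector of model predictions. The sketch below is an illustration only, not the authors' code: the function name `regression_metrics` and the argument `n_predictors` are hypothetical, R2 is taken as the square of the Pearson correlation R (an assumption that matches the reported pairs, for example 0.9991 squared is about 0.9982), and R2adj uses the usual adjustment for the number of predictors. EPM is left out because the abstract does not define it.

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_predictors):
    """Return R, R2, R2adj, MSE, RMSE and MAE for one set of predictions.

    n_predictors is the number of input variables of the model
    (hypothetical argument; the abstract does not state this number).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")

    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))      # mean squared error
    rmse = float(np.sqrt(mse))                # root mean squared error
    mae = float(np.mean(np.abs(residuals)))   # mean absolute error

    # Pearson correlation between measured and predicted values.
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    # Assumption: R2 is the squared correlation coefficient.
    r2 = r ** 2
    # Adjusted R2 penalises the number of predictors.
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

    return {"R": r, "R2": r2, "R2adj": r2_adj,
            "MSE": mse, "RMSE": rmse, "MAE": mae}


if __name__ == "__main__":
    # Made-up numbers, only to show the call; not data from the study.
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in regression_metrics(measured, predicted, 2).items():
        print(f"{name}: {value:.4f}")
```

Because RMSE is the square root of MSE, the reported pairs can be cross-checked directly: 0.0182 squared is about 0.0003 for GPR, and 0.0309 squared is about 9.5e-04 for the test set.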



Author information

Corresponding author

Correspondence to Hichem Tahraoui.

Additional information

Communicated by: H. Babaie

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(DOCX 25 kb)

About this article

Cite this article

Tahraoui, H., Belhadj, AE., Amrane, A. et al. Predicting the concentration of sulfate using machine learning methods. Earth Sci Inform 15, 1023–1044 (2022). https://doi.org/10.1007/s12145-022-00785-9

  • DOI: https://doi.org/10.1007/s12145-022-00785-9
