Rivers, as freshwater resources, are generally jeopardized in terms of both quantity and quality by industrial, agricultural, and urban development. Water quality management is inextricably bound up with reliable prediction of the Water Quality Index (WQI) for various purposes, so accurate estimation of the WQI is one of the most challenging issues in water quality studies of surface water resources. A broad range of traditional methodologies exists for WQI evaluation. Because of the intrinsic limitations of these conventional models, Data-Driven Models (DDMs) have frequently been employed to assess the WQI of natural streams. In the present research, WQI values and their typical classifications were obtained following the guidelines of the National Sanitation Foundation (NSF). Four well-known DDMs, namely Evolutionary Polynomial Regression (EPR), M5 Model Tree (MT), Gene-Expression Programming (GEP), and Multivariate Adaptive Regression Spline (MARS), were employed to predict the WQI of the Karun River. For this purpose, 12 Water Quality Parameters (i.e., Dissolved Oxygen, Chemical Oxygen Demand, Biochemical Oxygen Demand, Electrical Conductivity, Nitrate, Nitrite, Phosphate, Turbidity, pH, Calcium, Magnesium, and Sodium) were collected from nine hydrometric stations, and missing water temperature values were extracted through analysis of Landsat-7 ETM+ images. Furthermore, the Gamma Test (GT), Forward Selection (FS), Polynomial Chaotic Expression (PCE), and Principal Component Analysis (PCA) were used to reduce the number of input variables fed to the DDMs. The results showed that FS-M5 MT performed best in estimating the WQI classification. WQI values for the Karun River were then assessed in a reliability-based probabilistic framework to account for the effect of uncertainty and randomness in the input parameters. 
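The Gamma Test used above for input reduction estimates, directly from the data, the variance of the noise that no smooth model of the chosen inputs could explain. The sketch below is a minimal, brute-force version of the standard Gamma test, not the authors' implementation: for k = 1..p it computes δ(k), the mean squared Euclidean distance to the k-th nearest neighbour in input space, and γ(k), half the mean squared difference between the corresponding outputs, then regresses γ on δ. The intercept (GT0 in the nomenclature) is the Gamma statistic and the slope is A. The default p = 10 neighbours and the V-ratio normalisation are assumptions of this sketch.

```python
import numpy as np


def gamma_test(X, y, p=10):
    """Return (Gamma, A, V_ratio) from a brute-force Gamma test."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D array as a single input
    y = np.asarray(y, dtype=float)
    m = X.shape[0]
    # Squared Euclidean distances between every pair of input vectors.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :p]      # indices of the k-th nearest neighbours, k = 1..p
    rows = np.arange(m)[:, None]
    delta = d2[rows, nn].mean(axis=0)       # delta(k): mean squared neighbour distance
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)  # gamma(k)
    slope, intercept = np.polyfit(delta, gamma, 1)          # gamma ≈ Gamma + A * delta
    return intercept, slope, intercept / np.var(y)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 500)
y = np.sin(x) + rng.normal(0.0, 0.1, 500)   # synthetic data with noise variance 0.01
G, A, V = gamma_test(x, y)
```

For the synthetic example, G should come out close to the injected noise variance of 0.01. In an input-selection setting, one would compute the Gamma statistic for candidate subsets of WQPs and prefer subsets with a low value; the O(n²) neighbour search is kept only for clarity.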
To this end, Monte Carlo scenario sampling was used to evaluate the limit state function derived from the DDM-based WQI formulation. Based on the qualitative description of the WQI, the Karun River was classified as having “Relatively Bad” quality. Moreover, the reliability analysis indicated that there is only a 19% chance that a specimen from the Karun River has a better quality index.
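The reliability step can be illustrated with crude Monte Carlo sampling: draw many input scenarios, evaluate the limit-state function g for each, and estimate the probability as the fraction of scenarios with g ≤ 0. The sketch below is hypothetical throughout: the linear surrogate `hypothetical_wqi`, the normal distributions for DO and BOD, the target value of 50, and the sign convention g = target − WQI are stand-ins chosen for illustration, not the paper's FS-M5 MT formulation or its fitted distributions.

```python
import numpy as np


def monte_carlo_probability(limit_state, sample_inputs, n_samples=100_000, seed=0):
    """Crude Monte Carlo estimate of P[g(X) <= 0] and its standard error."""
    rng = np.random.default_rng(seed)
    x = sample_inputs(rng, n_samples)       # shape (n_samples, n_inputs)
    g = limit_state(x)                      # one limit-state value per scenario
    prob = float(np.mean(g <= 0.0))
    std_err = float(np.sqrt(prob * (1.0 - prob) / n_samples))
    return prob, std_err


def hypothetical_wqi(x):
    # Hypothetical linear surrogate standing in for a DDM-based WQI formula.
    dissolved_oxygen, bod = x[:, 0], x[:, 1]
    return 30.0 + 4.0 * dissolved_oxygen - 2.0 * bod


def sample_inputs(rng, n):
    # Assumed (hypothetical) input distributions, in mg/L.
    dissolved_oxygen = rng.normal(6.0, 1.0, n)
    bod = rng.normal(4.0, 1.5, n)
    return np.column_stack([dissolved_oxygen, bod])


WQI_TARGET = 50.0  # hypothetical lower bound of a better quality class


def limit_state(x):
    # Assumed convention: g <= 0 when the predicted WQI reaches the target class.
    return WQI_TARGET - hypothetical_wqi(x)


prob, std_err = monte_carlo_probability(limit_state, sample_inputs)
```

Because the assumed surrogate is linear in independent normal inputs, the predicted WQI is normal with mean 46 and standard deviation 5, so the exact answer is Φ(−0.8) ≈ 0.212; the estimate should approach it, with `std_err` quantifying the sampling error.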

- AI:
Artificial Intelligence
- ANFIS:
Adaptive Neuro-Fuzzy Inference System
- A:
Slope parameter returned by the Gamma Test, which typically carries useful information about the complexity of the relationship between input and output variables
- aij:
Weighting coefficients of principal components
- AL:
Offset with a band-specific value
- ANNs:
Artificial Neural Networks
- APLR:
Adaptive Piecewise Linear Regression
- b1, b2, b3, b4:
Weighting coefficients of the multivariate linear equation obtained by MT
- BFs:
Basis Functions
- BOD:
Biochemical Oxygen Demand
- C:
A closed bounded set
- c1, c2, c3,… c13:
Weighting coefficients of the multivariate linear equation obtained by MT
- Ca2+ :
Calcium
- CCME:
Canadian Council of Ministers of the Environment
- CL:
Confidence Level
- COD:
Chemical Oxygen Demand
- DDMs:
Data-Driven Models
- DN:
Digital Number
- Dn:
Absolute difference between the numerical and theoretical cumulative distributions associated with the parameter
- \(D_{n}^{u}\) :
Acceptable limit for Dn
- DO:
Dissolved Oxygen
- DOsat:
Dissolved Oxygen in the saturated state
- e:
Model error, treated as the uncertainty parameter
- EC:
Electrical Conductivity
- EPR:
Evolutionary Polynomial Regression
- ET:
Expression Tree
- ETM+ :
Enhanced Thematic Mapper Plus
- F0:
F ratio
- FC:
Fecal Coliform
- FORM:
First-Order Reliability Method
- FOSM:
First-Order Second Moment
- FS:
Forward Selection
- FX(r):
Theoretical cumulative distribution associated with parameter r
- G:
Green Spectral Band
- GA:
Genetic Algorithm
- GEP:
Gene-Expression Programming
- GMDH:
Group Method of Data Handling
- Gn(r):
Numerical cumulative distribution associated with parameter r
- GP:
Genetic Programming
- GT:
Gamma Test
- GT0:
The intercept on the vertical axis (δ = 0)
- h:
A function establishing the relationship between WQPs and WQI
- I:
Unit matrix
- IOA:
Index of Agreement
- k:
Number of the nearest neighbors
- k′:
Number of elements in input variables
- K1, K2:
Band-specific thermal conversion constants
- KMO:
Kaiser–Meyer–Olkin measure of sampling adequacy
- K-S:
Kolmogorov–Smirnov test
- LS:
Least Squares
- LSF:
Limit-State Function
- Lλ:
Top-of-Atmosphere spectral radiance
- M:
Maximum number of mathematical terms
- MAE:
Mean Absolute Error
- MARS:
Multivariate Adaptive Regression Spline
- Mg2+ :
Magnesium
- ML:
Gain coefficient (band-specific multiplicative rescaling factor)
- MMSE:
Minimum Mean Square Error
- MOGA:
Multi-Objective Genetic Algorithm
- MSE:
Mean Squared Error
- MT:
Model Tree
- n:
Number of input variable
- n′:
Number of observations
- Na+ :
Sodium
- NDWI:
Normalized Difference Water Index
- NH4:
Ammonium
- NIR:
Near Infra-Red Spectral Band
- \(NO_{3}^{ - }\) :
Nitrate Nitrogen
- NSF:
National Sanitation Foundation
- p:
Maximum number of input variables
- PCA:
Principal Component Analysis
- PCC:
Positive Coefficient of Correlation
- PCE:
Polynomial Chaotic Expression
- PE:
Probability of Exceedance
- pf:
Probability of Failure
- pH:
Potential of Hydrogen
- \(PO_{4}^{3 - }\) :
Phosphate
- Qcal:
Quantized and calibrated pixel value (DN)
- R:
Coefficient of correlation
- RAE:
Relative Absolute Error
- RE:
Relative Error
- RMSE:
Root Mean Square Error
- ROI:
Region of Interest
- RRSE:
Root Relative Squared Error
- s:
Number of basis functions
- SORM:
Second-Order Reliability Method
- SSE:
Sum of Squared Errors
- SST:
Sea Surface Temperature
- SVM:
Support Vector Machine
- T:
- TB:
Brightness Temperature
- TH:
Total Hardness
- Tu:
Turbidity
- u:
Significance level
- USGS:
United States Geological Survey
- VCM:
Variance–covariance matrix
- WCs:
Weighting Coefficients
- WQI:
Water Quality Index
Acceptable values of WQI
Measured values of WQI
- WQPs:
Water Quality Parameters
- WST:
Water Surface Temperature
- x:
Input vectors known as WQPs
- X1, X2, X3,… Xn:
Input variables associated with the limit state function
- y:
Output vector known as WQI
- α:
Significance level used in the F test
- δ:
The function associated with the Euclidean distance
- δ′:
A collection of coefficients used in the EPR formulation
- θ:
Input variables vector for a specific problem
- λ:
- μ(x):
Basis function
- π:
Overall formulation by EPR
- ρ:
Weighting coefficients used in the formulation obtained by the MARS model
- ϕ(x):
Formulation obtained by the MARS model
- ϕ1, ϕ2, ϕ3:
Functions for establishing a relationship among WQPs
- ω:
User-defined function with various mathematical structures
- \(\underset{\sim}{x}\) :
Vector of random variables
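The entries for DN, Qcal, ML, AL, Lλ, K1, K2, and TB correspond to the standard Landsat conversion from digital numbers to top-of-atmosphere spectral radiance, \(L_{\lambda} = M_{L} Q_{cal} + A_{L}\), and then to brightness temperature, \(T_{B} = K_{2} / \ln(K_{1}/L_{\lambda} + 1)\). The sketch below is a minimal version of those two formulas under stated assumptions: the K1 and K2 defaults are the constants commonly published for Landsat 7 ETM+ Band 6, the gain and offset must be read from each scene's metadata, and any emissivity or atmospheric correction needed to move from TB to water surface temperature is left out because the source does not specify it.

```python
import numpy as np

# Thermal conversion constants commonly published for Landsat 7 ETM+ Band 6.
K1_ETM_B6 = 666.09   # W / (m^2 * sr * um)
K2_ETM_B6 = 1282.71  # K


def toa_radiance(qcal, ml, al):
    """L_lambda = M_L * Qcal + A_L (DN to top-of-atmosphere spectral radiance)."""
    return ml * np.asarray(qcal, dtype=float) + al


def brightness_temperature(qcal, ml, al, k1=K1_ETM_B6, k2=K2_ETM_B6):
    """T_B = K2 / ln(K1 / L_lambda + 1), in kelvin; radiance must be positive."""
    radiance = toa_radiance(qcal, ml, al)
    return k2 / np.log(k1 / radiance + 1.0)


def kelvin_to_celsius(t_kelvin):
    """Convert kelvin to degrees Celsius."""
    return np.asarray(t_kelvin, dtype=float) - 273.15
```

Both functions accept whole DN arrays, so a clipped Band 6 scene over the river can be converted in one call once the scene-specific gain and offset are known.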
Najafzadeh, M., Homaei, F. & Farhadi, H. Reliability assessment of water quality index based on guidelines of national sanitation foundation in natural streams: integration of remote sensing and data-driven models. Artif Intell Rev 54, 4619–4651 (2021). https://doi.org/10.1007/s10462-021-10007-1