Data-driven modelling framework for streamflow prediction in a physio-climatically heterogeneous river basin

  • Methodologies and Application
  • Soft Computing

Abstract

The prediction of streamflows is essential for efficient water resources management at basin scale. The present study examines the performance of the model tree (MT) data-driven technique in predicting streamflows for an intermittent and a perennial river in a physio-climatically heterogeneous river basin. Correlation and mutual information analyses of the predictor (hydrometeorological) variables are performed to determine the model input structure. Overall, seventy-two model configurations are formulated for each stream gauging station based on the combination of input variables, MT variants and variable lengths of calibration and validation datasets. The model simulation results are analysed by estimating a suite of statistical performance indices for each model configuration. The influence of parameter sensitivity on model performance is also assessed. The results indicate that the selection of input variables plays a governing role in capturing the interplay of hydrological processes in a catchment. The model outputs displayed more sensitivity to pruning than to smoothing in MT, and minimal sensitivity to data partitioning, since the datasets were homoscedastic. The study also proposes a procedure for model evaluation that considers multiple criteria, such as forecasting error, efficiency, predictability and false alarms, and enables multi-model comparisons for better decision making. The proposed procedure was successfully applied to select the best-fit model for predicting one-day-ahead streamflows at each stream gauging station.

Availability of data and material

The rainfall data used in this study were procured from India Meteorological Department (IMD), Pune, on payment basis. The streamflow and reservoir inflow data were procured from the government agencies, viz., Central Water Commission (CWC), Tapi Division, Surat, and Ukai Civil Circle, Ukai, Government of Gujarat, respectively. The authors cannot share the data without the permission of the aforesaid data disseminating agencies.

Acknowledgements

The first author thankfully acknowledges the financial support received from Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, vide their letter no. DST/INSPIRE Fellowship/2015/IF150634 dated 11 January 2016. The authors appreciate the Centre of Excellence (CoE) on ‘Water Resources and Flood Management’, TEQIP-II, Ministry of Human Resources Development (MHRD) and INCCC-sponsored research project ‘Impact of Climate Change on Water Resources of Tapi Basin’, Ministry of Water Resources, River Development and Ganga Rejuvenation (MoWR,RD&GR), Government of India, for providing resourceful support in conducting the present study. The authors express sincere thanks to Central Water Commission (CWC), Tapi Division, Surat; India Meteorological Department (IMD), Pune; and Ukai Civil Circle, Ukai, Government of Gujarat, for providing essential data to conduct the reported study. The authors are thankful to the anonymous reviewers for their constructive suggestions in improving the quality of the manuscript.

Funding

The first author received financial support in the form of a scholarship from Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, for conducting the research work. The second author secured funding through Centre of Excellence (CoE) on ‘Water Resources and Flood Management’, TEQIP-II, Ministry of Human Resources Development (MHRD), which provided resource and infrastructural support in the form of data procurement and computing facilities. The second author also secured funding through INCCC-sponsored research project ‘Impact of Climate Change on Water Resources of Tapi Basin’, Ministry of Water Resources, River Development and Ganga Rejuvenation (MoWR,RD&GR), Government of India, for procurement of software tools.

Author information

Contributions

PJS, PLP and VJ conceptualized the study; PJS developed the methodology, carried out the formal analysis and investigation, and prepared the original draft; PJS, PLP and VJ reviewed and edited the manuscript; PJS and PLP acquired the funding; PLP and VJ supervised the study.

Corresponding author

Correspondence to P. L. Patel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

The streamflow modelling was carried out using WEKA software (version 3.8), developed at the University of Waikato, New Zealand. The software is freely available for download at https://www.cs.waikato.ac.nz/ml/weka/. In addition, MATLAB codes were developed for processing the model outputs and generating plots; these are available from the first author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 277 kb)

Appendix A: Statistical performance indices

1.1 Performance measures for prediction of individual values

1.1.1 Root-mean-square error (RMSE)

The root-mean-square error is a measure of the goodness-of-fit related to high flows. It is expressed as Eqn. (A1):

$$ {\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{obs}}i}} - Q_{{{\text{sim}}i}} } \right)^{2} }}{n}} $$
(A1)

where \(Q_{{{\text{obs}}i}}\) and \(Q_{{{\text{sim}}i}}\) indicate observed and simulated streamflows, respectively, and n is the total number of observations. A lower value of RMSE represents good performance of the model (Karran et al. 2014). It has the same unit as the hydrologic variable under investigation.
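As an illustration of Eqn. (A1), the following is a minimal Python sketch (the study itself used WEKA and MATLAB). The function name `rmse` and the sample series are hypothetical and are not taken from the study data.

```python
import math


def rmse(q_obs, q_sim):
    """Root-mean-square error between observed and simulated flows, Eqn. (A1)."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(q_obs)
    squared_errors = ((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical daily streamflows (same unit for both series, e.g. m^3/s)
q_obs = [10.0, 20.0, 30.0, 40.0]
q_sim = [12.0, 18.0, 33.0, 37.0]
print(rmse(q_obs, q_sim))  # result carries the unit of the flow series
```

Because the errors are squared before averaging, large deviations (typically at high flows) dominate the result.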

1.1.2 Coefficient of determination (R2)

The coefficient of determination explains the collinearity between simulated and observed values. The value of R2 ranges from 0 to 1, with R2 = 1 showing perfect prediction ability. It is expressed as Eqn. (A2):

$$ R^{2} = \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{obs}}i}} - \overline{Q}_{{{\text{obs}}}} } \right) \cdot \left( {Q_{{{\text{sim}}i}} - \overline{Q}_{{{\text{sim}}}} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{obs}}i}} - \overline{Q}_{{{\text{obs}}}} } \right)^{2} \cdot \mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{sim}}i}} - \overline{Q}_{{{\text{sim}}}} } \right)^{2} } }}} \right)^{2} $$
(A2)

where \(\overline{Q}_{{{\text{obs}}}}\) and \(\overline{Q}_{{{\text{sim}}}}\) denote mean observed and simulated streamflows.
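A minimal Python sketch of Eqn. (A2) is given below; the function name `r_squared` and the sample values are hypothetical. The example deliberately uses a simulated series that is a linear transformation of the observed one, to show that R2 measures collinearity rather than agreement in magnitude.

```python
def r_squared(q_obs, q_sim):
    """Coefficient of determination (squared Pearson correlation), Eqn. (A2)."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(q_obs)
    mean_obs = sum(q_obs) / n
    mean_sim = sum(q_sim) / n
    cross = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(q_obs, q_sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in q_obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in q_sim)
    if ss_obs == 0 or ss_sim == 0:
        raise ValueError("R2 is undefined for a constant series")
    return cross ** 2 / (ss_obs * ss_sim)


# Hypothetical series: q_sim = 2 * q_obs + 1, i.e. strongly biased but collinear
q_obs = [1.0, 2.0, 3.0, 4.0]
q_sim = [3.0, 5.0, 7.0, 9.0]
print(r_squared(q_obs, q_sim))  # equals 1 despite the systematic overestimation
```

This insensitivity to additive and proportional differences is one reason R2 is reported alongside error and bias measures rather than on its own.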

1.1.3 Fractional standard error (FSE)

The FSE is obtained by dividing the RMSE by the mean of the observed time series. It is considered a scalable measure of model precision (Karran et al. 2014). The model attains better precision as the value of FSE tends to zero. It can be expressed as Eqn. (A3):

$$ {\text{FSE}} = \frac{{{\text{RMSE}}}}{{\overline{Q}_{{{\text{obs}}}} }} $$
(A3)

1.1.4 Mean absolute error (MAE)

The mean absolute error measures the goodness-of-fit for moderate flows (Jothiprakash and Kote 2011). It is expressed as Eqn. (A4):

$$ {\text{MAE}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {Q_{{{\text{obs}}i}} - Q_{{{\text{sim}}i}} } \right|}}{n} $$
(A4)

1.1.5 RMSE to standard deviation ratio (RSR)

The RMSE to standard deviation ratio (RSR) is estimated by dividing the RMSE by the standard deviation of the observed data. Thus, RSR incorporates the model error and a scaling factor such that datasets with different characteristics can be compared. RSR varies in the range [0, ∞), where zero is the optimal value indicating perfect model simulation. It is given by Eqn. (A5) (Moriasi et al. 2007):

$$ {\text{RSR}} = \frac{{{\text{RMSE}}}}{{Q\sigma_{{{\text{obs}}}} }} $$
(A5)

where \(Q\sigma_{{{\text{obs}}}}\) denotes standard deviation of observed streamflow.

1.1.6 MAE to mean ratio (MMR)

Analogous to RSR, the MAE to mean ratio (MMR) is devised in this study for scaling the mean absolute error. It is estimated by dividing the MAE by the mean of the observed data. MMR also varies in the range [0, ∞), where zero is the optimal value indicating perfect model simulation. It is given by Eqn. (A6):

$$ {\text{MMR}} = \frac{{{\text{MAE}}}}{{\overline{Q}_{{{\text{obs}}}} }} $$
(A6)
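The scaled error measures of Eqns. (A3) to (A6) can be sketched together in Python as follows. The function name `scaled_errors` and the sample values are hypothetical. The sketch assumes the population form of the standard deviation (divisor n) for RSR, which matches the ratio form given by Moriasi et al. (2007); the study does not state which form was used.

```python
import math


def scaled_errors(q_obs, q_sim):
    """Return MAE (A4), FSE (A3), RSR (A5) and MMR (A6) as a dictionary."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(q_obs)
    mean_obs = sum(q_obs) / n
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(q_obs, q_sim)) / n)
    mae = sum(abs(o - s) for o, s in zip(q_obs, q_sim)) / n
    # Assumption: population standard deviation of the observed series
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in q_obs) / n)
    if mean_obs == 0 or std_obs == 0:
        raise ValueError("observed mean and standard deviation must be non-zero")
    return {
        "MAE": mae,
        "FSE": rmse / mean_obs,
        "RSR": rmse / std_obs,
        "MMR": mae / mean_obs,
    }


# Hypothetical daily streamflows
q_obs = [10.0, 20.0, 30.0, 40.0]
q_sim = [12.0, 18.0, 33.0, 37.0]
for name, value in scaled_errors(q_obs, q_sim).items():
    print(name, round(value, 4))
```

FSE, RSR and MMR are dimensionless, so they can be compared across stations with very different flow magnitudes, whereas MAE keeps the unit of the flow series.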

1.2 Performance measures for hydrologic interpretation

1.2.1 Nash–Sutcliffe efficiency (NSE)

The Nash–Sutcliffe efficiency assesses the predictive capability of a numerical or hydrological model by comparing the magnitude of the residual variance with the variance of the observed data, thereby indicating how well the plot of observed versus simulated data fits the 1:1 line (Moriasi et al. 2007). NSE ranges from -∞ to 1. It is given by Eqn. (A7):

$$ {\text{NSE}} = 1 - \left\{ {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{obs}}i}} - Q_{{{\text{sim}}i}} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{{{\text{obs}}i}} - \overline{Q}_{{{\text{obs}}}} } \right)^{2} }}} \right\} $$
(A7)

NSE = 1 indicates perfect agreement between simulated and observed streamflows; NSE = 0 indicates that the model predictions are only as accurate as the mean of the observed streamflow; and -∞ < NSE < 0 occurs when the observed mean is a better predictor than the model, which indicates unacceptable model performance (Moriasi et al. 2007).
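Eqn. (A7) can be sketched in Python as below; the function name `nse` and the sample series are hypothetical. The second call uses the observed mean as the "model" to illustrate the NSE = 0 benchmark.

```python
def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency, Eqn. (A7)."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(q_obs) / len(q_obs)
    residual_ss = sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    observed_ss = sum((o - mean_obs) ** 2 for o in q_obs)
    if observed_ss == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - residual_ss / observed_ss


# Hypothetical daily streamflows
q_obs = [10.0, 20.0, 30.0, 40.0]
q_sim = [12.0, 18.0, 33.0, 37.0]
print(nse(q_obs, q_sim))

# Predicting the observed mean (25.0) everywhere gives the benchmark NSE = 0
print(nse(q_obs, [25.0] * len(q_obs)))
```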

1.2.2 Multiplicative bias (MB)

Multiplicative bias is a measure to assess whether the model overestimates (MB > 1) or underestimates (MB < 1) compared to the observed values, and MB = 1 indicates perfect model performance. It is expressed as Eqn. (A8):

$$ {\text{MB}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} Q_{{{\text{sim}}i}} }}{{\mathop \sum \nolimits_{i = 1}^{n} Q_{{{\text{obs}}i}} }} $$
(A8)
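A minimal Python sketch of Eqn. (A8) follows; the function name `multiplicative_bias` and the sample values are hypothetical.

```python
def multiplicative_bias(q_obs, q_sim):
    """Ratio of total simulated to total observed flow, Eqn. (A8)."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    total_obs = sum(q_obs)
    if total_obs == 0:
        raise ValueError("MB is undefined when the observed flows sum to zero")
    return sum(q_sim) / total_obs


# Hypothetical daily streamflows: the model overestimates the total volume
q_obs = [10.0, 20.0, 30.0, 40.0]
q_sim = [12.0, 18.0, 33.0, 41.0]
mb = multiplicative_bias(q_obs, q_sim)
print(mb, "overestimation" if mb > 1 else "underestimation or no bias")
```

Since MB compares totals, positive and negative errors at individual time steps can cancel out, so MB = 1 is best read together with the individual-value measures of Sect. 1.1.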

1.2.3 Probability of detection (POD)

Probability of detection is based on the user-defined threshold, which tests the ability of model to predict streamflow peaks in relation to observed streamflows (Karran et al. 2014). In the present analysis, the threshold is set to 90th percentile of total daily monsoon streamflow during validation period for each station. The 90th percentile was chosen since it eliminates the events exhibiting larger periodicity. It is expressed using Eqn. (A9):

$$ {\text{POD}} = \frac{{{\text{Number}}\;{\text{of}}\;{\text{times}}\;\forall i\left( {Q_{{{\text{sim}}i}} \ge Q_{90} \left| {Q_{{{\text{obs}}i}} \ge Q_{90} } \right.} \right)}}{{{\text{Number}}\;{\text{of}}\;{\text{times}}\;\forall i\left( {Q_{{{\text{obs}}i}} \ge Q_{90} } \right)}} $$
(A9)

where Q90 is the 90th percentile of observed streamflow. POD ranges between 0 and 1 and expresses the fraction of observed events with discharge ≥ Q90 that the model correctly predicts.

1.2.4 False alarm rate (FA)

The false alarm rate indicates the fraction of times the model predicts discharge ≥ Q90 when no such event was actually observed (Karran et al. 2014). It is given by Eqn. (A10):

$$ {\text{FA}} = \frac{{{\text{Number}}\;{\text{of}}\;{\text{times}}\;\forall i\left( {Q_{{{\text{sim}}i}} \ge Q_{90} \left| {Q_{{{\text{obs}}i}} < Q_{90} } \right.} \right)}}{{{\text{Number}}\;{\text{of}}\;{\text{times}}\;\forall i\left( {Q_{{{\text{obs}}i}} < Q_{90} } \right)}} $$
(A10)
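The counting behind Eqns. (A9) and (A10) can be sketched in Python as follows. The function name `pod_and_fa`, the sample series and the threshold value are hypothetical. The threshold is passed in as an argument because the percentile estimation method used in the study is not stated here.

```python
def pod_and_fa(q_obs, q_sim, threshold):
    """Return (POD, FA) for a given flow threshold, Eqns. (A9) and (A10)."""
    if len(q_obs) != len(q_sim) or not q_obs:
        raise ValueError("series must be non-empty and of equal length")
    hits = misses = false_alarms = correct_negatives = 0
    for o, s in zip(q_obs, q_sim):
        if o >= threshold:          # an event was observed
            if s >= threshold:
                hits += 1
            else:
                misses += 1
        else:                       # no event was observed
            if s >= threshold:
                false_alarms += 1
            else:
                correct_negatives += 1
    n_events = hits + misses
    n_non_events = false_alarms + correct_negatives
    # Return NaN when a ratio has an empty denominator
    pod = hits / n_events if n_events else float("nan")
    fa = false_alarms / n_non_events if n_non_events else float("nan")
    return pod, fa


# Hypothetical flows and a hypothetical threshold standing in for Q90
q_obs = [5.0, 8.0, 50.0, 60.0, 7.0, 55.0, 6.0, 9.0, 4.0, 3.0]
q_sim = [6.0, 7.0, 52.0, 40.0, 48.0, 58.0, 5.0, 8.0, 5.0, 2.0]
pod, fa = pod_and_fa(q_obs, q_sim, threshold=45.0)
print(pod, fa)  # 2 of 3 events detected; 1 false alarm in 7 non-events
```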

1.2.5 Mean absolute relative error (MARE)

In this study, the mean absolute relative error (MARE) is used to evaluate the relative errors in the model performance with reference to the peak flows (i.e. observed flows > Q99). Lower values of MARE are preferred, and MARE is zero for a perfect model. It is estimated using Eqn. (A11):

$$ {\text{MARE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{{\left| {Q_{{{\text{sim}}i}} - Q_{{{\text{obs}}i}} } \right|}}{{Q_{{{\text{obs}}i}} }}\;\forall Q_{i} > Q_{99} $$
(A11)

where Q99 is the 99th percentile of observed streamflow.
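A minimal Python sketch of Eqn. (A11) is given below. The function name `mare_peaks`, the sample series and the threshold value are hypothetical. The sketch assumes that the average is taken over only those time steps whose observed flow exceeds Q99, which is one reading of the condition attached to Eqn. (A11).

```python
def mare_peaks(q_obs, q_sim, q99):
    """Mean absolute relative error over observed peak flows (> q99), Eqn. (A11)."""
    if len(q_obs) != len(q_sim):
        raise ValueError("series must be of equal length")
    # Assumption: only time steps with observed flow above the threshold count
    peaks = [(o, s) for o, s in zip(q_obs, q_sim) if o > q99]
    if not peaks:
        raise ValueError("no observed flow exceeds the threshold")
    if any(o <= 0 for o, _ in peaks):
        raise ValueError("peak flows must be positive to form relative errors")
    return sum(abs(s - o) / o for o, s in peaks) / len(peaks)


# Hypothetical flows and a hypothetical threshold standing in for Q99
q_obs = [100.0, 200.0, 50.0, 400.0]
q_sim = [90.0, 220.0, 60.0, 360.0]
print(mare_peaks(q_obs, q_sim, q99=150.0))  # both peaks are off by 10 %
```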

Cite this article

Sharma, P.J., Patel, P.L. & Jothiprakash, V. Data-driven modelling framework for streamflow prediction in a physio-climatically heterogeneous river basin. Soft Comput 25, 5951–5978 (2021). https://doi.org/10.1007/s00500-021-05585-9

  • DOI: https://doi.org/10.1007/s00500-021-05585-9
