Abstract
The prediction of streamflows is essential for efficient water resources management at basin scale. The present study examines the performance of the model tree (MT) data-driven technique in predicting streamflows for an intermittent and a perennial river in a physio-climatically heterogeneous river basin. Correlation and mutual information analyses of the predictor (hydrometeorological) variables are performed to determine the model input structure. Overall, seventy-two model configurations are formulated for each stream gauging station based on combinations of input variables, MT variants and variable lengths of calibration and validation datasets. The model simulation results are analysed by estimating a suite of statistical performance indices for each model configuration. The influence of parameter sensitivity on model performance is also assessed. The results indicate that the selection of input variables plays a governing role in capturing the interplay of hydrological processes in a catchment. The model outputs displayed more sensitivity to pruning than to smoothing in MT, and minimal sensitivity to data partitioning, since the datasets were homoscedastic. The study also proposes a procedure for model evaluation that considers multiple criteria, such as forecasting error, efficiency, predictability and false alarms, and enables multi-model comparisons for better decision making. The proposed procedure was successfully applied to select the best-fit model for predicting one-day-ahead streamflows at each stream gauging station.
Availability of data and material
The rainfall data used in this study were procured from India Meteorological Department (IMD), Pune, on a payment basis. The streamflow and reservoir inflow data were procured from government agencies, viz., Central Water Commission (CWC), Tapi Division, Surat, and Ukai Civil Circle, Ukai, Government of Gujarat, respectively. The authors may not share the data without the permission of the aforesaid data-disseminating agencies.
References
Arunkumar R, Jothiprakash V (2012) Reservoir evaporation prediction using data-driven techniques. J Hydrol Eng 18:40–49. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000597
Bhattacharya B, Solomatine DP (2005) Neural networks and M5 model trees in modelling water level-discharge relationship. Neurocomp 63:381–396. https://doi.org/10.1016/j.neucom.2004.04.016
Bhattacharya B, Price RK, Solomatine DP (2007) Machine learning approach to modeling sediment transport. J Hydrol Eng 133:440–450. https://doi.org/10.1061/(ASCE)0733-9429(2007)133:4(440)
Delafrouz H, Ghaheri A, Ghorbani MA (2018) A novel hybrid neural network based on phase space reconstruction technique for daily river flow prediction. Soft Comp 22:2205–2215. https://doi.org/10.1007/s00500-016-2480-8
Eckhardt K (2005) How to construct recursive digital filters for baseflow separation. Hydrol Proc 19:507–515. https://doi.org/10.1002/hyp.5675
Esmaeilzadeh B, Sattari MT, Samadianfard S (2017) Performance evaluation of ANNs and an M5 model tree in Sattarkhan Reservoir inflow prediction. ISH J Hydraul Eng 23:283–292. https://doi.org/10.1080/09715010.2017.1308277
Galelli S, Castelletti A (2013) Tree-based iterative input variable selection for hydrological modeling. Water Resour Res 49:4295–4310. https://doi.org/10.1002/wrcr.20339
Garg V, Jothiprakash V (2013) Evaluation of reservoir sedimentation using data driven techniques. Appl Soft Comp 13:3567–3581. https://doi.org/10.1016/j.asoc.2013.04.019
Ghorbani MA, Deo RC, Kim S, Kashani MH, Karimi V, Izadkhah M (2020) Development and evaluation of the cascade correlation neural network and the random forest models for river stage and river flow prediction in Australia. Soft Comp 24:12079–12090. https://doi.org/10.1007/s00500-019-04648-2
Goyal MK (2014) Modeling of sediment yield prediction using M5 model tree algorithm and wavelet regression. Water Resour Manage 28:1991–2003. https://doi.org/10.1007/s11269-014-0590-6
Jothiprakash V, Kote AS (2010) Effect of pruning and smoothing while using M5 model tree technique for reservoir inflow prediction. J Hydrol Eng 16:563–574. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000342
Jothiprakash V, Kote AS (2011) Improving the performance of data-driven techniques through data pre-processing for modelling daily reservoir inflow. Hydrol Sci J 56:168–186. https://doi.org/10.1080/02626667.2010.546358
Jung NC, Popescu I, Kelderman P, Solomatine DP, Price RK (2010) Application of model trees and other machine learning techniques for algal growth prediction in Yongdam reservoir, Republic of Korea. J Hydroinform 12(3):262–274. https://doi.org/10.2166/hydro.2009.004
Karran DJ, Morin E, Adamowski J (2014) Multi-step streamflow forecasting using data-driven non-linear methods in contrasting climate regimes. J Hydroinform 16:671–689. https://doi.org/10.2166/hydro.2013.042
Kashid SS, Ghosh S, Maity R (2010) Streamflow prediction using multi-site rainfall obtained from hydroclimatic teleconnection. J Hydrol 395:23–38. https://doi.org/10.1016/j.jhydrol.2010.10.004
Kennel MB, Abarbanel HD (2002) False neighbors and false strands: a reliable minimum embedding dimension algorithm. Phy Rev E 66:026209. https://doi.org/10.1103/PhysRevE.66.026209
Keshtegar B, Kisi O, Zounemat-Kermani M (2019) Polynomial chaos expansion and response surface method for nonlinear modelling of reference evapotranspiration. Hydrol Sci J 64:720–730. https://doi.org/10.1080/02626667.2019.1601727
Kisi O, Choubin B, Deo RC, Yaseen ZM (2019) Incorporating synoptic-scale climate signals for streamflow modelling over the Mediterranean region using machine learning models. Hydrol Sci J 64:1240–1252. https://doi.org/10.1080/02626667.2019.1632460
Lim KJ, Engel BA, Tang Z, Choi J, Kim KS, Muthukrishnan S, Tripathy D (2005) Automated web GIS based hydrograph analysis tool, WHAT. J Am Water Resour Asso 41(6):1407–1416. https://doi.org/10.1111/j.1752-1688.2005.tb03808.x
Londhe SN, Narkhede S (2018) Forecasting stream flow using hybrid neuro-wavelet technique. ISH J Hydraul Eng 24:275–284. https://doi.org/10.1080/09715010.2017.1360158
Londhe S, Charhate S (2010) Comparison of data-driven modelling techniques for river flow forecasting. Hydrol Sci J 55:1163–1174. https://doi.org/10.1080/02626667.2010.512867
Mandal T, Jothiprakash V (2012) Short-term rainfall prediction using ANN and MT techniques. ISH J Hydraul Eng 18:20–26. https://doi.org/10.1080/09715010.2012.661629
Mehdizadeh S, Fathian F, Adamowski JF (2019) Hybrid artificial intelligence-time series models for monthly streamflow modeling. Appl Soft Comput 80:873–887. https://doi.org/10.1016/j.asoc.2019.03.046
Meshram SG, Ghorbani MA, Shamshirband S, Karimi V, Meshram C (2019) River flow prediction using hybrid PSOGSA algorithm based on feed-forward neural network. Soft Comput 23:10429–10438. https://doi.org/10.1007/s00500-018-3598-7
More D, Magar RB, Jothiprakash V (2019) Intermittent reservoir daily inflow prediction using stochastic and model tree techniques. J Inst Eng India Ser A 100:439–446. https://doi.org/10.1007/s40030-019-00368-w
Moriasi DN, Arnold JG, Van Liew MW, Bingner RL, Harmel RD, Veith TL (2007) Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans ASABE 50:885–900. https://doi.org/10.13031/2013.23153
Moriasi DN, Gitau MW, Pai N, Daggupati P (2015) Hydrologic and water quality models: performance measures and evaluation criteria. Trans ASABE 58(6):1763–1785. https://doi.org/10.13031/trans.58.10715
Nalarajan NA, Mohandas C (2015) Groundwater level prediction using M5 model trees. J Inst Eng India Ser A 96:57–62. https://doi.org/10.1007/s40030-014-0093-8
Nourani V, Davanlou Tajbakhsh A, Molajou A, Gokcekus H (2019) Hybrid wavelet-M5 model tree for rainfall-runoff modeling. J Hydrol Eng 24(5):04019012. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001777
Oyebode O, Otieno F, Adeyemo J (2014) Review of three data-driven modelling techniques for hydrological modelling and forecasting. Fresenius Environ Bull 23:1443–1454
Pal M, Singh NK, Tiwari NK (2012) M5 model tree for pier scour prediction using field dataset. KSCE J Civil Eng 16:1079–1084. https://doi.org/10.1007/s12205-012-1472-1
Quinlan JR (1992) Learning with continuous classes. In: Adams A, Sterling L (eds) Proceedings of AI'92 fifth Australian joint conference on artificial intelligence, Singapore: World Scientific, pp 343–348
Rezaie-Balf M, Zahmatkesh Z, Kim S (2017) Soft computing techniques for rainfall-runoff simulation: local non-parametric paradigm vs. model classification methods. Water Resour Manag 31(12):3843–3865. https://doi.org/10.1007/s11269-017-1711-9
Rubel F, Kottek M (2010) Observed and projected climate shifts 1901–2100 depicted by world maps of the Köppen-Geiger climate classification. Meteorol Z 19:135–141. https://doi.org/10.1127/0941-2948/2010/0430
Rubel F, Brugger K, Haslinger K, Auer I (2017) The climate of the European Alps: shift of very high resolution Köppen-Geiger climate zones 1800–2100. Meteorol Z 26:115–125. https://doi.org/10.1127/metz/2016/0816
Senthil Kumar AR, Goyal MK, Ojha CSP, Singh RD, Swamee PK (2013) Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India. Water Sci Technol 68:2521–2526. https://doi.org/10.2166/wst.2013.491
Sharma PJ, Patel PL, Jothiprakash V (2018) Assessment of variability in runoff coefficients and their linkages with physiographic and climatic characteristics of two contrasting catchments. J Water Clim Chang 10:464–483. https://doi.org/10.2166/wcc.2018.139
Sharma PJ, Patel PL, Jothiprakash V (2019) Impact of rainfall variability and anthropogenic activities on streamflow changes and water stress conditions across Tapi basin in India. Sci Tot Environ 687:885–897. https://doi.org/10.1016/j.scitotenv.2019.06.097
Shortridge JE, Guikema SD, Zaitchik BF (2016) Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrol Earth Syst Sci 20:2611–2628. https://doi.org/10.5194/hess-20-2611-2016
Solomatine DP (2006) Data-driven modeling and computational intelligence methods in hydrology. In: Anderson M (ed) Encyclopedia of hydrological sciences, Wiley, New York. https://doi.org/10.1002/0470848944.hsa021
Solomatine DP, Dulal KN (2003) Model trees as an alternative to neural networks in rainfall-runoff modelling. Hydrol Sci J 48:399–411. https://doi.org/10.1623/hysj.48.3.399.45291
Solomatine DP, Xue Y (2004) M5 model trees and neural networks: Application to flood forecasting in the upper reach of the Huai River in China. J Hydrol Eng 9:491–501. https://doi.org/10.1061/(ASCE)1084-0699(2004)9:6(491)
Tongal H, Booij MJ (2018) Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J Hydrol 564:266–282. https://doi.org/10.1016/j.jhydrol.2018.07.004
Vignesh R, Jothiprakash V, Sivakumar B (2015) Streamflow variability and classification using false nearest neighbor method. J Hydrol 531:706–715. https://doi.org/10.1016/j.jhydrol.2015.10.056
Vora A, Sharma PJ, Loliyana VD, Patel PL, Timbadiya PV (2018) Assessment and prioritization of flood protection levees along the lower Tapi River, India. Nat Haz Rev 19:05018009. https://doi.org/10.1061/(ASCE)NH.1527-6996.0000310
Witten IH, Frank E (2005) Data mining: Practical machine learning tools and techniques. Morgan Kaufmann Publishers, San Francisco. https://doi.org/10.1016/C2009-0-19715-5
Yaseen ZM, Kisi O, Demir V (2016) Enhancing long-term streamflow forecasting and predicting using periodicity data component: application of artificial intelligence. Water Resour Manage 30:4125–4151. https://doi.org/10.1007/s11269-016-1408-5
Zhang Z, Hong WC (2019) Electric load forecasting by complete ensemble empirical mode decomposition adaptive noise and support vector regression with quantum-based dragonfly algorithm. Nonlinear Dyn 98(2):1107–1136. https://doi.org/10.1007/s11071-019-05252-7
Zhang Z, Hong WC, Li J (2020) Electric load forecasting by hybrid self-recurrent support vector regression model with variational mode decomposition and improved cuckoo search algorithm. IEEE Access 8:14642–14658. https://doi.org/10.1109/ACCESS.2020.2966712
Acknowledgements
The first author thankfully acknowledges the financial support received from Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, vide their letter no. DST/INSPIRE Fellowship/2015/IF150634 dated 11 January 2016. The authors appreciate the Centre of Excellence (CoE) on ‘Water Resources and Flood Management’, TEQIP-II, Ministry of Human Resources Development (MHRD) and INCCC-sponsored research project ‘Impact of Climate Change on Water Resources of Tapi Basin’, Ministry of Water Resources, River Development and Ganga Rejuvenation (MoWR,RD&GR), Government of India, for providing resourceful support in conducting the present study. The authors express sincere thanks to Central Water Commission (CWC), Tapi Division, Surat; India Meteorological Department (IMD), Pune; and Ukai Civil Circle, Ukai, Government of Gujarat, for providing essential data to conduct the reported study. The authors are thankful to the anonymous reviewers for their constructive suggestions in improving the quality of the manuscript.
Funding
The first author received financial support in the form of a scholarship from Department of Science and Technology (DST), Ministry of Science and Technology, Government of India, for conducting the research work. The second author secured funding through Centre of Excellence (CoE) on ‘Water Resources and Flood Management’, TEQIP-II, Ministry of Human Resources Development (MHRD), which provided resource and infrastructural support in the form of data procurement and computing facilities. The second author also secured funding through INCCC-sponsored research project ‘Impact of Climate Change on Water Resources of Tapi Basin’, Ministry of Water Resources, River Development and Ganga Rejuvenation (MoWR,RD&GR), Government of India, for procurement of software tools.
Contributions
PJS, PLP, VJ helped in conceptualization; PJS helped in methodology; PJS formally analysed and investigated the study; PJS contributed to writing—original draft preparation; PJS, PLP, VJ contributed to writing—review and editing; PJS, PLP acquired the funding; PLP, VJ supervised the study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
The streamflow modelling was carried out using WEKA software (version 3.8), developed at the University of Waikato, New Zealand. The software is freely available for download at https://www.cs.waikato.ac.nz/ml/weka/. In addition, MATLAB codes were developed for processing the model outputs and generating plots; these are available from the first author.
Appendix A: Statistical performance indices
1.1 Performance measures for prediction of individual values
1.1.1 Root-mean-square error (RMSE)
The root-mean-square error is a measure of the goodness-of-fit related to high flows. It is expressed as Eqn. (A1):
\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{\text{obs}i} - Q_{\text{sim}i}\right)^{2}} \tag{A1} \]
where \(Q_{{{\text{obs}}i}}\) and \(Q_{{{\text{sim}}i}}\) indicate observed and simulated streamflows, respectively, and n is the total number of observations. A lower value of RMSE represents better model performance (Karran et al. 2014). It has the same unit as the hydrologic variable under investigation.
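1.1.2 Coefficient of determination (\(R^{2}\))
The coefficient of determination explains the collinearity between simulated and observed values. The value of \(R^{2}\) ranges from 0 to 1, with \(R^{2} = 1\) showing perfect prediction ability. It is expressed as Eqn. (A2):
\[ R^{2} = \left[\frac{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - \overline{Q}_{\text{obs}}\right)\left(Q_{\text{sim}i} - \overline{Q}_{\text{sim}}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - \overline{Q}_{\text{obs}}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Q_{\text{sim}i} - \overline{Q}_{\text{sim}}\right)^{2}}}\right]^{2} \tag{A2} \]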
where \(\overline{Q}_{{{\text{obs}}}}\) and \(\overline{Q}_{{{\text{sim}}}}\) denote mean observed and simulated streamflows.
1.1.3 Fractional standard error (FSE)
The FSE is obtained by dividing the RMSE by the mean of the observed time series. The FSE is considered a scalable measure of model precision (Karran et al. 2014). The model attains better precision as the value of FSE tends to zero. It can be expressed as Eqn. (A3):
\[ \text{FSE} = \frac{\text{RMSE}}{\overline{Q}_{\text{obs}}} \tag{A3} \]
1.1.4 Mean absolute error (MAE)
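The mean absolute error measures the goodness-of-fit for moderate flows (Jothiprakash and Kote 2011). It is expressed as Eqn. (A4):
\[ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{\text{obs}i} - Q_{\text{sim}i}\right| \tag{A4} \]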
1.1.5 RMSE to standard deviation ratio (RSR)
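The RMSE to standard deviation ratio (RSR) is estimated by dividing the RMSE by the standard deviation of the observed data. Thus, RSR incorporates the model error and a scaling factor such that datasets with different characteristics can be compared. RSR varies in the range [0, ∞), where zero is the optimal value indicating perfect model simulation. It is given by Eqn. (A5) (Moriasi et al. 2007):
\[ \text{RSR} = \frac{\text{RMSE}}{Q\sigma_{\text{obs}}} = \frac{\sqrt{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - Q_{\text{sim}i}\right)^{2}}}{\sqrt{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - \overline{Q}_{\text{obs}}\right)^{2}}} \tag{A5} \]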
where \(Q\sigma_{{{\text{obs}}}}\) denotes standard deviation of observed streamflow.
1.1.6 MAE to mean ratio (MMR)
Analogous to RSR, the MAE to mean ratio (MMR) is devised in this study for scaling the mean absolute error. It is estimated by dividing the MAE by the mean of the observed data. MMR also varies in the range [0, ∞), where zero is the optimal value indicating perfect model simulation. It is given by Eqn. (A6):
\[ \text{MMR} = \frac{\text{MAE}}{\overline{Q}_{\text{obs}}} \tag{A6} \]
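The six measures for individual values can be computed together from paired observed and simulated series. The following is a minimal sketch under stated assumptions, not the authors' MATLAB code: the function name `individual_value_measures` is hypothetical, \(R^{2}\) is taken as the squared Pearson correlation, and the standard deviation in RSR is the population form (divisor n).

```python
import numpy as np

def individual_value_measures(q_obs, q_sim):
    """Return RMSE, R2, FSE, MAE, RSR and MMR for paired flow series."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    err = q_obs - q_sim

    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Assumption: R2 is the squared Pearson correlation coefficient.
    r = np.corrcoef(q_obs, q_sim)[0, 1]

    return {
        "RMSE": rmse,
        "R2": r ** 2,
        "FSE": rmse / q_obs.mean(),   # RMSE scaled by the observed mean
        "MAE": mae,
        "RSR": rmse / q_obs.std(),    # assumption: population std (ddof=0)
        "MMR": mae / q_obs.mean(),    # MAE scaled by the observed mean
    }

if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(individual_value_measures([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0]))
```

Because FSE, RSR and MMR are dimensionless, they allow stations with very different flow magnitudes to be compared, whereas RMSE and MAE carry the unit of the streamflow series.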
1.2 Performance measures for hydrologic interpretation
1.2.1 Nash–Sutcliffe efficiency (NSE)
The Nash–Sutcliffe efficiency assesses the predictive capability of a numerical or hydrological model. It measures the magnitude of the residual variance relative to the observed variance, thereby indicating how closely the plot of observed versus simulated data fits the 1:1 line (Moriasi et al. 2007). The values of NSE range from -∞ to 1. It is given by Eqn. (A7):
\[ \text{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - Q_{\text{sim}i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{\text{obs}i} - \overline{Q}_{\text{obs}}\right)^{2}} \tag{A7} \]
NSE = 1 shows perfect agreement between simulated and observed streamflows. NSE = 0 indicates that the model predictions are only as accurate as the mean of the observed streamflow. The range -∞ < NSE < 0 occurs when the mean observed value is a better predictor than the model, which indicates unacceptable model performance (Moriasi et al. 2007).
1.2.2 Multiplicative bias (MB)
Multiplicative bias assesses whether the model overestimates (MB > 1) or underestimates (MB < 1) the observed values; MB = 1 indicates an unbiased model. It is expressed as Eqn. (A8):
\[ \text{MB} = \frac{\sum_{i=1}^{n} Q_{\text{sim}i}}{\sum_{i=1}^{n} Q_{\text{obs}i}} \tag{A8} \]
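The two measures above can be illustrated with a short sketch. The function names `nse` and `multiplicative_bias` are hypothetical, and MB is assumed here to be the ratio of total simulated to total observed flow, which matches the interpretation MB > 1 for overestimation.

```python
import numpy as np

def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    residual = np.sum((q_obs - q_sim) ** 2)
    variance = np.sum((q_obs - q_obs.mean()) ** 2)
    return 1.0 - residual / variance

def multiplicative_bias(q_obs, q_sim):
    """Assumed form: total simulated flow divided by total observed flow."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    return np.sum(q_sim) / np.sum(q_obs)

if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    obs = [2.0, 4.0, 6.0, 8.0]
    sim = [3.0, 5.0, 7.0, 9.0]
    print(nse(obs, sim), multiplicative_bias(obs, sim))
```

Predicting the observed mean at every time step gives NSE = 0 in this sketch, which is the benchmark described in the text.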
1.2.3 Probability of detection (POD)
Probability of detection is based on a user-defined threshold and tests the ability of the model to predict streamflow peaks in relation to observed streamflows (Karran et al. 2014). In the present analysis, the threshold is set to the 90th percentile of total daily monsoon streamflow during the validation period for each station. The 90th percentile was chosen since it eliminates the events exhibiting larger periodicity. It is expressed using Eqn. (A9):
\[ \text{POD} = \frac{N\left(Q_{\text{sim}i} > Q_{90} \ \text{and} \ Q_{\text{obs}i} > Q_{90}\right)}{N\left(Q_{\text{obs}i} > Q_{90}\right)} \tag{A9} \]
where \(N(\cdot)\) counts the days satisfying the stated condition and Q90 is the 90th percentile of observed streamflow. The POD values range between 0 and 1 and express the fraction of times the model correctly predicts the events having discharge > Q90.
1.2.4 False alarm rate (FA)
The false alarm rate indicates percentage of times the model predicts events having discharge > Q90, when no such observation was actually recorded (Karran et al. 2014). It is given by Eqn. (A10):
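A contingency-count sketch of POD and FA is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name `pod_and_fa` is hypothetical, the threshold is computed with NumPy's default linear-interpolation percentile, and the false alarm rate is assumed to be normalised by the number of predicted exceedances (hits plus false alarms).

```python
import numpy as np

def pod_and_fa(q_obs, q_sim, percentile=90.0):
    """Return (POD, FA) for exceedances of an observed-flow percentile."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)

    threshold = np.percentile(q_obs, percentile)  # Q90 of the observed series
    obs_event = q_obs > threshold
    sim_event = q_sim > threshold

    hits = int(np.sum(obs_event & sim_event))
    misses = int(np.sum(obs_event & ~sim_event))
    false_alarms = int(np.sum(~obs_event & sim_event))

    # Guard against empty event sets to avoid division by zero.
    pod = hits / (hits + misses) if (hits + misses) > 0 else float("nan")
    # Assumption: FA is the share of predicted exceedances that were not observed.
    predicted = hits + false_alarms
    fa = false_alarms / predicted if predicted > 0 else float("nan")
    return pod, fa

if __name__ == "__main__":
    # Illustrative series only: one missed peak and one spurious peak.
    obs = np.arange(1.0, 21.0)
    sim = obs.copy()
    sim[18] = 10.0   # observed peak that the model misses
    sim[0] = 25.0    # predicted peak with no matching observation
    print(pod_and_fa(obs, sim))
```

A model can reach a high POD simply by predicting peaks too often, so the two scores are best read together.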
1.2.5 Mean absolute relative error (MARE)
In this study, the mean absolute relative error (MARE) is used to evaluate the relative errors in model performance with reference to the peak flows (i.e. observed flows > Q99). Lower values of MARE are preferred, and MARE is zero for a perfect model. It is estimated using Eqn. (A11):
\[ \text{MARE} = \frac{1}{m}\sum_{i:\,Q_{\text{obs}i} > Q_{99}} \frac{\left|Q_{\text{obs}i} - Q_{\text{sim}i}\right|}{Q_{\text{obs}i}} \tag{A11} \]
where Q99 is the 99th percentile of observed streamflow and m is the number of observations exceeding Q99.
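The peak-flow MARE can be sketched as follows. The function name `mare_peak` is hypothetical, and the result is returned as a fraction; whether the study reports it as a fraction or a percentage is not stated in the text above.

```python
import numpy as np

def mare_peak(q_obs, q_sim, percentile=99.0):
    """Mean absolute relative error over observed flows above a percentile."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)

    threshold = np.percentile(q_obs, percentile)  # Q99 of the observed series
    peak = q_obs > threshold
    if not np.any(peak):
        return float("nan")  # no peak events to evaluate

    rel_err = np.abs(q_obs[peak] - q_sim[peak]) / q_obs[peak]
    return float(np.mean(rel_err))

if __name__ == "__main__":
    # Illustrative series only: the model underestimates every flow by 10%.
    obs = np.arange(1.0, 201.0)
    print(mare_peak(obs, 0.9 * obs))
```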
Cite this article
Sharma, P.J., Patel, P.L. & Jothiprakash, V. Data-driven modelling framework for streamflow prediction in a physio-climatically heterogeneous river basin. Soft Comput 25, 5951–5978 (2021). https://doi.org/10.1007/s00500-021-05585-9
DOI: https://doi.org/10.1007/s00500-021-05585-9