Bayesian framework of parameter sensitivity, uncertainty, and identifiability analysis in complex water quality models

https://doi.org/10.1016/j.envsoft.2018.03.001

Highlights

  • An efficient uncertainty analysis (UA) analytical framework was developed.

  • A multi-chain MCMC method with comprehensive GSA guarantees robustness.

  • Difficulties in parameter identification were attributed to cognitive limitations.

  • The framework provided model users with reminders for future forecasts.

  • The framework was helpful for model developers to identify model imperfection.

Abstract

An efficient Bayesian analytical framework was developed to address the challenges of uncertainty analysis and to assess the parameter identification problems of complex water quality models with high-dimensional parameter spaces. The inclusion of a multi-chain Markov Chain Monte Carlo method and a comprehensive global sensitivity analysis (GSA) ensures that the results are robust. A high-frequency synthetic data case study was conducted in the EFDC water quality module, which includes 54 parameters. The comprehensive GSA identified 39 completely or partially sensitive parameters for reducing dimensionality, among which only nine were identifiable without significant bias. The fundamental causes of the parameter identification problem could be traced to the cognitive limitations of the real water quality assessment process instead of data scarcity. The framework is powerful for exploring these limitations, generating reminders for model users to use Bayesian estimates in future forecasts, and providing directions for model developers to improve a model in future work.

Introduction

With rapid urbanization and economic development, water quality deterioration has become a global concern. Serious problems necessitate the prevention and control of water pollution, as well as aquatic ecosystem management. Water quality models are powerful mathematical tools for water quality assessment, pollution control, emergency preparedness and response, and aquatic environmental planning (Mirchi and Watkins, 2012; Melching et al., 2013; Xiao et al., 2015).

Water quality models are established based on an understanding of the relevant hydrodynamic, chemical, and biochemical pollutant migration and transformation processes in aquatic ecosystems, as well as the hypotheses on inaccessible behaviors (Beck, 1987; Melching et al., 1990; Walker et al., 2003; Lindenschmidt et al., 2007). These inaccessible but complicated behaviors are often parameterized based on different state variables (e.g., the Monod model uses two parameters to describe the microbial usage of BOD5). With deepening insight into such mechanisms and related processes, water quality models have become increasingly complex. On the one hand, non-monotonic and non-linear relationships among state variables have replaced the initial monotonic and linear relationships, yielding extensive local optima. On the other hand, dimensions of the parameter space have increased dramatically, and redundant relationships have simultaneously arisen (Refsgaard et al., 2006; Freni et al., 2011), causing extensive equifinality (i.e., multiple “optimal” parameter vectors that yield similar goodness-of-fit).

Parameter identifiability is the possibility of learning the true values of underlying parameters with an infinite experimental dataset (Raue et al., 2009). Parameter identification for complex water quality models is inevitably challenging and parameter true values are often not learned because of the increased computation cycles and the aforementioned factors. Against such serious parameter identification problems (Omlin et al., 2001; Müller et al., 2002; Brun et al., 2002; Raue et al., 2009), an efficient and robust uncertainty analysis (UA) will aid both model users and developers to assess parameter identifiability, and eventually determine the cognitive limitations of the real behaviors of an aquatic ecosystem. In turn, targeted control of model imperfections can mitigate the adverse impacts of non-identifiability and strengthen the reliability of simulation results, thereby preventing decision-making errors (Wagener and Kollat, 2007; Ascough et al., 2008; Jiang et al., 2017).

The Bayesian method represents a modern branch of UA techniques developed based on Bayes’ theorem (Eq. (1)). This method describes parameter uncertainty by deriving the posterior parameter distribution (π(Θ|x)) from a combination of the prior parameter distribution (π(Θ)) and the likelihood function (p(x|Θ)), in which empirical knowledge (such as past research experience, previous comparable experiments, and even intuition or belief) and sampling information are encoded, respectively.

$$\pi(\Theta \mid x) \propto \pi(\Theta)\, p(x \mid \Theta) \tag{1}$$
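As a minimal illustration of how a posterior arises from a prior and a likelihood, the sketch below evaluates both on a grid for a single parameter. Everything in it is an assumption made for illustration and is not taken from the study: the first-order decay model, the decay rate `k`, the uniform prior, the Gaussian error model with standard deviation `sigma`, and the noise-free synthetic observations.

```python
import math

def grid_posterior(times, observed, k_grid, c0=10.0, sigma=0.5):
    """Normalized posterior of a decay rate k on a grid (prior x likelihood).

    Assumed model (hypothetical): C(t) = c0 * exp(-k * t) with independent
    Gaussian observation errors of standard deviation sigma.
    """
    prior = [1.0 / len(k_grid)] * len(k_grid)  # uniform prior (assumption)
    log_like = []
    for k in k_grid:
        ll = 0.0
        for t, y in zip(times, observed):
            resid = y - c0 * math.exp(-k * t)
            ll += -0.5 * (resid / sigma) ** 2
        log_like.append(ll)
    # Subtract the maximum log-likelihood before exponentiating for stability.
    m = max(log_like)
    unnorm = [p * math.exp(ll - m) for p, ll in zip(prior, log_like)]
    total = sum(unnorm)  # discrete stand-in for the normalizing constant
    return [u / total for u in unnorm]

times = [0.0, 1.0, 2.0, 3.0, 4.0]
true_k = 0.3
observed = [10.0 * math.exp(-true_k * t) for t in times]  # synthetic, noise-free
k_grid = [0.01 * i for i in range(1, 101)]
posterior = grid_posterior(times, observed, k_grid)
k_map = k_grid[posterior.index(max(posterior))]
print(f"posterior mode of k: {k_map:.2f}")
```

Grid evaluation of this kind is only practical for a handful of parameters, because the number of grid points grows exponentially with dimension. That is why sampling methods are used for high-dimensional models.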

In general, the explicit functional form of the posterior distribution is unlikely to be derived analytically. Therefore, sampling is indispensable. The Markov Chain Monte Carlo (MCMC) method provides a series of efficient sampling algorithms to obtain the posterior parameter distribution (Tierney, 2015). Multi-chain MCMC methods, represented by the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt, 2016), are a highly competitive branch. Multi-chain MCMC methods are based on a genetic algorithm integrated with the “population” concept. Each Markov chain starts its evolution from a point randomly sampled from the prior parameter space, and these starting points together form the initial population. Unlike in single-chain MCMC methods, multiple Markov chains interact with one another to co-generate the transition kernel (Ter Braak, 2006; Ter Braak and Vrugt, 2008). Hence, multi-chain MCMC methods have been used to improve sampling and searching capabilities, as well as to prevent premature convergence (i.e., being trapped in a local optimum) in high-dimensional parameter spaces (Vrugt et al., 2009). These advantages have made multi-chain MCMC methods popular in hydrological research (Keating et al., 2010; He et al., 2011; Harrison et al., 2012; Joseph and Guillaume, 2013). However, relevant applications in complex water quality models remain scarce.
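The interaction between chains can be sketched with a simplified differential-evolution Markov chain sampler in the spirit of Ter Braak (2006). This is not the DREAM algorithm itself, which adds subspace sampling, outlier handling, and adaptive crossover. The bimodal target `log_posterior`, the function name `de_mc`, the chain count, the jump scale, and the occasional unit-scale jump are all assumptions chosen for illustration.

```python
import math
import random

def log_posterior(theta):
    """Hypothetical bimodal target: equal mixture of N(-3, 1) and N(+3, 1)."""
    x = theta[0]
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def de_mc(log_post, n_chains=8, n_iter=3000, dim=1, lo=-10.0, hi=10.0, seed=0):
    """Simplified DE-MC: each proposal uses the difference of two other chains."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * dim)  # commonly used default jump scale
    # Initial population: one random draw per chain from a uniform box.
    chains = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_chains)]
    logp = [log_post(c) for c in chains]
    samples = []
    for it in range(n_iter):
        g = 1.0 if it % 10 == 0 else gamma  # occasional unit jump between modes
        for i in range(n_chains):
            a, b = rng.sample([j for j in range(n_chains) if j != i], 2)
            proposal = [
                chains[i][d] + g * (chains[a][d] - chains[b][d]) + rng.gauss(0.0, 1e-3)
                for d in range(dim)
            ]
            lp = log_post(proposal)
            # Metropolis acceptance; 1 - random() lies in (0, 1], so log is safe.
            if math.log(1.0 - rng.random()) < lp - logp[i]:
                chains[i], logp[i] = proposal, lp
            if it >= n_iter // 2:  # discard the first half as burn-in
                samples.append(list(chains[i]))
    return samples

samples = de_mc(log_posterior)
mean_abs = sum(abs(s[0]) for s in samples) / len(samples)
print(f"kept {len(samples)} samples, mean |x| = {mean_abs:.2f}")
```

Because each proposal is built from the current spread of the population, the jump size adapts to the scale of the target without manual tuning. That self-scaling is the practical appeal of population-based samplers in high-dimensional spaces.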

To further increase the efficiency of UA, dimensionality reduction is often required first for complex water quality models. Sensitivity analysis (SA), which assesses the degree to which parameter uncertainty causes output variations, plays an important role in dimensionality reduction via parameter prioritization and fixing (Saltelli et al., 2008; Bilotta et al., 2012; Sun et al., 2012; Ganji et al., 2016). SA techniques are often categorized into local (LSA) and global sensitivity analysis (GSA) methods. LSA is a partial-derivative-based method that investigates how the model output responds to a small disturbance of each parameter around a specific location in parameter space (Matott et al., 2009; Baroni and Tarantola, 2014). A common way of conducting LSA is the one-factor-at-a-time (OAT) method (Yang, 2011). Although LSA is computationally economical and popular (Luo and Zhang, 2009; Jia et al., 2015; Abdul-Aziz and Al-Amin, 2016), it is not suitable for reducing the dimensionality of complex water quality models, because its results depend on the chosen location and the suitable location, i.e., the true parameter value, is unknown.
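The location dependence of LSA can be seen in a small OAT sketch. It applies a 1% forward perturbation to each parameter of a Monod-type rate expression and reports normalized sensitivities at two different base points. The function names, parameter values, and substrate concentration are hypothetical and chosen only for illustration.

```python
def monod_rate(mu_max, k_s, substrate=5.0):
    """Monod-type rate with two parameters: mu_max * S / (k_s + S)."""
    return mu_max * substrate / (k_s + substrate)

def oat_sensitivity(model, base, rel_step=0.01):
    """Normalized one-factor-at-a-time sensitivities around one base point.

    Each value approximates (dy / y) / (dp / p) by a forward difference.
    Base parameter values and the base output must be non-zero.
    """
    y0 = model(**base)
    result = {}
    for name, value in base.items():
        perturbed = dict(base)
        perturbed[name] = value * (1.0 + rel_step)
        result[name] = ((model(**perturbed) - y0) / y0) / rel_step
    return result

# Same model, two assumed "locations" in parameter space.
low_ks = oat_sensitivity(monod_rate, {"mu_max": 2.0, "k_s": 1.0})
high_ks = oat_sensitivity(monod_rate, {"mu_max": 2.0, "k_s": 50.0})
print("around k_s = 1: ", low_ks)
print("around k_s = 50:", high_ks)
```

In this sketch the sensitivity to `mu_max` is the same at both locations, whereas the sensitivity to `k_s` is several times larger in magnitude around the second base point. A parameter could therefore be ranked as unimportant or important depending only on where the analysis happens to be centred.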

GSA investigates the effect of variations over the entire prior parameter space on the model output (Saltelli et al., 2008; Zhan and Zhang, 2013; Pianosi et al., 2016). GSA does not rely on a pre-known suitable location; thus, it overcomes the limitations of LSA. Common GSA methods can be classified into four categories: (i) variance-based methods, such as Sobol's method; (ii) entropy-based methods, such as Kullback–Leibler (KL) entropy; (iii) derivative-based methods, such as the Morris screening method; and (iv) regression-based methods, such as standardized regression coefficients (SRC). Because “global sensitivity” is ambiguously defined, different GSA methods reveal different relationships between the parameters and model responses (Razavi and Gupta, 2015), which leads to varying GSA results. Razavi and Gupta (2015, 2016) introduced important characteristics (i.e., local sensitivities and their global distribution, the global distribution of model responses, and the structural organization of the response surface) to interpret global sensitivity and indicated that existing GSA methods (such as Sobol's and the Morris screening methods) focus on only one or a few of these characteristics while not considering the others. Any single GSA method therefore has both strengths and weaknesses. In addition to Razavi and Gupta (2015, 2016), several other researchers have recommended the comprehensive and complementary use of different GSA techniques for robust SA (Cloke et al., 2008; Pappenberger et al., 2008; Mishra et al., 2009; Neumann, 2012; Cosenza et al., 2013; Gamerith et al., 2013; Wainwright et al., 2013; Gan et al., 2014; Vanrolleghem et al., 2015; Sarrazin et al., 2016).
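To make the derivative-based category concrete, the sketch below computes Morris-style elementary effects from a simplified radial design: one-at-a-time steps taken from many random base points. It does not use the original trajectory design and it is not the configuration used in the study. The toy response, the unit-cube parameter range, the step `delta`, and the number of base points are hypothetical.

```python
import random
import statistics

def toy_model(x):
    """Hypothetical response: x[0] acts linearly, x[1] nonlinearly, x[2] not at all."""
    return 5.0 * x[0] + 2.0 * x[1] ** 2 + 0.0 * x[2]

def elementary_effects(model, dim, n_base=50, delta=0.1, seed=1):
    """Mean absolute elementary effect (mu*) and its spread (sigma) per parameter."""
    rng = random.Random(seed)
    effects = [[] for _ in range(dim)]
    for _ in range(n_base):
        # Random base point in the unit cube, leaving room for the step.
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(dim)]
        y0 = model(base)
        for j in range(dim):
            stepped = list(base)
            stepped[j] += delta
            effects[j].append((model(stepped) - y0) / delta)
    mu_star = [statistics.fmean([abs(e) for e in col]) for col in effects]
    sigma = [statistics.stdev(col) for col in effects]
    return mu_star, sigma

mu_star, sigma = elementary_effects(toy_model, dim=3)
for j, (m, s) in enumerate(zip(mu_star, sigma)):
    print(f"x[{j}]: mu* = {m:.3f}, sigma = {s:.3f}")
```

A large mu* marks an influential parameter, and a large sigma signals that its effect varies across the space, through nonlinearity or interactions. A parameter with mu* near zero is a candidate for fixing, although this is one view of sensitivity among several, which is the motivation for combining methods.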

The peer-reviewed literature has yet to propose an efficient and robust UA framework to assess the parameter identification problem (i.e., model imperfection) of complex water quality models. “Efficient” refers to the computational frugality of the UA process as well as the achievement of convergence with infinite iterations, which can be realized by reducing the dimensionality. “Robust” indicates that the framework will correctly evaluate whether the true value of a certain parameter can be learned. Against this background, this study combines the comprehensive GSA, the multi-chain MCMC method, and Bayesian estimation to establish a new analytical framework. With this framework, the Environmental Fluid Dynamics Code (EFDC) water quality module with a built-in case for the Lower Charles River Basin (Hamrick, 1992) is used as a synthetic data case study to assess the parameter identification problem caused by cognitive limitations.

Analytical framework

The proposed analytical framework (Fig. 1) aims to address the challenge of UA, and to assess the rationality of the parameterized processes in complex water quality models.

At first, the prior distribution of each parameter should be specified by combining past research experience, previous comparable experiments, and intuition or belief, where subjective assumptions are sometimes inevitable. The purpose of using a comprehensive GSA on the entire prior parameter hyperspace is to classify the

Summary of results

Fig. 4 illustrates the parameter identification results after implementing the proposed analytical framework. The sensitivity indices (SIs) of Sobol's, Morris screening, and SRC methods were calculated following the methods of previous studies (Sobol, 2001; Morris, 1991; Saltelli et al., 2004). For Sobol's and SRC methods, convergence was assumed after 5000 bootstrap samplings (Efron, 1981). The convergence of the Morris screening method was assumed when the parameter space was gridded into a $100^{54}$ hypercube

Conclusions

Considering parameter identification problems of complex water quality models derived from the cognitive limitations regarding water quality processes (i.e., model imperfections), a Bayesian analytical framework was developed based on the tri-variate relationships among sensitivity, uncertainty, and identifiability to assess parameter identifiability in a high-dimensional parameter space. Framework efficiency was achieved through dimensionality reduction (i.e., GSA). The involvement of

Acknowledgements

This research was supported by the National Water Pollution Control Special Project No. 2017ZX07205003 and the Tsinghua Fudaoyuan Research Fund. The authors thank the editors and reviewers for their valuable suggestions for improving the methodology.

References (90)

  • Y. Gan et al. A comprehensive evaluation of various sensitivity analysis methods: a case study with a hydrological model. Environ. Model. Software (2014)
  • A. Ganji et al. A modified Sobol’ sensitivity analysis method for decision-making in environmental problems. Environ. Model. Software (2016)
  • M. He et al. Characterizing parameter sensitivity and uncertainty for a snow model across hydroclimatic regimes. Adv. Water Resour. (2011)
  • H. Jia et al. LID-BMPs planning for urban runoff control and the case study in China. J. Environ. Manag. (2015)
  • Q. Jiang et al. Parameter uncertainty-based pattern identification and optimization for robust decision making on watershed load reduction. J. Hydrol. (2017)
  • J.F. Joseph et al. Using a parallelized MCMC algorithm in R to identify appropriate likelihood functions for SWAT. Environ. Model. Software (2013)
  • K. Lindenschmidt et al. Structural uncertainty in a river water quality modelling system. Ecol. Model. (2007)
  • Y. Liu et al. Water quality modelling for load reduction under uncertainty: a Bayesian approach. Water Res. (2008)
  • Y. Luo et al. Management-oriented sensitivity analysis for pesticide transport in watershed-scale water quality modelling using SWAT. Environ. Pollut. (2009)
  • C.S. Melching et al. Modelling evaluation of integrated strategies to meet proposed dissolved oxygen standards for the Chicago waterway system. J. Environ. Manag. (2013)
  • T.G. Müller et al. Parameter identification in dynamical models of anaerobic waste water treatment. Math. Biosci. (2002)
  • M.B. Neumann. Comparison of sensitivity analysis methods for pollutant degradation modelling: a case study from drinking water treatment. Sci. Total Environ. (2012)
  • M. Omlin et al. Biogeochemical model of Lake Zürich: sensitivity, identifiability and uncertainty analysis. Ecol. Model. (2001)
  • F. Pappenberger et al. Multi-method global sensitivity analysis of flood inundation models. Adv. Water Resour. (2008)
  • K. Park et al. Three-dimensional hydrodynamic-eutrophication model (HEM-3D): application to Kwang-Yang Bay, Korea. Mar. Environ. Res. (2005)
  • F. Pianosi et al. Sensitivity analysis of environmental models: a systematic review with practical workflow. Environ. Model. Software (2016)
  • J.C. Refsgaard et al. A framework for dealing with uncertainty due to model structure error. Adv. Water Resour. (2006)
  • F. Sarrazin et al. Global sensitivity analysis of environmental models: convergence and validation. Environ. Model. Software (2016)
  • T. Smith et al. Modelling residual hydrologic errors with Bayesian inference. J. Hydrol. (2015)
  • I.M. Sobol'. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simulat. (2001)
  • X.Y. Sun et al. Three complementary methods for sensitivity analysis of a water quality model. Environ. Model. Software (2012)
  • Y.T. Tang et al. Tools for investigating the prior distribution in Bayesian hydrology. J. Hydrol. (2016)
  • P.A. Vanrolleghem et al. Global sensitivity analysis for urban water quality modelling: terminology, convergence and comparison of different methods. J. Hydrol. (2015)
  • E. Vanuytrecht et al. Global sensitivity analysis of yield output from the water productivity model. Environ. Model. Software (2014)
  • J.A. Vrugt. Markov chain Monte Carlo simulation using the DREAM software package: theory, concepts, and MATLAB implementation. Environ. Model. Software (2016)
  • T. Wagener et al. Numerical and visual evaluation of hydrological and environmental models using the Monte Carlo analysis toolbox. Environ. Model. Software (2007)
  • H.M. Wainwright et al. Modelling the performance of large-scale CO2 storage systems: a comparison of different sensitivity analysis methods. Int. J. Greenh. Gas Contr. (2013)
  • Y. Wan et al. Three dimensional water quality modelling of a shallow subtropical estuary. Mar. Environ. Res. (2012)
  • Y. Wang et al. 3-D hydro-environmental simulation of Miyun reservoir, Beijing. J. Hydro Environ. Res. (2014)
  • G. Wu et al. Prediction of algal blooming using EFDC model: case study in the Daoxiang Lake. Ecol. Model. (2011)
  • J. Yang. Convergence and uncertainty analyses in Monte-Carlo based sensitivity analysis. Environ. Model. Software (2011)
  • X. Yi et al. Global sensitivity analysis of a three-dimensional nutrients-algae dynamic model for a large shallow lake. Ecol. Model. (2016)
  • Y. Zhan et al. Application of a combined sensitivity analysis approach on a pesticide environmental risk indicator. Environ. Model. Software (2013)
  • Z. Zhu et al. Integrated urban hydrologic and hydraulic modelling in Chicago, Illinois. Environ. Model. Software (2016)
  • O.I. Abdul-Aziz et al. Climate, land use and hydrologic sensitivities of stormwater quantity and quality in a complex coastal-urban watershed. Urban Water J. (2016)