Bayesian framework of parameter sensitivity, uncertainty, and identifiability analysis in complex water quality models
Introduction
With rapid urbanization and economic development, water quality deterioration has become a global concern. Serious problems necessitate the prevention and control of water pollution, as well as aquatic ecosystem management. Water quality models are powerful mathematical tools for water quality assessment, pollution control, emergency preparedness and response, and aquatic environmental planning (Mirchi and Watkins, 2012; Melching et al., 2013; Xiao et al., 2015).
Water quality models are established based on an understanding of the relevant hydrodynamic, chemical, and biochemical pollutant migration and transformation processes in aquatic ecosystems, as well as the hypotheses on inaccessible behaviors (Beck, 1987; Melching et al., 1990; Walker et al., 2003; Lindenschmidt et al., 2007). These inaccessible but complicated behaviors are often parameterized based on different state variables (e.g., the Monod model uses two parameters to describe the microbial usage of BOD5). With deepening insight into such mechanisms and related processes, water quality models have become increasingly complex. On the one hand, non-monotonic and non-linear relationships among state variables have replaced the initial monotonic and linear relationships, yielding extensive local optima. On the other hand, dimensions of the parameter space have increased dramatically, and redundant relationships have simultaneously arisen (Refsgaard et al., 2006; Freni et al., 2011), causing extensive equifinality (i.e., multiple “optimal” parameter vectors that yield similar goodness-of-fit).
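As a minimal illustration of equifinality, the sketch below evaluates a Monod-type rate expression for two different parameter pairs. The functional form mu_max * S / (K_s + S) is the standard Monod expression; the function name, the parameter values, and the substrate range are hypothetical choices made only for this example. When the substrate concentration is far below the half-saturation constant, the rate depends almost entirely on the ratio mu_max / K_s, so distinct parameter pairs give nearly the same output.

```python
def monod_rate(substrate, mu_max, k_s):
    """Monod-type rate: mu_max * S / (K_s + S)."""
    return mu_max * substrate / (k_s + substrate)


# Hypothetical low substrate concentrations (S << K_s), 0.1 to 2.0
substrates = [0.1 * k for k in range(1, 21)]

# Two hypothetical parameter pairs sharing the same ratio mu_max / K_s
rates_a = [monod_rate(s, mu_max=1.0, k_s=100.0) for s in substrates]
rates_b = [monod_rate(s, mu_max=2.0, k_s=200.0) for s in substrates]

# Largest relative difference between the two simulated rate curves
max_rel_diff = max(abs(a - b) / b for a, b in zip(rates_a, rates_b))
print(f"largest relative difference: {max_rel_diff:.4f}")
```

Over this range the two curves differ by about one percent at most, so data collected only at low concentrations could not distinguish the two pairs.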
Parameter identifiability is the possibility of learning the true values of the underlying parameters from an infinite experimental dataset (Raue et al., 2009). Parameter identification for complex water quality models is inevitably challenging, and the true parameter values often cannot be learned because of the increased computational cost and the aforementioned factors (i.e., local optima and equifinality). Given such serious parameter identification problems (Omlin et al., 2001; Müller et al., 2002; Brun et al., 2002; Raue et al., 2009), an efficient and robust uncertainty analysis (UA) helps both model users and developers assess parameter identifiability and, eventually, determine the cognitive limitations regarding the real behaviors of an aquatic ecosystem. In turn, targeted control of model imperfections can mitigate the adverse impacts of non-identifiability and strengthen the reliability of simulation results, thereby preventing decision-making errors (Wagener and Kollat, 2007; Ascough et al., 2008; Jiang et al., 2017).
The Bayesian method represents a modern branch of UA techniques developed based on Bayes’ theorem (Eq. (1)). This method describes parameter uncertainty by deriving the posterior parameter distribution from a combination of the prior parameter distribution and the likelihood function, in which empirical knowledge (such as past research experience, previous comparable experiments, and even intuition or belief) and sampling information are encoded, respectively.
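For reference, Bayes’ theorem for a parameter vector can be written as follows. The notation is an assumption made for this illustration and may differ from the symbols of the original equation: \theta denotes the parameter vector, Y the observations, p(\theta) the prior distribution, L(\theta \mid Y) the likelihood function, and p(\theta \mid Y) the posterior distribution.

```latex
p(\theta \mid Y)
  = \frac{p(\theta)\, L(\theta \mid Y)}{\int p(\theta)\, L(\theta \mid Y)\, \mathrm{d}\theta}
  \propto p(\theta)\, L(\theta \mid Y)
```

The denominator is a normalizing constant that does not depend on \theta, which is why sampling methods only need the product of prior and likelihood.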
In general, the explicit functional form of the posterior distribution is unlikely to be derived analytically. Therefore, sampling is indispensable. The Markov Chain Monte Carlo (MCMC) method provides a series of efficient sampling algorithms to obtain the posterior parameter distribution (Tierney, 2015). Multi-chain MCMC methods, represented by the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt, 2016), are a particularly competitive branch. Multi-chain MCMC methods are based on a genetic algorithm integrated with the “population” concept. Each Markov chain starts its evolution from a point sampled randomly from the prior parameter space, and these starting points together form the initial population. Unlike single-chain MCMC methods, the multiple Markov chains interact with one another to co-generate the transition kernel (Ter Braak, 2006; Ter Braak and Vrugt, 2008). Hence, multi-chain MCMC methods have been used to improve sampling and searching capabilities, as well as to prevent premature convergence (i.e., being trapped in a local optimum) in high-dimensional parameter spaces (Vrugt et al., 2009). These advantages have made multi-chain MCMC methods popular in hydrological research (Keating et al., 2010; He et al., 2011; Harrison et al., 2012; Joseph and Guillaume, 2013). However, applications to complex water quality models remain scarce.
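The way multiple chains co-generate proposals can be sketched with a basic differential evolution Markov chain update in the spirit of Ter Braak (2006). This is not the DREAM algorithm, which adds further adaptive features; it only shows the core population-based jump. The target density, the uniform starting range, the number of chains, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def log_posterior(theta):
    # Hypothetical target: independent Gaussians with means (1, -2), sds (0.5, 1.0)
    return -0.5 * (((theta[0] - 1.0) / 0.5) ** 2 + ((theta[1] + 2.0) / 1.0) ** 2)


def de_mc(log_post, n_chains, n_dims, n_iter, seed=0):
    """Basic differential evolution MCMC: each chain jumps along the
    difference of two other randomly chosen chains."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * n_dims)  # jump scale suggested by Ter Braak (2006)
    # Initial population drawn from an assumed uniform prior on [-5, 5]
    chains = [[rng.uniform(-5.0, 5.0) for _ in range(n_dims)]
              for _ in range(n_chains)]
    log_p = [log_post(c) for c in chains]
    samples = []
    for _ in range(n_iter):
        for i in range(n_chains):
            others = [j for j in range(n_chains) if j != i]
            r1, r2 = rng.sample(others, 2)
            # Proposal: current state + scaled difference of two other chains + small noise
            proposal = [chains[i][d]
                        + gamma * (chains[r1][d] - chains[r2][d])
                        + rng.gauss(0.0, 1e-3)
                        for d in range(n_dims)]
            log_p_new = log_post(proposal)
            # Metropolis acceptance; 1 - random() lies in (0, 1], so log is defined
            if math.log(1.0 - rng.random()) < log_p_new - log_p[i]:
                chains[i], log_p[i] = proposal, log_p_new
            samples.append(list(chains[i]))
    return samples


samples = de_mc(log_posterior, n_chains=8, n_dims=2, n_iter=3000)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
post_mean = [sum(s[d] for s in kept) / len(kept) for d in range(2)]
print("posterior mean estimate:", post_mean)
```

Because the jump direction and scale come from the current spread of the population, the proposal adapts to the shape of the target without a hand-tuned covariance matrix.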
To further increase the efficiency of UA, dimensionality reduction is often required first for complex water quality models. Sensitivity analysis (SA), which assesses the degree to which parameter uncertainty causes output variations, plays an important role in dimensionality reduction via parameter prioritization and fixing (Saltelli et al., 2008; Bilotta et al., 2012; Sun et al., 2012; Ganji et al., 2016). SA techniques are often categorized into local (LSA) and global sensitivity analysis (GSA) methods. LSA is a partial-derivative-based method that investigates the response of the model output to a small perturbation of each parameter around a specific location in parameter space (Matott et al., 2009; Baroni and Tarantola, 2014). A common way of conducting LSA is the one-factor-at-a-time (OAT) method (Yang, 2011). Although LSA is computationally economical and popular (Luo and Zhang, 2009; Jia et al., 2015; Abdul-Aziz and Al-Amin, 2016), it is not suitable for reducing the dimensionality of complex water quality models, because its results depend on the chosen location and the suitable location, i.e., the true parameter value, is unknown.
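The location dependence of LSA can be shown with a short OAT sketch based on forward finite differences. The toy model, the two evaluation locations, and the function names are hypothetical; they stand in for a real water quality model only to show that the parameter ranking changes with the location at which the derivatives are taken.

```python
def toy_model(x):
    # Hypothetical non-linear model: quadratic in x[0], linear in x[1]
    return x[0] ** 2 + x[1]


def oat_sensitivity(model, location, rel_step=0.01):
    """One-factor-at-a-time forward-difference estimate of the partial
    derivatives of the model output at a given location."""
    base = model(location)
    sensitivities = []
    for i, value in enumerate(location):
        step = rel_step * abs(value) if value != 0 else rel_step
        perturbed = list(location)
        perturbed[i] = value + step  # perturb one parameter, keep the rest fixed
        sensitivities.append((model(perturbed) - base) / step)
    return sensitivities


sens_low = oat_sensitivity(toy_model, [0.1, 0.5])
sens_high = oat_sensitivity(toy_model, [2.0, 0.5])
print("sensitivities at (0.1, 0.5):", sens_low)
print("sensitivities at (2.0, 0.5):", sens_high)
```

At the first location the second parameter appears more influential, whereas at the second location the first parameter dominates, so a ranking obtained at one location cannot be transferred to another.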
GSA investigates the effect of parameter variations over the entire prior parameter space on the model output (Saltelli et al., 2008; Zhan and Zhang, 2013; Pianosi et al., 2016). GSA does not rely on a pre-known suitable location; thus, it overcomes the limitations of LSA. Common GSA methods can be classified into four categories: (i) variance-based methods, such as Sobol's method; (ii) entropy-based methods, such as Kullback–Leibler (KL) entropy; (iii) derivative-based methods, such as the Morris screening method; and (iv) regression-based methods, such as standardized regression coefficients (SRC). Because the definition of “global sensitivity” is ambiguous, different GSA methods reveal different relationships between the parameters and model responses (Razavi and Gupta, 2015), which leads to varying GSA results. Razavi and Gupta (2015, 2016) introduced important characteristics (i.e., local sensitivities and their global distribution, the global distribution of model responses, and the structural organization of the response surface) to interpret global sensitivity, and indicated that existing GSA methods (such as Sobol's and the Morris screening methods) focus on only one or a few of these characteristics while neglecting the others. Each GSA method therefore has its own strengths and weaknesses. In addition to Razavi and Gupta (2015, 2016), several other researchers have also recommended the comprehensive and complementary use of different GSA techniques for robust SA (Cloke et al., 2008; Pappenberger et al., 2008; Mishra et al., 2009; Neumann, 2012; Cosenza et al., 2013; Gamerith et al., 2013; Wainwright et al., 2013; Gan et al., 2014; Vanrolleghem et al., 2015; Sarrazin et al., 2016).
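As an example of a variance-based method, first-order Sobol indices can be estimated by Monte Carlo sampling with two independent sample matrices, following the estimator form of Saltelli et al. (2010). The sketch below is a generic illustration and not the computation performed in this study: the linear toy model, the uniform priors on [0, 1], the sample size, and the function names are assumptions. For this toy model the analytical first-order indices are 16/21, 4/21, and 1/21, which gives a reference against which the estimates can be compared.

```python
import random


def toy_model(x):
    # Hypothetical additive model; with uniform [0, 1] inputs the analytical
    # first-order indices are 16/21, 4/21 and 1/21
    return 4.0 * x[0] + 2.0 * x[1] + 1.0 * x[2]


def first_order_sobol(model, n_params, n_samples, seed=0):
    """Monte Carlo estimate of first-order Sobol indices from two
    independent sample matrices A and B."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    y_b = [model(row) for row in b]
    y_all = y_a + y_b
    mean_y = sum(y_all) / len(y_all)
    var_y = sum((y - mean_y) ** 2 for y in y_all) / len(y_all)
    indices = []
    for i in range(n_params):
        total = 0.0
        for row_a, row_b, ya, yb in zip(a, b, y_a, y_b):
            mixed = list(row_a)
            mixed[i] = row_b[i]  # matrix A with column i taken from B
            # Output of B is centered to reduce the estimator variance
            total += (yb - mean_y) * (model(mixed) - ya)
        indices.append(total / n_samples / var_y)
    return indices


sobol_indices = first_order_sobol(toy_model, n_params=3, n_samples=50000)
print("estimated first-order indices:", sobol_indices)
```

The cost is n_samples * (n_params + 2) model runs, which is why such methods become expensive for high-dimensional models and why complementary, cheaper screening methods are often used alongside them.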
The peer-reviewed literature still lacks an efficient and robust UA framework for assessing the parameter identification problem (i.e., model imperfection) of complex water quality models. “Efficient” refers to the computational frugality of the UA process as well as the achievement of convergence within a finite number of iterations, which can be realized by reducing the dimensionality. “Robust” indicates that the framework will correctly evaluate whether the true value of a certain parameter can be learned. Against this background, this study integrates comprehensive GSA, a multi-chain MCMC method, and Bayesian estimation to establish a new analytical framework. With this framework, the Environmental Fluid Dynamics Code (EFDC) water quality module with a built-in case for the Lower Charles River Basin (Hamrick, 1992) is used as a synthetic data case study to assess the parameter identification problem caused by cognitive limitations.
Analytical framework
The proposed analytical framework (Fig. 1) aims to address the challenge of UA, and to assess the rationality of the parameterized processes in complex water quality models.
First, the prior distribution of each parameter should be specified by combining past research experience, previous comparable experiments, and intuition or belief, where subjective assumptions are sometimes inevitable. The purpose of using a comprehensive GSA on the entire prior parameter hyperspace is to classify the
Summary of results
Fig. 4 illustrates the parameter identification results after implementing the proposed analytical framework. The sensitivity indices (SIs) of Sobol's, Morris screening, and SRC methods were calculated following the methods of previous studies (Sobol, 2001; Morris, 1991; Saltelli et al., 2004). For Sobol's and SRC methods, convergence was assumed after 5000 bootstrap samplings (Efron, 1981). The convergence of the Morris screening method was assumed when the parameter space was gridded into a 10054 hypercube
Conclusions
Considering parameter identification problems of complex water quality models derived from the cognitive limitations regarding water quality processes (i.e., model imperfections), a Bayesian analytical framework was developed based on the tri-variate relationships among sensitivity, uncertainty, and identifiability to assess parameter identifiability in a high-dimensional parameter space. Framework efficiency was achieved through dimensionality reduction (i.e., GSA). The involvement of
Acknowledgements
This research was supported by the National Water Pollution Control Special Project No. 2017ZX07205003 and the Tsinghua Fudaoyuan Research Fund. The authors sincerely thank the editors and reviewers for their valuable suggestions for improving the methodology.
References (90)
- et al. Future research challenges for incorporation of uncertainty in environmental and ecological decision-making. Ecol. Model. (2008)
- et al. A general probabilistic framework for uncertainty and global sensitivity analysis of deterministic models: a hydrological case study. Environ. Model. Software (2014)
- et al. Sensitivity analysis of the MAGFLOW cellular automaton model for lava flow simulation. Environ. Model. Software (2012)
- et al. Practical identifiability of ASM2d parameters: systematic selection and tuning of parameter subsets. Water Res. (2002)
- et al. Sensitivity analysis of an environmental model: an application of different analysis methods. Reliab. Eng. Syst. Saf. (1997)
- et al. Assessing the eutrophication risk of the Danjiangkou Reservoir based on the EFDC model. Ecol. Eng. (2016)
- et al. Global sensitivity analysis in wastewater applications: a comprehensive comparison of different methods. Environ. Model. Software (2013)
- et al. Quantitative global sensitivity analysis of the RZWQM to warrant a robust and effective calibration. J. Hydrol. (2014)
- et al. Bayesian approach for uncertainty quantification in water quality modelling: the influence of prior distribution. J. Hydrol. (2010)
- et al. Applying global sensitivity analysis to the modelling of flow and water quality in sewers. Water Res. (2013)