Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how?

Sahlin, Ullrika

doi:10.1007/s10822-014-9822-3

Special Series: Statistics in Molecular Modeling
Guest Editor: Anthony Nicholls
Published: 10 December 2014

Volume 29, pages 583–594, (2015)
Journal of Computer-Aided Molecular Design

Ullrika Sahlin¹

Abstract

A prediction of a chemical property or activity is subject to uncertainty. Which types of uncertainty to consider, whether to account for them in a differentiated manner, and with which methods, depends on the practical context. In chemical modelling, general guidance on the assessment of uncertainty is hindered by the wide variety of underlying modelling algorithms, high-dimensionality problems, the acknowledgement of both qualitative and quantitative dimensions of uncertainty, and the fact that statistics offers alternative principles for uncertainty quantification. Here, a view of the assessment of uncertainty in predictions is presented with the aim of overcoming these issues. The assessment sets out to quantify uncertainty representing error in predictions and is based on probability modelling of errors, where uncertainty is measured by Bayesian probabilities. Even though well motivated, the choice to use Bayesian probabilities is a challenge to statistics and chemical modelling. Fully Bayesian modelling, Bayesian meta-modelling and bootstrapping are discussed as possible approaches. Deciding how to assess uncertainty is an active choice, and should not be constrained by traditions or by a lack of validated and reliable ways of doing it.
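To make the bootstrapping approach named in the abstract concrete, the following is a minimal sketch of a Bayesian bootstrap for the mean prediction error. It is an illustration under stated assumptions, not the procedure used in the article: the residual values, the function names `bayesian_bootstrap_mean` and `interval`, and the choice of the mean as the summary of interest are all hypothetical.

```python
import random


def bayesian_bootstrap_mean(errors, draws=2000, seed=0):
    """Return posterior draws of the mean error under a Bayesian bootstrap.

    Each draw reweights the observed errors with Dirichlet(1, ..., 1)
    weights instead of resampling them with replacement.
    """
    rng = random.Random(seed)
    n = len(errors)
    samples = []
    for _ in range(draws):
        # Normalised Gamma(1, 1) variates give Dirichlet(1, ..., 1) weights.
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
        total = sum(g)
        samples.append(sum(w * e for w, e in zip(g, errors)) / total)
    return samples


def interval(samples, level=0.95):
    """Equal-tailed interval read off the sorted posterior draws."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Hypothetical residuals (observed minus predicted) from a validation set.
errors = [0.3, -0.1, 0.5, 0.2, -0.4, 0.1, 0.0, 0.6]
draws = bayesian_bootstrap_mean(errors)
low, high = interval(draws)
print(f"95% interval for the mean error: [{low:.3f}, {high:.3f}]")
```

The resulting interval can be read as a Bayesian probability statement about the mean error, given the assumption that the observed residuals are representative of future prediction errors.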

Notes

We provide both ways of expressing the model to demonstrate the transition from classical statistical model specification, where the probabilistic model is applied to the errors, to the general model specification, where the whole model is probabilistic.
The Bayesian framework is usually presented with parametric models, but it can be applied to non-parametric models as well.
It does not have to be the classifier. Later we give an example where C is a variable expressing reliability in prediction given by the number of times a compound is classified as active from a set of ensemble predictions.
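The ensemble-based variable C in the last note can be sketched as follows. This is a minimal illustration, assuming a hypothetical list of 0/1 votes from ten ensemble members; the Beta(1, 1) prior and the posterior-mean summary are assumptions added here for illustration and are not taken from the article.

```python
def active_votes(ensemble_predictions):
    """C: number of ensemble members that classify the compound as active (1)."""
    return sum(1 for p in ensemble_predictions if p == 1)


def posterior_mean_active(c, n, a=1.0, b=1.0):
    """Posterior mean of the 'active' vote probability under a Beta(a, b) prior.

    With c active votes out of n, the posterior is Beta(a + c, b + n - c),
    whose mean is (a + c) / (a + b + n).
    """
    return (c + a) / (n + a + b)


# Hypothetical votes from ten ensemble members for one compound.
votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
C = active_votes(votes)
p = posterior_mean_active(C, len(votes))
print(C, p)
```

Here C itself is the reliability variable described in the note; the Beta posterior is only one possible way to turn the vote count into a probability.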

Acknowledgments

This work has been funded by the Swedish Research Council Formas through the project 219-2013-1271 “Scaling up uncertain environmental evidence-Quality assurance in ecosystem service predictions” and through the strategic research area Biodiversity and Ecosystems in a Changing Climate, BECC, and by the European Seventh Framework Programme through the CADASTER (CAse studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment) project FP7-ENV-2007-1-212668. The author wishes to thank Rasmus Bååth and Tom Aldenberg for helpful discussions on Bayesian concepts, and Niklas Vareman and Yann Clough for valuable comments.

Author information

Authors and Affiliations

Centre of Environmental and Climate Research, Lund University, Lund, Sweden
Ullrika Sahlin

Corresponding author

Correspondence to Ullrika Sahlin.

About this article

Cite this article

Sahlin, U. Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how? J Comput Aided Mol Des 29, 583–594 (2015). https://doi.org/10.1007/s10822-014-9822-3

Received: 28 August 2014
Accepted: 04 December 2014
Published: 10 December 2014
Issue Date: July 2015
DOI: https://doi.org/10.1007/s10822-014-9822-3
