Abstract
Approximate Bayesian computation refers to a family of algorithms that perform Bayesian inference under intractable likelihoods. In this paper we propose replacing the distance metric in certain algorithms with hypothesis testing. This has two benefits: summary statistics are no longer required, and censoring can be present in the observed data set without the need to simulate any censored data. We illustrate the proposed method through a nanotechnology application in which we estimate the concentration of particles in a liquid suspension. We prove that the method yields an approximation to the true posterior and that the parameter estimates are consistent. We further show, through comparative analysis, that it is more efficient than existing methods for censored data.
Acknowledgements
We would like to thank Dr. Magnus Roding from the Bioscience and Materials division at RISE Research Institutes of Sweden for providing us with the data used in Sect. 5.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
Proof of Lemma 1
It is clear that \(pr(Y>y)=pr(X_1>y)pr(X_2>y)\). Then,
Also,
Thus,
\(\square \)
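The product identity that opens this proof can be checked numerically. The sketch below assumes that \(Y=\min (X_1,X_2)\) with \(X_1\) and \(X_2\) independent, which is the setting in which the identity holds; the exponential distributions, the rates and the function name `survival_of_min` are hypothetical choices made only for illustration.

```python
import random

def survival_of_min(y, rate1, rate2, n=100_000, seed=0):
    """Estimate pr(min(X1, X2) > y) and pr(X1 > y) * pr(X2 > y) by simulation.

    X1 and X2 are drawn independently from exponential distributions
    (an illustrative assumption, not taken from the paper).
    """
    rng = random.Random(seed)
    x1 = [rng.expovariate(rate1) for _ in range(n)]
    x2 = [rng.expovariate(rate2) for _ in range(n)]
    # Empirical survival function of Y = min(X1, X2) at y
    s_min = sum(min(a, b) > y for a, b in zip(x1, x2)) / n
    # Product of the two empirical marginal survival functions at y
    s_prod = (sum(a > y for a in x1) / n) * (sum(b > y for b in x2) / n)
    return s_min, s_prod

s_min, s_prod = survival_of_min(y=0.5, rate1=1.0, rate2=2.0)
print(s_min, s_prod)
```

For these rates the exact value is \(e^{-1.5}\), so both estimates should lie close to it and to each other.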
Proof of Lemma 2
The moment generating function of \(Y_i\) is \(M_{Y_i}(t)=M_{X_i}(t)M_{U_i}(t)\) for \(i=1,2\). If \(Y_1\) is stochastically equivalent to \(Y_2\), then \(M_{Y_1}(t)=M_{Y_2}(t)\), that is, \(M_{X_1}(t)M_{U_1}(t)=M_{X_2}(t)M_{U_2}(t)\).
This implies \(M_{X_1}(t)=M_{X_2}(t)\), and thus \(X_1\) is stochastically equivalent to \(X_2\).
If \(X_1\) is stochastically equivalent to \(X_2\), then \(M_{X_1}(t)=M_{X_2}(t)\), and thus \(Y_1\) is stochastically equivalent to \(Y_2\). \(\square \)
Proof of Lemma 3
Let A represent the event that \(H_0\) is not rejected, i.e. accepted. Let T represent the event that \(H_0\) is true, and let F represent the event that \(H_0\) is false, i.e. the complement of T. The power of the test is \(1-\beta \), where \(\beta =pr(A|F)\).
We first look at the bias.
Since \({\hat{\theta }}\) is an asymptotically unbiased estimator of \(\theta \) when \(H_0\) is true, \(E({\hat{\theta }}|A\cap T)\rightarrow \theta \) as \(n\rightarrow \infty \). The power of the test goes to one, so \(\beta \rightarrow 0\) as \(n\rightarrow \infty \). Therefore the bias \(\theta -E({\hat{\theta }}| A) \rightarrow 0\) as \(n\rightarrow \infty \). \(\square \)
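The bias argument can be made explicit with the law of total expectation and Bayes' rule. The following is a sketch of one way to write the step, not necessarily the displayed equation of the original proof. It assumes that \(pr(T)>0\), that \(pr(A|T)\) stays bounded away from zero (for instance \(pr(A|T)=1-\alpha \) for a level-\(\alpha \) test), and that \(E({\hat{\theta }}|A\cap F)\) is bounded; these conditions are assumptions added here.

```latex
\begin{aligned}
E(\hat{\theta}\mid A)
  &= E(\hat{\theta}\mid A\cap T)\,pr(T\mid A)
   + E(\hat{\theta}\mid A\cap F)\,pr(F\mid A),\\
pr(F\mid A)
  &= \frac{\beta\,pr(F)}{pr(A\mid T)\,pr(T)+\beta\,pr(F)}
   \;\rightarrow\; 0 \quad\text{as } \beta\rightarrow 0 .
\end{aligned}
```

Under these assumptions \(pr(T|A)\rightarrow 1\), so \(E({\hat{\theta }}|A)\) has the same limit as \(E({\hat{\theta }}|A\cap T)\), namely \(\theta \).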
Proof of Theorem 1
Let A represent the event that \(H_0\) is not rejected, i.e. accepted. Let T represent the event that \(H_0\) is true, and let F represent the event that \(H_0\) is false. The power of the test is \(1-\beta \).
Applying the total conditional variance formula, we have:
By Lemma 3 the second term in Eq. (13) goes to zero as \(n\rightarrow \infty \). Now consider the first term:
We know that \(\mathrm{var}({\hat{\theta }}|A\cap T) \rightarrow 0\) as \(n\rightarrow \infty \), because \({\hat{\theta }}\) is a consistent estimator when \(H_0\) is true. The power of the test goes to one, so \(\beta \rightarrow 0\) as \(n\rightarrow \infty \). Thus, \(E(\mathrm{var}({\hat{\theta }} | A) |A)\rightarrow 0 \text { as } n\rightarrow \infty \). Since both the bias and the variance of \({\hat{\theta }}\) given A vanish, the sufficient condition for consistency is met, and \({\hat{\theta }}\) is a consistent estimator given that \(H_0\) is not rejected. \(\square \)
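The sufficient condition invoked at the end of this proof is the standard one: an estimator whose bias and variance both vanish is consistent. A short sketch of why, using Chebyshev's inequality conditional on A (this derivation is supplied for illustration and is not quoted from the paper):

```latex
pr\bigl(|\hat{\theta}-\theta|>\varepsilon \mid A\bigr)
  \;\le\; \frac{E\{(\hat{\theta}-\theta)^2 \mid A\}}{\varepsilon^2}
  \;=\; \frac{\mathrm{var}(\hat{\theta}\mid A)
        + \{E(\hat{\theta}\mid A)-\theta\}^2}{\varepsilon^2},
  \qquad \varepsilon>0 .
```

If the variance term and the squared bias term both tend to zero as \(n\rightarrow \infty \), the right-hand side tends to zero for every \(\varepsilon >0\), which is convergence in probability of \({\hat{\theta }}\) to \(\theta \) given A.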
Proof of Theorem 2
Let \(\{ \theta ^*_i\}_{i=1}^N\) be the resulting sequence of the algorithm. Each \( \theta ^*_i\) is an independent draw from \(f(\theta |A)\), where A represents the event that \(H_0\) is not rejected. Then
where \({\mathcal {F}}\) is a \(\sigma \)-algebra on some given set \(\varOmega \), \(x^*\) is the simulated data, \(f(x^*| \cdot )\) is the model for the simulated data, and \(\pi (\cdot )\) is the prior distribution. The resulting approximate posterior distribution \(\pi _{H_0}(\theta ^*_i|x)\) depends on the test used to decide whether or not to reject the null hypothesis.
As the sample size of the observed data set approaches infinity, the sample size of the simulated data set also approaches infinity. Then the power of the test approaches one,
This means that when the null hypothesis is in fact false, it will be rejected, and the corresponding candidate parameter value will not enter the resulting sequence of parameter values. Incorrect parameter values are therefore excluded from the sequence altogether. Hence, the resulting posterior distribution will be the true posterior,
as the sample sizes of the observed and simulated data sets approach infinity. \(\square \)
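The accept/reject mechanism that this proof reasons about can be sketched in code. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a two-sample log-rank test as a stand-in for the hypothesis test, an exponential model, a uniform prior on the rate, and Type I censoring of the observed data at a fixed cutoff. The names `logrank_pvalue` and `abc_test_rejection` and all numerical settings are hypothetical.

```python
import math
import random

def logrank_pvalue(times1, events1, times2, events2):
    """Two-sample log-rank test p-value; events are 1 (observed) or 0 (censored)."""
    pooled = sorted(
        [(t, e, 0) for t, e in zip(times1, events1)]
        + [(t, e, 1) for t, e in zip(times2, events2)]
    )
    at_risk = [len(times1), len(times2)]
    obs_minus_exp, var = 0.0, 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        deaths, leaving = [0, 0], [0, 0]
        # Collect everything (events and censorings) tied at time t
        while i < len(pooled) and pooled[i][0] == t:
            _, e, g = pooled[i]
            deaths[g] += e
            leaving[g] += 1
            i += 1
        n, d = at_risk[0] + at_risk[1], deaths[0] + deaths[1]
        if d > 0 and n > 1:
            # Observed minus expected events in group 0, and its variance
            obs_minus_exp += deaths[0] - d * at_risk[0] / n
            var += d * (at_risk[0] / n) * (at_risk[1] / n) * (n - d) / (n - 1)
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    if var == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def abc_test_rejection(obs_times, obs_events, n_candidates, alpha, rng):
    """Keep candidate rates whose simulated data are not rejected by the test."""
    accepted = []
    m = len(obs_times)
    for _ in range(n_candidates):
        theta = rng.uniform(0.1, 5.0)  # hypothetical uniform prior on the rate
        sim = [rng.expovariate(theta) for _ in range(m)]  # simulated data, uncensored
        if logrank_pvalue(obs_times, obs_events, sim, [1] * m) > alpha:
            accepted.append(theta)  # H0 not rejected: keep the candidate
    return accepted

rng = random.Random(1)
true_rate, cutoff = 2.0, 0.6  # illustrative values only
raw = [rng.expovariate(true_rate) for _ in range(200)]
obs_times = [min(x, cutoff) for x in raw]             # censored observed times
obs_events = [1 if x <= cutoff else 0 for x in raw]   # 0 marks a censored value
accepted = abc_test_rejection(obs_times, obs_events, 2000, 0.05, rng)
post_mean = sum(accepted) / len(accepted)
print(len(accepted), round(post_mean, 3))
```

Note that only the observed data carry censoring indicators; the simulated data are generated uncensored and the test itself accounts for the censoring, which mirrors the benefit described in the abstract. Any test valid for censored samples could be substituted for the log-rank stand-in.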
Cite this article
McCullough, K., Dmitrieva, T. & Ebrahimi, N. New approximate Bayesian computation algorithm for censored data. Comput Stat 37, 1369–1397 (2022). https://doi.org/10.1007/s00180-021-01167-3