
New approximate Bayesian computation algorithm for censored data


Abstract

Approximate Bayesian computation refers to a family of algorithms that perform Bayesian inference under intractable likelihoods. In this paper we propose replacing the distance metric in certain of these algorithms with a hypothesis test. The benefits of this approach are that summary statistics are no longer required and that censoring can be present in the observed data set without the need to simulate any censored data. We illustrate the proposed method through a nanotechnology application in which we estimate the concentration of particles in a liquid suspension. We prove that our method yields an approximation to the true posterior and that the parameter estimates are consistent. We further show, through a comparative analysis, that it is more efficient than existing methods for censored data.
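
To make the idea concrete, the following is a minimal sketch of test-based ABC rejection, not the authors' implementation: the prior sampler sample_prior() and simulator simulate(theta) are hypothetical placeholders, and a two-sample Kolmogorov-Smirnov test on complete data stands in for the censored-data test developed in the paper.

```python
import numpy as np
from scipy.stats import ks_2samp

def abc_test_rejection(observed, sample_prior, simulate, n_draws=1000, alpha=0.05):
    """ABC rejection with a hypothesis test as the accept rule (sketch).

    A candidate theta is kept when the two-sample test fails to reject
    H0: observed and simulated data come from the same distribution.
    """
    accepted = []
    while len(accepted) < n_draws:
        theta = sample_prior()        # draw a candidate from the prior
        x_sim = simulate(theta)       # simulate a data set under theta
        if ks_2samp(observed, x_sim).pvalue > alpha:
            accepted.append(theta)    # H0 not rejected: keep the candidate
    return np.array(accepted)

# Toy usage: infer the scale of an exponential sample.
rng = np.random.default_rng(0)
observed = rng.exponential(scale=2.0, size=200)
posterior_draws = abc_test_rejection(
    observed,
    sample_prior=lambda: rng.uniform(0.5, 5.0),
    simulate=lambda s: rng.exponential(scale=s, size=200),
)
print(posterior_draws.mean())  # close to the true scale, 2.0
```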




Acknowledgements

We would like to thank Dr. Magnus Röding from the Bioscience and Materials division at RISE Research Institutes of Sweden for providing us with the data used in Sect. 5.

Author information

Corresponding author

Correspondence to Nader Ebrahimi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendix

Proof of Lemma 1

It is clear that \(pr(Y>y)=pr(X_1>y)pr(X_2>y)\). Then,

$$\begin{aligned} \begin{aligned} pr(Y+U>y)&=\int pr(Y>y-u)f_{U}(u) \mathrm{d}u \\&=\int pr(X_1>y-u)pr(X_2>y-u)f_{U}(u) \mathrm{d}u. \end{aligned} \end{aligned}$$

Also,

$$\begin{aligned} \begin{aligned} pr(\min (X_1+U, X_2+U)>y)&= \int pr(X_1+u>y)\, pr(X_2+u>y) f_{U}(u) \mathrm{d}u \\&=\int pr(X_1>y-u)pr(X_2>y-u)f_{U}(u) \mathrm{d}u, \end{aligned} \end{aligned}$$

where the first equality follows by conditioning on \(U=u\) and the independence of \(X_1\) and \(X_2\).

Thus,

$$\begin{aligned} pr(Y+U>y)=pr(\min (X_1+U, X_2+U)>y). \end{aligned}$$

\(\square \)
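
The identity can also be checked by simulation. A minimal sketch, under illustrative assumptions of our own (exponential lifetimes and a common normal measurement error \(U\)):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.exponential(1.0, n)             # X1
x2 = rng.exponential(2.0, n)             # X2, independent of X1
u = rng.normal(0.0, 0.5, n)              # common error U

lhs = np.minimum(x1, x2) + u             # Y + U

# Independent replication for the right-hand side.
x1b, x2b = rng.exponential(1.0, n), rng.exponential(2.0, n)
ub = rng.normal(0.0, 0.5, n)
rhs = np.minimum(x1b + ub, x2b + ub)     # min(X1 + U, X2 + U)

# A two-sample KS test finds no difference between the two distributions.
print(ks_2samp(lhs, rhs).pvalue)         # typically well above 0.05
```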

Proof of Lemma 2

The moment generating function of \(Y_i\) is \(M_{Y_i}(t)=M_{X_i}(t)M_{U_i}(t)\) for \(i=1,2\). If \(Y_1\) is stochastically equivalent to \(Y_2\), then

$$\begin{aligned} M_{Y_1}(t)=M_{X_1}(t)M_{U_1}(t) =M_{X_2}(t)M_{U_2}(t) =M_{Y_2}(t). \end{aligned}$$

Since the measurement errors \(U_1\) and \(U_2\) are identically distributed, \(M_{U_1}(t)=M_{U_2}(t)>0\), and cancelling this common factor gives \(M_{X_1}(t)=M_{X_2}(t)\); thus \(X_1\) is stochastically equivalent to \(X_2\).

Conversely, if \(X_1\) is stochastically equivalent to \(X_2\), then \(M_{X_1}(t)=M_{X_2}(t)\), so \(M_{Y_1}(t)=M_{X_1}(t)M_{U_1}(t)=M_{X_2}(t)M_{U_2}(t)=M_{Y_2}(t)\), and thus \(Y_1\) is stochastically equivalent to \(Y_2\). \(\square \)
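
As a concrete illustration (normal measurement errors are our choice here, not a requirement of the lemma), suppose \(U_1, U_2 \sim N(0,\sigma ^2)\). Then

$$\begin{aligned} M_{U_1}(t)=M_{U_2}(t)=e^{\sigma ^2 t^2/2}, \end{aligned}$$

so \(M_{Y_1}(t)=M_{Y_2}(t)\) immediately yields

$$\begin{aligned} M_{X_1}(t)=M_{Y_1}(t)e^{-\sigma ^2 t^2/2}=M_{Y_2}(t)e^{-\sigma ^2 t^2/2}=M_{X_2}(t). \end{aligned}$$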

Proof of Lemma 3

Let A denote the event that \(H_0\) is not rejected, i.e. accepted. Let T denote the event that \(H_0\) is true, and let F denote its complement, the event that \(H_0\) is false. The power of the test is \(1-\beta \), where \(\beta =pr(A|F)\) is the type II error probability.

We first look at the bias.

$$\begin{aligned} \begin{aligned} E({\hat{\theta }}| A)&=E({\hat{\theta }}|A\cap T)pr(T|A) + E({\hat{\theta }}|A\cap F)pr(F|A) \\&=E({\hat{\theta }}|A\cap T) \Big [1-pr(F|A) \Big ] + E({\hat{\theta }}|A\cap F)pr(F|A) \\&=E({\hat{\theta }}|A\cap T) \Bigg [1-\dfrac{pr(A|F)pr(F)}{pr(A)}\Bigg ] + E({\hat{\theta }}|A\cap F)\dfrac{pr(A|F)pr(F)}{pr(A)} \\&=E({\hat{\theta }}|A\cap T) \Bigg [1-\dfrac{\beta pr(F)}{pr(A)}\Bigg ] + E({\hat{\theta }}|A\cap F)\dfrac{\beta pr(F)}{pr(A)} \rightarrow \theta \quad \text { as } n\rightarrow \infty . \end{aligned} \end{aligned}$$

Since \({\hat{\theta }}\) is an asymptotically unbiased estimator of \(\theta \) when \(H_0\) is true, \(E({\hat{\theta }}|A\cap T)\rightarrow \theta \) as \(n\rightarrow \infty \). The power of the test goes to one, so \(\beta \rightarrow 0\) as \(n\rightarrow \infty \). Therefore the bias \(\theta -E({\hat{\theta }}| A) \rightarrow 0\) as \(n\rightarrow \infty \). \(\square \)
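
The mechanics of this argument can be illustrated numerically. In the sketch below (toy assumptions of our own: exponential data, \({\hat{\theta }}\) the sample mean, and a Kolmogorov-Smirnov test of \(H_0\)), replications are generated from \(H_0\) (event T) or from an alternative (event F), and \({\hat{\theta }}\) is averaged over the accepted replications only.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
theta = 2.0          # scale asserted by H0
theta_false = 3.0    # an alternative under which H0 is false

for n in (50, 500, 5000):
    means = []
    for _ in range(2000):
        # Half the replications come from H0 (event T), half from F.
        scale = theta if rng.random() < 0.5 else theta_false
        x = rng.exponential(scale, n)
        # Event A: the KS test of H0 (exponential, scale theta) does not reject.
        if kstest(x, "expon", args=(0, theta)).pvalue > 0.05:
            means.append(x.mean())
    print(n, np.mean(means))  # E(theta_hat | A) approaches theta = 2.0
```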

Proof of Theorem 1

As in Lemma 3, let A denote the event that \(H_0\) is not rejected, T the event that \(H_0\) is true, and F its complement. The power of the test is \(1-\beta \), where \(\beta =pr(A|F)\).

Applying the law of total variance, conditioning further on whether \(H_0\) is true, we have:

$$\begin{aligned} \mathrm{var}({\hat{\theta }}|A)=E\big (\mathrm{var}({\hat{\theta }} \,|\, A, \mathbb {1}_{T}) \,\big |\, A\big ) + \mathrm{var}\big ( E ({\hat{\theta }}\,|\, A, \mathbb {1}_{T}) \,\big |\, A\big ). \end{aligned}$$
(13)

By Lemma 3 the second term in Eq. (13) goes to zero as \(n\rightarrow \infty \). Now consider the first term:

$$\begin{aligned} \begin{aligned} E\big (\mathrm{var}({\hat{\theta }} \,|\, A, \mathbb {1}_{T}) \,\big |\, A\big )&=\mathrm{var}({\hat{\theta }}|A\cap T)pr(T|A) + \mathrm{var}({\hat{\theta }}|A\cap F)pr(F|A) \\&=\mathrm{var}({\hat{\theta }}|A\cap T) \Big [1-pr(F|A) \Big ] + \mathrm{var}({\hat{\theta }}|A\cap F)pr(F|A) \\&=\mathrm{var}({\hat{\theta }}|A\cap T) \Bigg [1-\dfrac{pr(A|F)pr(F)}{pr(A)}\Bigg ] \\&\quad + \mathrm{var}({\hat{\theta }}|A\cap F)\dfrac{pr(A|F)pr(F)}{pr(A)} \\&=\mathrm{var}({\hat{\theta }}|A\cap T) \Bigg [1-\dfrac{\beta pr(F)}{pr(A)}\Bigg ] + \mathrm{var}({\hat{\theta }}|A\cap F)\dfrac{\beta pr(F)}{pr(A)}. \end{aligned} \end{aligned}$$

We know that \(\mathrm{var}({\hat{\theta }}|A\cap T) \rightarrow 0\) as \(n\rightarrow \infty \), because \({\hat{\theta }}\) is a consistent estimator when \(H_0\) is true. The power of the test goes to one, so \(\beta \rightarrow 0\) as \(n\rightarrow \infty \). Thus, \(E\big (\mathrm{var}({\hat{\theta }} \,|\, A, \mathbb {1}_{T}) \,\big |\, A\big )\rightarrow 0 \text { as } n\rightarrow \infty \). Therefore, conditional on \(H_0\) not being rejected, \({\hat{\theta }}\) is a consistent estimator. \(\square \)
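
The same toy setup used after Lemma 3 (our assumptions, not the paper's) also illustrates the variance claim: the conditional variance of the accepted estimates shrinks as \(n\) grows.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(3)
theta, theta_false = 2.0, 3.0   # H0 scale and an alternative

for n in (50, 500, 5000):
    acc = []
    for _ in range(2000):
        scale = theta if rng.random() < 0.5 else theta_false
        x = rng.exponential(scale, n)
        if kstest(x, "expon", args=(0, theta)).pvalue > 0.05:
            acc.append(x.mean())
    print(n, np.var(acc))  # var(theta_hat | A) shrinks toward 0
```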

Proof of Theorem 2

Let \(\{ \theta ^*_i\}_{i=1}^N\) be the resulting sequence of the algorithm. Each \( \theta ^*_i\) is an independent draw from \(f(\theta |A)\), where A represents the event that \(H_0\) is not rejected. Then

$$\begin{aligned} f(\theta ^*_i) \propto \displaystyle \sum _ {x^* \in {\mathcal {F}}} f(x^*|\theta ^*_i) \pi (\theta ^*_i) \mathbb {1}_{A} \propto \displaystyle \sum _ {x^*:~ A} f(x^*|\theta ^*_i) \pi (\theta ^*_i) \propto \pi _{A}(\theta ^*_i|x), \end{aligned}$$

where \({\mathcal {F}}\) is a \(\sigma \)-algebra on some given set \(\varOmega \), \(x^*\) is simulated data, \(f(x^*| \cdot )\) is the model for simulated data, and \(\pi (\cdot )\) is the prior distribution. The resulting approximate posterior distribution \(\pi _{A}(\theta ^*_i|x)\) depends on the test used to decide whether or not to reject the null hypothesis.

As the sample size of the observed data set approaches infinity, the sample size of the simulated data set also approaches infinity. Then the power of the test approaches one,

$$\begin{aligned} 1-\beta = pr(H_0~\mathrm{is}~\mathrm{rejected}~ |~H_0~\mathrm{is}~\mathrm{false}) \rightarrow 1. \end{aligned}$$

This means that when the null hypothesis is in fact false, it will be rejected, and the corresponding candidate parameter value will not be taken into the resulting sequence of parameter values. Wrong parameter values are thus excluded from the sequence entirely. Hence, the resulting posterior distribution converges to the true posterior,

$$\begin{aligned} \pi _{A}(\theta ^*_i|x) \rightarrow \pi (\theta ^*_i|x), \end{aligned}$$

as the sample sizes of the observed and simulated data sets approach infinity. \(\square \)
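
The driving fact, that the power of a reasonable two-sample test tends to one as the sample sizes grow, is easy to verify empirically. A sketch using a Kolmogorov-Smirnov test and two different exponential distributions (our illustrative choices):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)

# H0 is false: the two samples come from different distributions.
for n in (20, 100, 500, 2000):
    rejections = sum(
        ks_2samp(rng.exponential(1.0, n), rng.exponential(1.5, n)).pvalue < 0.05
        for _ in range(1000)
    )
    print(n, rejections / 1000)  # empirical power, increasing toward 1
```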


About this article


Cite this article

McCullough, K., Dmitrieva, T. & Ebrahimi, N. New approximate Bayesian computation algorithm for censored data. Comput Stat 37, 1369–1397 (2022). https://doi.org/10.1007/s00180-021-01167-3

