Abstract
In this paper, we present a novel approach to the estimation of a density function at a specific chosen point. With this approach, we can estimate a normalizing constant, or equivalently compute a marginal likelihood, by focusing on estimating a posterior density function at a point. Relying on the Fourier integral theorem, the proposed method is capable of producing quick and accurate estimates of the marginal likelihood, regardless of how samples are obtained from the posterior; that is, it uses the posterior output generated by a Markov chain Monte Carlo sampler to estimate the marginal likelihood directly, with no modification to the form of the estimator on the basis of the type of sampler used. Thus, even for models with complicated specifications, such as those involving challenging hierarchical structures, or for Markov chains obtained from a black-box MCMC algorithm, the method provides a straightforward means of quickly and accurately estimating the marginal likelihood. In addition to developing theory to support the favorable behavior of the estimator, we also present a number of illustrative examples.
References
Abramowitz, M., Stegun, I.A.: Sine and cosine integrals. In: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, pp. 231–233. Dover (1972)
Abrams, D.I., Goldman, A.I., Launer, C., Korvick, J.A., Neaton, J.D., Crane, L.R., Grodesky, M., Wakefield, S., Muth, K., Kornegay, S., et al.: A comparative trial of didanosine or zalcitabine after treatment with zidovudine in patients with human immunodeficiency virus infection. N. Engl. J. Med. 330(10), 657–662 (1994)
Botev, Z., L’Ecuyer, P., Tuffin, B.: Markov chain importance sampling with applications to rare event probability estimation. Stat. Comput. 23, 271–285 (2013)
Carlin, B.P., Chib, S.: Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B (Methodol.) 57(3), 473–484 (1995)
Chan, J., Eisenstat, E.: Marginal likelihood estimation with the cross-entropy method. Econ. Rev. 34, 256–285 (2015)
Chen, M.H.: Computing marginal likelihoods from a single MCMC output. Stat. Neerl. 59(1), 16–29 (2005)
Chen, M.H., Shao, Q.M.: On Monte Carlo methods for estimating ratios of normalizing constants. Ann. Stat. 25(4), 1563–1594 (1997)
Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90(432), 1313–1321 (1995)
Chib, S., Carlin, B.P.: On MCMC sampling in hierarchical longitudinal models. Stat. Comput. 9(1), 17–26 (1999)
Chib, S., Greenberg, E.: Analysis of multivariate probit models. Biometrika 85(2), 347–361 (1998)
Chib, S., Jeliazkov, I.: Marginal likelihood from the Metropolis-Hastings output. J. Am. Stat. Assoc. 96(453), 270–281 (2001)
Clyde, M., George, E.I.: Model uncertainty. Stat. Sci. 19(1), 81–94 (2004)
Dellaportas, P., Forster, J.J., Ntzoufras, I.: On Bayesian model and variable selection using MCMC. Stat. Comput. 12(1), 27–36 (2002)
Folland, G.B.: Fourier Analysis and Its Applications, vol. 4. American Mathematical Society, Providence (2009)
Friel, N., Pettitt, A.N.: Marginal likelihood estimation via power posteriors. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(3), 589–607 (2008)
Friel, N., Wyse, J.: Estimating the evidence: a review. Stat. Neerl. 66, 288–308 (2012)
Frühwirth-Schnatter, S.: Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques. Economet. J. 7(1), 143–167 (2004)
Gelman, A., Meng, X.L.: Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat. Sci. 13(2), 163–185 (1998)
Green, P.J.: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4), 711–732 (1995)
Greene, W.H.: Econometric Analysis. Prentice Hall, Hoboken (1997)
Gronau, Q., Singmann, H., Wagenmakers, E.J., et al.: bridgesampling: an R package for estimating normalizing constants. J. Stat. Softw. 92(10) (2020)
Han, C., Carlin, B.P.: Markov chain Monte Carlo methods for computing Bayes factors: a comparative review. J. Am. Stat. Assoc. 96(455), 1122–1132 (2001)
Ho, N., Walker, S.G.: Multivariate smoothing via the Fourier integral theorem and Fourier kernel. arXiv preprint arXiv:2012.14482 (2020)
Knuth, K.H., Habeck, M., Malakar, N.K., Mubeen, A.M., Placek, B.: Bayesian evidence and model selection. Digit. Signal Process. 47, 50–67 (2015)
Lenk, P.: Simulation pseudo-bias correction to the harmonic mean estimator of integrated likelihoods. J. Comput. Graph. Stat. 18(4), 941–960 (2009)
Lewis, S.M., Raftery, A.E.: Estimating Bayes factors via posterior simulation with the Laplace–Metropolis estimator. J. Am. Stat. Assoc. 92, 648–655 (1997)
Llorente, F., Martino, L., Lopez-Santiago, J.: Marginal likelihood computation for model selection and hypothesis testing: an extensive review (2021)
Meng, X.L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat. Sin. 6, 831–860 (1996)
Mira, A., Nicholls, G.: Bridge estimation of the probability density at a point. Stat. Sin. 14, 603–612 (2004)
Neal, R.M.: Annealed importance sampling. Stat. Comput. 11, 125–139 (2001)
Newton, M.A., Raftery, A.E.: Approximate Bayesian inference with the weighted likelihood bootstrap. J. R. Stat. Soc. B 56, 3–48 (1994)
Pajor, A.: Estimating the marginal likelihood using the arithmetic mean identity. Bayesian Anal. 12(1), 261–287 (2017)
Perrakis, K., Ntzoufras, I., Tsionas, E.G.: On the use of marginal posteriors in marginal likelihood estimation via importance sampling. Comput. Stat. Data Anal. 77, 54–69 (2014)
Priestley, H.A.: Introduction to Complex Analysis. Oxford University Press, Oxford (1985)
Raftery, A.: Hypothesis testing and model selection. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice, Chapter 10. Chapman & Hall/CRC, Boca Raton (1996)
Raftery, A.E., Newton, M.A., Satagopan, J.M., Krivitsky, P.N.: Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. In: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (eds.) Bayesian Statistics 8, pp. 1–45. Oxford University Press, Oxford (2007)
Ritter, C., Tanner, M.A.: Facilitating the Gibbs sampler: the Gibbs stopper and the Griddy-Gibbs sampler. J. Am. Stat. Assoc. 87(419), 861–868 (1992)
Robert, C.P., Wraith, D.: Computational methods for Bayesian model choice. In: AIP Conference Proceedings, vol. 1193, pp. 251–262. American Institute of Physics (2009)
Silverman, B.W.: Algorithm AS 176: Kernel density estimation using the fast Fourier transform. J. R. Stat. Soc. Ser. C (Appl. Stat.) 31(1), 93–99 (1982)
Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833–860 (2006)
Wand, M.P., Jones, M.C.: Kernel Smoothing. CRC Press, Boca Raton (1994)
Weinberg, M.D.: Computing the Bayes factor from a Markov chain Monte Carlo simulation of the posterior distribution. Bayesian Anal. 7, 737–770 (2012)
Williams, E.J.: Regression Analysis, vol. 14. Wiley, Hoboken (1959)
Acknowledgements
The authors are grateful for the comments and suggestions of three reviewers which have allowed us to significantly improve the paper.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Appendices
Appendix
1.1 Proof of Theorem 1:
We first show that
\[
I(R):=\int _{-\infty }^{\infty }\cos (Rx)\,\phi (x)\,\mathrm {d}x=e^{-R^2/2} \qquad (4)
\]
for all \(R\ge 0\). Now,
\[
I'(R)=-\int _{-\infty }^{\infty }x\,\sin (Rx)\,\phi (x)\,\mathrm {d}x,
\]
and using integration by parts, with \(x\,\phi (x)=-\phi '(x)\), we have \(I'(R)=-R\,I(R)\) and hence (4) holds since \(I(0)=1\).
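As a quick sanity check, the identity \(I(R)=e^{-R^2/2}\), with \(\phi \) the standard normal density (as the relation \(x\,\phi (x)=-\phi '(x)\) indicates), can be verified numerically; the grid width and resolution below are arbitrary illustrative choices:

```python
import numpy as np

def I(R, half_width=12.0, num_points=200_001):
    """Trapezoidal approximation of I(R) = integral of cos(Rx) phi(x) dx,
    with phi the standard normal density."""
    x = np.linspace(-half_width, half_width, num_points)
    y = np.cos(R * x) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    dx = x[1] - x[0]
    return float(np.sum(dx * (y[:-1] + y[1:]) / 2))  # trapezoid rule

# agreement with exp(-R^2/2) at a few values of R
for R in (0.0, 1.0, 2.5):
    assert abs(I(R) - np.exp(-R**2 / 2)) < 1e-6
```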
Now consider
and recall
and so,
since \(\sin (Rx)\) is an odd function. Further, it is straightforward to show that
using suitable transforms.
If
\[
J(R)=\int _0^R I(u)\,\mathrm {d}u,
\]
then \(J'(R)\) is given by (4), so
\[
J(R)=\int _0^R e^{-u^2/2}\,\mathrm {d}u,
\]
since \(J(0)=0\). Hence,
\[
\lim _{R\rightarrow \infty }J(R)=\int _0^{\infty }e^{-u^2/2}\,\mathrm {d}u=\sqrt{\pi /2}.
\]
We want to look at
and from (4), we have that
Therefore,
This completes the proof.
1.2 General theory:
If \(f(x)\) is integrable, piecewise smooth, and piecewise continuous on \({\mathbb {R}}\), defined at its points of discontinuity so as to satisfy \(f(x)=\frac{1}{2}\left[ f(x-)+f(x+)\right] \) for all \(x\), then, as a consequence of the Fourier inversion theorem (see Folland (2009) for details), we have that
\[
f(x)=\lim _{R\rightarrow \infty }\frac{1}{\pi }\int _{-\infty }^{\infty }\frac{\sin \left( R(x-y)\right) }{x-y}\,f(y)\,\mathrm {d}y.
\]
Now, assume that observations \(X_1,\dots ,X_n\) are i.i.d. from \(f\), a smooth density function on \({\mathbb {R}}\). Then, consider the (Monte Carlo) estimator
\[
{\widehat{f}}_R(x)=\frac{1}{n}\sum _{i=1}^{n}\frac{\sin \left( R(x-X_i)\right) }{\pi (x-X_i)}.
\]
For the mean, we see that, for \(R\rightarrow \infty \),
\[
\mathrm {E}\left[ {\widehat{f}}_R(x)\right] =\frac{1}{\pi }\int _{-\infty }^{\infty }\frac{\sin \left( R(x-y)\right) }{x-y}\,f(y)\,\mathrm {d}y\rightarrow f(x).
\]
For the variance, we see that, for \(R\rightarrow \infty \),
\[
\mathrm {Var}\left[ {\widehat{f}}_R(x)\right] =\frac{1}{n}\,\mathrm {Var}\left[ \frac{\sin \left( R(x-X_1)\right) }{\pi (x-X_1)}\right] =O\left( R/n\right) .
\]
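A minimal numerical sketch of this univariate sinc-kernel estimator; the sample size, seed, and value of \(R\) are arbitrary illustrative choices, and NumPy's `np.sinc` (which computes \(\sin (\pi t)/(\pi t)\)) handles the removable singularity at \(x=X_i\):

```python
import numpy as np

def fourier_density_estimate(x, samples, R):
    """(1/n) * sum_i sin(R (x - X_i)) / (pi (x - X_i)), written via
    np.sinc(t) = sin(pi t)/(pi t) so that x = X_i gives the limit R/pi."""
    z = x - samples
    return float(np.mean(R / np.pi * np.sinc(R * z / np.pi)))

rng = np.random.default_rng(0)
samples = rng.standard_normal(200_000)          # i.i.d. draws from N(0, 1)
est = fourier_density_estimate(0.0, samples, R=5.0)
# true standard normal density at 0 is 1/sqrt(2 pi) ~ 0.3989
assert abs(est - 1 / np.sqrt(2 * np.pi)) < 0.02
```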
Extending to higher dimensions, we find, as a consequence of the Fourier inversion theorem, that
\[
f(x)=\lim _{R_1,\dots ,R_d\rightarrow \infty }\int _{{\mathbb {R}}^d}\prod _{j=1}^{d}\frac{\sin \left( R_j(x_j-y_j)\right) }{\pi (x_j-y_j)}\,f(y)\,\mathrm {d}y.
\]
In particular, assuming that \(R_j=R\) for \(j=1,\dots ,d\), we see that the estimator takes the form
\[
{\widehat{f}}_R(x)=\frac{1}{n}\sum _{i=1}^{n}\prod _{j=1}^{d}\frac{\sin \left( R(x_j-X_{ij})\right) }{\pi (x_j-X_{ij})}.
\]
Then, as an extension of the result for univariate variance, assuming mutual independence of all components of \(X\), we have that
\[
\mathrm {Var}\left[ {\widehat{f}}_R(x)\right] =O\left( R^d/n\right) .
\]
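The multivariate estimator simply multiplies the one-dimensional kernels across coordinates; a sketch, with the dimension, seed, sample size, and \(R\) again chosen arbitrarily for illustration:

```python
import numpy as np

def fourier_density_estimate_md(x, samples, R):
    """(1/n) * sum_i prod_j sin(R (x_j - X_ij)) / (pi (x_j - X_ij))."""
    z = x - samples                               # shape (n, d)
    kern = R / np.pi * np.sinc(R * z / np.pi)     # one sinc kernel per coordinate
    return float(np.mean(np.prod(kern, axis=1)))  # product over coordinates, mean over samples

rng = np.random.default_rng(0)
samples = rng.standard_normal((400_000, 2))       # i.i.d. draws from N(0, I_2)
est = fourier_density_estimate_md(np.zeros(2), samples, R=5.0)
# true density at the origin is 1/(2 pi) ~ 0.1592
assert abs(est - 1 / (2 * np.pi)) < 0.02
```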
1.3 Details for Chib and Jeliazkov (2001):
In moving from Gibbs output to Metropolis–Hastings output, we can consider the method of bridge sampling; see, for example, Gronau et al. (2020). With this method, the estimate of the marginal likelihood, which can also be seen as the normalizing constant for a posterior density function, is given by
\[
{\widehat{m}}(x)=\frac{n_1^{-1}\sum _{j=1}^{n_1}p\left( x\mid {\widetilde{\theta }}_j\right) \pi \left( {\widetilde{\theta }}_j\right) h\left( {\widetilde{\theta }}_j\right) }{n_2^{-1}\sum _{j=1}^{n_2}g\left( \theta _j^{*}\right) h\left( \theta _j^{*}\right) },
\]
where the \(({\widetilde{\theta }}_j)_{j=1:n_1}\) are i.i.d. from the importance density g, and the \((\theta _j^*)_{j=1:n_2}\) are i.i.d. from the posterior density. The choices to be made include the bridge function h and the importance density g. As previously mentioned, Mira and Nicholls (2004) have identified the estimator of Chib and Jeliazkov (2001) as a bridge sampling estimator.
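As a concrete toy illustration of this ratio-of-averages form (not the paper's estimator), the sketch below takes the unnormalized "posterior" \(q(\theta )=e^{-\theta ^2/2}\), whose normalizing constant is known to be \(\sqrt{2\pi }\), a \(N(0,1.5^2)\) importance density \(g\), and the geometric bridge \(h=(q\,g)^{-1/2}\); all of these specific choices are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def q(t):                       # unnormalized posterior; constant = sqrt(2 pi)
    return np.exp(-t**2 / 2)

def g(t):                       # importance density N(0, 1.5^2)
    s = 1.5
    return np.exp(-t**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

n1 = n2 = 100_000
theta_g = 1.5 * rng.standard_normal(n1)    # i.i.d. draws from g
theta_post = rng.standard_normal(n2)       # i.i.d. draws from the posterior N(0, 1)

# Bridge estimator with h = (q g)^(-1/2):
#   m_hat = mean[ q(theta~) h(theta~) ] / mean[ g(theta*) h(theta*) ]
num = np.mean(np.sqrt(q(theta_g) / g(theta_g)))
den = np.mean(np.sqrt(g(theta_post) / q(theta_post)))
m_hat = num / den

assert abs(m_hat - np.sqrt(2 * np.pi)) < 0.05   # true constant ~ 2.5066
```

With the geometric bridge, the numerator terms reduce to \(\sqrt{q/g}\) and the denominator terms to \(\sqrt{g/q}\), keeping both averages well behaved as long as \(g\) has heavier tails than the posterior.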
In the context of MCMC chains produced by the Metropolis–Hastings algorithm, Chib and Jeliazkov (2001) introduce a more complicated method for marginal likelihood estimation. The Metropolis–Hastings algorithm is more flexible than the Gibbs algorithm insofar as not all the normalizing constants of the full conditional densities need to be known to run the Metropolis–Hastings algorithm. For the Metropolis–Hastings algorithm, the estimation of the posterior ordinate \(\pi (\theta ^*|x)\) given the posterior sample \(\{\theta ^{(1)},\dots ,\theta ^{(M)}\}\) requires the specification of a proposal density \(q(\theta ,\theta '|x)\) for the transition from \(\theta \) to \(\theta '\).
With this approach, Chib and Jeliazkov (2001) provide the following estimate of the posterior ordinate, from which the marginal likelihood follows via the basic marginal likelihood identity:
\[
{\widehat{\pi }}\left( \theta ^*\mid x\right) =\prod _{r=1}^{B}{\widehat{\pi }}\left( \theta _r^*\mid x,\theta _1^*,\dots ,\theta _{r-1}^*\right) ,
\]
where
\[
\alpha \left( \theta _r,\theta _r'\mid \psi _{r-1},\psi ^{r+1}\right) =\min \left\{ 1,\frac{\pi \left( \theta _r'\mid x,\psi _{r-1},\psi ^{r+1}\right) \,q\left( \theta _r',\theta _r\mid x,\psi _{r-1},\psi ^{r+1}\right) }{\pi \left( \theta _r\mid x,\psi _{r-1},\psi ^{r+1}\right) \,q\left( \theta _r,\theta _r'\mid x,\psi _{r-1},\psi ^{r+1}\right) }\right\}
\]
is the Metropolis–Hastings acceptance probability, with \(\psi _{r-1}\) denoting the parameters (or blocks of parameters) up to block \(r-1\) and \(\psi ^{r+1}\) denoting those beyond block \(r\). In particular, the quantity \(\pi (\theta _r^*\mid x,\theta _1^*,\dots ,\theta _{r-1}^*)\) satisfies
\[
\pi \left( \theta _r^*\mid x,\theta _1^*,\dots ,\theta _{r-1}^*\right) =\frac{\mathrm {E}_1\left[ \alpha \left( \theta _r,\theta _r^*\mid \psi _{r-1}^*,\psi ^{r+1}\right) \,q\left( \theta _r,\theta _r^*\mid x,\psi _{r-1}^*,\psi ^{r+1}\right) \right] }{\mathrm {E}_2\left[ \alpha \left( \theta _r^*,\theta _r\mid \psi _{r-1}^*,\psi ^{r+1}\right) \right] },
\]
which involves \(\hbox {E}_1\), an expectation with respect to the conditional posterior \(\pi (\theta _r,\psi ^{r+1}\mid x,\psi ^*_{r-1})\), and \(\hbox {E}_2\), an expectation with respect to the conditional product measure
\[
\pi \left( \psi ^{r+1}\mid x,\psi _r^*\right) \,q\left( \theta _r^*,\theta _r\mid x,\psi _{r-1}^*,\psi ^{r+1}\right) .
\]
Here, we have used exactly the notation in Chib and Jeliazkov (2001).
In the case of Metropolis–Hastings output, as with Gibbs output, multiple runs of the sampling algorithm are generally needed to compute the numerator, except when there is only a single block. Moreover, for Metropolis–Hastings output, the denominator of the marginal likelihood estimate involves a second expectation, which requires separate treatment that can prove time-consuming for certain likelihood evaluations, as revealed in Sect. 4.7.
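In the single-block case the Chib–Jeliazkov posterior-ordinate estimator reduces to a ratio of two Monte Carlo averages, which can be sketched on a toy problem; the standard normal "posterior", the random-walk proposal, and all tuning values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(t):                 # unnormalized log posterior, here N(0, 1)
    return -t**2 / 2

def q_dens(a, b):                # proposal density q(a, b) = N(b | a, 1)
    return np.exp(-(b - a)**2 / 2) / np.sqrt(2 * np.pi)

def alpha(a, b):                 # M-H acceptance probability (symmetric proposal)
    return np.minimum(1.0, np.exp(log_post(b) - log_post(a)))

# Random-walk Metropolis chain targeting the posterior
M = 50_000
chain = np.empty(M)
t = 0.0
for m in range(M):
    prop = t + rng.standard_normal()
    if rng.random() < alpha(t, prop):
        t = prop
    chain[m] = t

theta_star = 0.0
# Numerator: average of alpha(theta^(m), theta*) q(theta^(m), theta*) over the chain
num = np.mean(alpha(chain, theta_star) * q_dens(chain, theta_star))
# Denominator: average of alpha(theta*, theta^(j)) over draws theta^(j) ~ q(theta*, .)
prop_draws = theta_star + rng.standard_normal(M)
den = np.mean(alpha(theta_star, prop_draws))

post_ordinate = num / den        # estimate of pi(theta* | x)
# true posterior ordinate at 0 is 1/sqrt(2 pi) ~ 0.3989
assert abs(post_ordinate - 1 / np.sqrt(2 * np.pi)) < 0.02
```

The marginal likelihood would then follow by dividing the likelihood-times-prior ordinate at \(\theta ^*\) by this posterior-ordinate estimate.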
The algorithm must be modified accordingly to ensure that the extra information needed to compute each conditional ordinate is properly stored. Although any subsequent runs simulate from a smaller set of distributions, the added sampling not only takes time but also increases the chance of user error during implementation, especially as multiple different expectations are now being taken across the parameters (or blocks) in the numerator and the denominator.
Supplementary Material
We describe the contents of the Supplementary Material document. Section 1 highlights the ability of the Fourier integral theorem, without any tuning or estimation of a covariance matrix, to estimate the value of an irregularly shaped density function at a point. Section 2 compares the Fourier approach with the Warp-III algorithm. Section 3 shows how the Fourier integral theorem can also be used to estimate the likelihood function when the likelihood is deemed intractable. Section 4 presents a wide-ranging comparison of methods for a non-nested linear model, and Section 5 does the same for a logistic regression model. Section 6 considers a mixture model, Section 7 a bimodal model, and Section 8 a dynamic linear model. Section 9 examines the possibility, in some cases, of reducing the dimension of the problem by integrating out a number of parameters. Finally, Section 10 offers some remarks on the tables presented in the main paper.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rotiroti, F., Walker, S.G. Computing marginal likelihoods via the Fourier integral theorem and pointwise estimation of posterior densities. Stat Comput 32, 67 (2022). https://doi.org/10.1007/s11222-022-10131-0