Abstract
Discrete Markov random fields form a natural class of models to represent images and spatial datasets. The use of such models is, however, hampered by a computationally intractable normalising constant. This makes parameter estimation and a fully Bayesian treatment of discrete Markov random fields difficult. We apply approximation theory for pseudo-Boolean functions to binary Markov random fields and construct approximations and upper and lower bounds for the associated computationally intractable normalising constant. As a by-product of this process we also get a partially ordered Markov model approximation of the binary Markov random field. We present numerical examples with both the pairwise interaction Ising model and with higher-order interaction models, showing the quality of our approximations and bounds. We also present simulation examples and one real data example demonstrating how the approximations and bounds can be applied for parameter estimation and to handle a fully Bayesian model computationally.
References
Austad, H.M.: Approximations of binary Markov random fields. PhD thesis, Norwegian University of Science and Technology. Thesis number 292:2011. Available from http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-14922 (2011)
Besag, J.: Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B 36, 192–225 (1974)
Besag, J.: On the statistical analysis of dirty pictures (with discussion). J. R. Stat. Soc. Ser. B 48, 259–302 (1986)
Clifford, P.: Markov random fields in statistics. In: Grimmett, G.R., Welsh, D.J.A. (eds.) Disorder in Physical Systems, pp. 19–31. Oxford University Press (1990)
Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems, Exact Computational Methods for Bayesian Networks. Springer, London (2007)
Cressie, N.A.C.: Statistics for Spatial Data, 2nd edn. Wiley, New York (1993)
Cressie, N., Davidson, J.: Image analysis with partially ordered Markov models. Comput. Stat. Data Anal. 29, 1–26 (1998)
Ding, G., Lax, R., Chen, J., Chen, P.P.: Formulas for approximating pseudo-Boolean random variables. Discret. Appl. Math. 156, 1581–1597 (2008)
Ding, G., Lax, R., Chen, J., Chen, P.P., Marx, B.D.: Transforms of pseudo-Boolean random variables. Discret. Appl. Math. 158, 13–24 (2010)
Friel, N., Rue, H.: Recursive computing and simulation-free inference for general factorizable models. Biometrika 94, 661–672 (2007)
Friel, N., Pettitt, A.N., Reeves, R., Wit, E.: Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J. Comput. Graph. Stat. 18, 243–261 (2009)
Gelman, A., Meng, X.-L.: Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat. Sci. 13, 163–185 (1998)
Geyer, C.J., Thompson, E.A.: Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Am. Stat. Assoc. 90, 909–920 (1995)
Grabisch, M., Marichal, J.L., Roubens, M.: Equivalent representations of set functions. Math. Oper. Res. 25, 157–178 (2000)
Green, P.J.: Reversible jump MCMC computation and Bayesian model determination. Biometrika 82, 711–732 (1995)
Grelaud, A., Robert, C., Marin, J.M., Rodolphe, F., Taly, J.F.: ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Anal. 4, 317–336 (2009)
Gu, M.G., Zhu, H.T.: Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. J. R. Stat. Soc. Ser. B 63, 339–355 (2001)
Hammer, P.L., Holzman, R.: Approximations of pseudo-Boolean functions; applications to game theory. Methods Models Oper. Res. 36, 3–21 (1992)
Hammer, P.L., Rudeanu, S.: Boolean Methods in Operations Research and Related Areas. Springer, Berlin (1968)
Jerrum, M., Sinclair, A.: Polynomial-time approximation algorithms for the Ising model. SIAM J. Comput. 22, 1087–1116 (1993)
Künsch, H.R.: State space and hidden Markov models. In: Barndorff-Nielsen, O.E., Cox, D.R., Klüppelberg, C. (eds.) Complex Stochastic Systems. Chapman & Hall/CRC (2001)
Liang, F.: A double Metropolis-Hastings sampler for spatial models with intractable normalizing constants. J. Stat. Comput. Simul. 80, 1007–1022 (2010)
Liang, F., Liu, C., Carroll, R.: Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley, New York (2011)
Lyne, A.M., Girolami, M., Atchadé, Y., Strathmann, H., Simpson, D.: On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods. Stat. Sci. 30, 443–467 (2015)
Marin, J.M., Mengersen, K., Robert, C.P.: Bayesian modelling and inference on mixtures of distributions. In: Dey, D.K., Rao, C.R. (eds.) Essential Bayesian Models, pp. 253–300. North-Holland, Amsterdam (2011)
Møller, J., Pettitt, A., Reeves, R., Berthelsen, K.: An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93, 451–458 (2006)
Murray, I., Ghahramani, Z., MacKay, D.: MCMC for doubly-intractable distributions. In: Proceedings of the Twenty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), AUAI Press, Arlington, Virginia, pp. 359–366 (2006)
Propp, J.G., Wilson, D.B.: Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algorithms 9, 223–252 (1996)
Reeves, R., Pettitt, A.N.: Efficient recursions for general factorisable models. Biometrika 91, 751–757 (2004)
Riggan, W.B., Creason, J.P., Nelson, W.C., Manton, K.G., Woodbury, M.A., Stallard, E., Pellom, A.C., Beaubier, J.: U.S. Cancer Mortality Rates and Trends, 1950–1979, vol. IV (U.S. Government Printing Office, Washington, DC: Maps, U.S. Environmental Protection Agency) (1987)
Sherman, M., Apanasovich, T.V., Carroll, R.J.: On estimation in binary autologistic spatial models. J. Stat. Comput. Simul. 76, 167–179 (2006)
Tjelmeland, H., Austad, H.: Exact and approximate recursive calculations for binary Markov random fields defined on graphs. J. Comput. Graph. Stat. 21, 758–780 (2012)
Viterbi, A.J.: Error bounds for convolutional codes and an asymptotic optimum decoding algorithm. IEEE Trans. Inf. Theory 13, 260–269 (1967)
Walker, S.: Posterior sampling when the normalising constant is unknown. Commun. Stat. Simul. Comput. 40, 784–792 (2011)
Appendices
Appendix: Proof of Theorem 3
Expanding \(\hbox {SSE}(f,\tilde{\tilde{f}}) = \sum _{x \in {\varOmega }} \left\{ f(x)-\tilde{\tilde{f}}(x) \right\} ^2\) we get,
To prove the theorem it is thereby sufficient to show that,
First recall that from (9) we know that,
Also, since \(\tilde{\tilde{S}} \subseteq \tilde{S}\),
We study the first term, \(\sum _{x \in {\varOmega }}\{f(x)-\tilde{f}(x)\}\tilde{f}(x)\), expand the expression for \(\tilde{f}(x)\) outside the parenthesis and change the order of summation,
where the last transition follows from (42). Using (43) we can correspondingly show that \(\sum _{x \in {\varOmega }} \{f(x)-\tilde{f}(x)\}\tilde{\tilde{f}}(x) = 0\).
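The displayed equations of this proof are missing from the text as given here. A reconstruction consistent with the surrounding sentences is sketched below. It rests on two assumptions that are not confirmed by the text: that (9) is the normal-equation condition characterising the least squares approximation \(\tilde{f}\) on \(\tilde{S}\), and that Theorem 3 states the decomposition \(\hbox {SSE}(f,\tilde{\tilde{f}}) = \hbox {SSE}(f,\tilde{f}) + \hbox {SSE}(\tilde{f},\tilde{\tilde{f}})\). The labels (42) and (43) are placed where the text refers to them; the symbol \(\tilde{\beta }^{\varLambda }\) for the coefficients of \(\tilde{f}\) follows the notation used in the proof of Theorem 5.

```latex
% Expansion of the error sum of squares
\hbox{SSE}(f,\tilde{\tilde{f}})
  = \sum_{x \in \varOmega} \left\{ f(x)-\tilde{f}(x) + \tilde{f}(x)-\tilde{\tilde{f}}(x) \right\}^2
  = \hbox{SSE}(f,\tilde{f}) + \hbox{SSE}(\tilde{f},\tilde{\tilde{f}})
    + 2 \sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \{ \tilde{f}(x)-\tilde{\tilde{f}}(x) \}

% Sufficient condition
\sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \{ \tilde{f}(x)-\tilde{\tilde{f}}(x) \} = 0

% (42): normal equations from (9)
\sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \prod_{k \in \lambda} x_k = 0
  \quad \hbox{for all } \lambda \in \tilde{S}

% (43): the same condition restricted to the smaller set
\sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \prod_{k \in \lambda} x_k = 0
  \quad \hbox{for all } \lambda \in \tilde{\tilde{S}}

% First term, with the order of summation changed
\sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \tilde{f}(x)
  = \sum_{\varLambda \in \tilde{S}} \tilde{\beta}^{\varLambda}
    \sum_{x \in \varOmega} \{ f(x)-\tilde{f}(x) \} \prod_{k \in \varLambda} x_k
  = 0
```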
Proof of Theorem 4
We study the error sum of squares,
where the second sum is always zero by (43). Since \(\tilde{S} \subseteq S\), the first sum can be further split into two parts,
where once again the first sum is zero.
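The proofs of Theorems 3 and 4 both rest on the residual \(f(x)-\tilde{f}(x)\) being orthogonal to every retained basis function \(\prod _{k\in \lambda } x_k\), \(\lambda \in \tilde{S}\). The following minimal sketch checks this numerically on a small example, using exact rational arithmetic. The three-variable function, the coefficient values, and the helper names `basis`, `solve`, `least_squares` and `sse` are illustrative assumptions and do not come from the paper. The sketch also assumes that Theorem 3 is the additive decomposition of the error sum of squares, and that the final expression in the proof of Theorem 4 involves only the removed interactions \({\varLambda }\in S\setminus \tilde{S}\).

```python
from fractions import Fraction
from itertools import product

def basis(lam, x):
    """Value of prod_{k in lam} x_k; the empty set gives the constant 1."""
    v = 1
    for k in lam:
        v *= x[k]
    return v

def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def least_squares(f_vals, sets, omega):
    """Least squares fit of f over the interaction sets, via the normal equations."""
    A = [[Fraction(sum(basis(a, x) * basis(b, x) for x in omega)) for b in sets]
         for a in sets]
    rhs = [sum(fv * basis(a, x) for fv, x in zip(f_vals, omega)) for a in sets]
    coef = solve(A, rhs)
    return [sum(c * basis(s, x) for c, s in zip(coef, sets)) for x in omega]

def sse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical example: three binary variables, all interactions present.
omega = list(product((0, 1), repeat=3))
S = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
beta = dict(zip(S, map(Fraction, (1, -2, 3, 1, 4, -1, 2, 5))))  # arbitrary values
f = [sum(beta[s] * basis(s, x) for s in S) for x in omega]

S1 = [s for s in S if not (0 in s and 1 in s)]  # plays the role of S-tilde
S2 = [s for s in S1 if len(s) <= 1]             # plays the role of S-double-tilde
f1 = least_squares(f, S1, omega)
f2 = least_squares(f, S2, omega)
resid = [a - b for a, b in zip(f, f1)]

# Additive decomposition of the error sum of squares (Theorem 3, as assumed).
pythagoras_holds = sse(f, f2) == sse(f, f1) + sse(f1, f2)
# Orthogonality of the residual to every retained basis function.
residual_orthogonal = all(
    sum(r * basis(s, x) for r, x in zip(resid, omega)) == 0 for s in S1)
# Only the removed interactions contribute to SSE(f, f1) (Theorem 4, as assumed).
removed = [s for s in S if s not in S1]
removed_terms_identity = sse(f, f1) == sum(
    beta[s] * sum(r * basis(s, x) for r, x in zip(resid, omega)) for s in removed)

print(pythagoras_holds, residual_orthogonal, removed_terms_identity)
```

Because the arithmetic is exact, the comparisons use equality rather than a numerical tolerance. The brute-force sums over \({\varOmega }\) limit this check to a handful of variables.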
Proof of Theorem 5
From Theorem 1 it follows that it is sufficient to consider a function f(x) with non-zero interactions \(\beta ^{\varLambda }\) only for \({\varLambda }\in S_{\{ i,j\}}\), since we only need to focus on the interactions we want to remove. Thus we have
and we need to show that then
We start by defining the sets
and note that these sets are disjoint, and, since we have assumed S to be dense, \(R_{\varLambda }\subseteq \widetilde{S}\). Defining also the residue set
we may write the approximation error \(f(x)-\widetilde{f}(x)\) in the following form,
Defining
we have
Inserting this into (9) and switching the order of summation we get
for all \(\lambda \in \widetilde{S}\). We now proceed to show that this system of equations has a solution where \(\widetilde{\beta }^{\varLambda }= 0\) for \({\varLambda }\in T\) and \(\sum _{x\in {\varOmega }_{\lambda \cup ({\varLambda }\setminus \{ i,j\})}} \Delta f^{\varLambda }(x_i,x_j) = 0\) for each \({\varLambda }\in S_{\{ i,j\}}\). Obviously for each \({\varLambda }\) the function \(\Delta f^{\varLambda }(x_i,x_j)\) has only four possible values, namely \(\Delta f^{\varLambda }(0,0)\), \(\Delta f^{\varLambda }(1,0)\), \(\Delta f^{\varLambda }(0,1)\) and \(\Delta f^{\varLambda }(1,1)\). Thus the sum \(\sum _{x\in {\varOmega }_{\lambda \cup ({\varLambda }\setminus \{ i,j\})}} \Delta f^{\varLambda }(x_i,x_j)\) is simply given as a sum over these four values multiplied by the number of times they occur. Consider first the case where \(\lambda \), and thereby also \(\lambda \cup ({\varLambda }\setminus \{ i,j\})\), does not contain i or j. Then the four values \(\Delta f^{\varLambda }(0,0)\), \(\Delta f^{\varLambda }(1,0)\), \(\Delta f^{\varLambda }(0,1)\) and \(\Delta f^{\varLambda }(1,1)\) will occur the same number of times, so
Next consider the case when \(\lambda \), and thereby also \(\lambda \cup ({\varLambda }\setminus \{ i,j\})\), contains i, but not j. Then \(x_i=1\) in all terms in the sum, so the values \(\Delta f^{\varLambda }(0,0)\) and \(\Delta f^{\varLambda }(0,1)\) will not occur, whereas the values \(\Delta f^{\varLambda }(1,0)\) and \(\Delta f^{\varLambda }(1,1)\) will occur the same number of times. Thus,
When \(\lambda \) contains j, but not i, we correspondingly get
The final case, that \(\lambda \) contains both i and j, will never occur since \(\lambda \in \widetilde{S}\) and all interactions involving both i and j have been removed from \(\widetilde{S}\). We can now reach the conclusion that if we can find a solution for
for all \({\varLambda }\in S_{\{ i,j\}}\) we also have a solution for (46) as discussed above. Using our expression for \(\Delta f^{\varLambda }(x_i,x_j)\), the above three equations become
Since the sets \(R_{\varLambda }\) are disjoint, the three equations above can be solved separately for each \({\varLambda }\), and the solution is \(\widetilde{\beta }^{{\varLambda }\setminus \{ i,j\}} = -\frac{1}{4}\beta ^{\varLambda }\) and \(\widetilde{\beta }^{{\varLambda }\setminus \{ i\}}=\widetilde{\beta }^{{\varLambda }\setminus \{ j\}}= \frac{1}{2}\beta ^{\varLambda }\). Together with \(\widetilde{\beta }^{\varLambda }=0\) for \({\varLambda }\in T\) this is equivalent to (13) in the theorem. Inserting the values we have found for \(\widetilde{\beta }^{\varLambda }\) in (45) we get
Inserting this into the above expression for \(f(x)-\widetilde{f}(x)\), and using that we know \(\widetilde{\beta }^{\varLambda }=0\) for \({\varLambda }\in T\) we get (14) given in the theorem.
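The closed-form solution stated in the proof, \(\widetilde{\beta }^{{\varLambda }\setminus \{ i,j\}} = -\frac{1}{4}\beta ^{\varLambda }\) and \(\widetilde{\beta }^{{\varLambda }\setminus \{ i\}}=\widetilde{\beta }^{{\varLambda }\setminus \{ j\}}= \frac{1}{2}\beta ^{\varLambda }\), can be checked by brute force on a small example. The sketch below is illustrative only: the four-variable function, its coefficient values, and the names `monomial` and `evaluate` are assumptions, not taken from the paper. It also assumes, by linearity of least squares, that for a function with interactions outside \(S_{\{ i,j\}}\) the corrections are simply added to the existing coefficients. With these coefficients the residual factorises as \((x_i-\frac{1}{2})(x_j-\frac{1}{2})\sum _{{\varLambda }\in S_{\{ i,j\}}}\beta ^{\varLambda }\prod _{k\in {\varLambda }\setminus \{ i,j\}}x_k\), which the code verifies together with the normal equations.

```python
from fractions import Fraction
from itertools import combinations, product

def monomial(lam, x):
    """Value of prod_{k in lam} x_k; the empty set gives the constant 1."""
    v = 1
    for k in lam:
        v *= x[k]
    return v

def evaluate(coef, x):
    """Evaluate a pseudo-Boolean function given as {interaction set: coefficient}."""
    return sum(b * monomial(lam, x) for lam, b in coef.items())

# Hypothetical example: four binary variables, remove all interactions with both i and j.
n, i, j = 4, 0, 1
omega = list(product((0, 1), repeat=n))
S = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]  # dense set
beta = {lam: Fraction(3 * idx - 7, 2) for idx, lam in enumerate(S)}  # arbitrary values
S_ij = [lam for lam in S if i in lam and j in lam]
S_tilde = [lam for lam in S if lam not in S_ij]

# Start from the retained coefficients and add the corrections from the proof.
beta_tilde = {lam: beta[lam] for lam in S_tilde}
for lam in S_ij:
    beta_tilde[lam - {i, j}] -= beta[lam] / 4
    beta_tilde[lam - {i}] += beta[lam] / 2
    beta_tilde[lam - {j}] += beta[lam] / 2

resid = [evaluate(beta, x) - evaluate(beta_tilde, x) for x in omega]

# Normal equations: the residual is orthogonal to every retained monomial,
# which identifies beta_tilde as the least squares solution.
normal_eqs_hold = all(
    sum(r * monomial(lam, x) for r, x in zip(resid, omega)) == 0
    for lam in S_tilde)

# Factorised form of the residual.
half = Fraction(1, 2)
closed_form = [
    (x[i] - half) * (x[j] - half)
    * sum(beta[lam] * monomial(lam - {i, j}, x) for lam in S_ij)
    for x in omega]
residual_matches = resid == closed_form

print(normal_eqs_hold, residual_matches)
```

Since \(|(x_i-\frac{1}{2})(x_j-\frac{1}{2})| = \frac{1}{4}\) for binary \(x_i\) and \(x_j\), the factorised form also shows that the absolute error at each x is one quarter of the absolute value of the remaining sum.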
Cite this article
Austad, H.M., Tjelmeland, H. Approximate computations for binary Markov random fields and their use in Bayesian models. Stat Comput 27, 1271–1292 (2017). https://doi.org/10.1007/s11222-016-9685-7
DOI: https://doi.org/10.1007/s11222-016-9685-7