Abstract
Belief propagation (BP) has been applied as an approximation tool in a variety of inference problems. BP does not necessarily converge on loopy graphs and, even when it does, it is not guaranteed to provide exact inference. Even so, BP is useful in many applications because of its computational tractability. In this article, we investigate a regularized BP scheme by focusing on loopy Markov graphs (MGs) induced by a multivariate Gaussian distribution in canonical form. There is a rich literature surrounding BP on Gaussian MGs (labelled Gaussian belief propagation, or GaBP), and this is known to experience the same problems as general BP on loopy graphs. GaBP is known to provide the correct marginal means if it converges (which is not guaranteed), but it does not provide the exact marginal precisions. We show that, with sufficient tuning, our adjusted BP always converges while maintaining the exact marginal means. As a further contribution we show, in an empirical study, that our GaBP variant can accelerate GaBP and compares well with other GaBP-type competitors in terms of convergence speed and the accuracy of the approximate marginal precisions. These improvements suggest that the principle of regularized BP should be investigated in other inference problems. The selection of the degree of regularization is addressed through two heuristics. A by-product of GaBP is that it can be used to solve linear systems of equations; the same is true for our variant, and we make an empirical comparison with the conjugate gradient method.
References
Aji, S., McEliece, R.: The generalized distributive law. IEEE Trans. Inform. Theory 46, 325–343 (2000)
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Convex optimization with sparsity-inducing norms. In: Sra, S., Nowozin, S., Wright, J. (eds.) Optimization for Machine Learning. MIT Press, Cambridge (2011)
Bickson, D.: Gaussian Belief Propagation: Theory and Application, PhD thesis. The Hebrew University of Jerusalem (2008)
Chandrasekaran, V., Johnson, J.K., Willsky, A.S.: Estimation in Gaussian graphical models using tractable subgraphs: a walk-sum analysis. IEEE Trans. Signal Process. 56, 1916–1930 (2008)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
El-Kurdi, Y., Giannacopoulos, D., Gross, W.J.: Relaxed Gaussian belief propagation. In: Proceedings of the 2012 IEEE International Symposium on Information Theory (2012a)
El-Kurdi, Y., Gross, W.J., Giannacopoulos, D.: Efficient implementation of Gaussian belief propagation solver for large sparse diagonally dominant linear systems. IEEE Trans. Magn. 48, 471–474 (2012b)
Frey, B., Kschischang, F.: Probability propagation and iterative decoding. In: Proceedings of the 34th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Monticello (1996)
Gallager, R.G.: Low-Density Parity-Check Codes. MIT Press, Cambridge (1963)
Guo, Q., Huang, D.: EM-based joint channel estimation and detection for frequency selective channels using Gaussian message passing. IEEE Trans. Signal Process. 59, 4030–4035 (2011)
Guo, Q., Li, P.: LMMSE turbo equalization based on factor graphs. IEEE J. Sel. Areas Commun. 26, 311–319 (2008)
Johnson, J.K., Bickson, D., Dolev, D.: Fixing convergence of Gaussian belief propagation. In: International Symposium on Information Theory (ISIT), Seoul (2009)
Koller, D., Friedman, N.: Probabilistic Graphical Models Principles and Techniques. MIT Press, Cambridge (2009)
Lauritzen, S., Spiegelhalter, D.: Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. B 50, 157–224 (1988)
Liu, Y., Chandrasekaran, V., Anandkumar, A., Willsky, A.S.: Feedback message passing for inference in Gaussian graphical models. IEEE Trans. Signal Process. 60(8), 4135–4150 (2012)
Malioutov, D.M., Johnson, J.K., Willsky, A.S.: Walk-sums and belief propagation in Gaussian graphical models. J. Mach. Learn. Res. 7, 2031–2064 (2006)
Montanari, A., Prabhakar, B., Tse, D.: Belief propagation based multi-user detection. In: IEEE Information Theory Workshop, Punta del Este, Uruguay (2006)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
Seeger, M.W., Wipf, D.P.: Variational Bayesian inference techniques. IEEE Signal Process. Mag. 27, 81–91 (2010)
Shachter, R.: Probabilistic inference and influence diagrams. Oper. Res. 36, 589–605 (1988)
Shafer, G., Shenoy, P.: Probability propagation. Ann. Math. Artif. Intell. 2, 327–352 (1990)
Shental, O., Siegel, P.H., Wolf, J.K., Bickson, D., Dolev, D.: Gaussian belief propagation solver for systems of linear equations. In: IEEE International Symposium on Information Theory (ISIT), pp. 1863–1867 (2008)
Shewchuk, J.R.: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. School of Computer Science, Carnegie Mellon University, Pittsburgh (1994)
Su, Q., Wu, Y.: On convergence conditions of Gaussian belief propagation. IEEE Trans. Signal Process. 63, 1144–1155 (2015)
Weiss, Y.: Correctness of local probability propagation in graphical models with loops. Neural Comput. 12, 1–41 (2000)
Weiss, Y., Freeman, W.T.: Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Comput. 13(10), 2173–2200 (2001)
Acknowledgements
The financial assistance of the National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the author and are not necessarily to be attributed to the NRF.
Appendices
Appendix 1: Proofs
1.1 Proof of Theorem 1
Proof
The proof verifies each of the four statements of the theorem at iteration \(n+1\), assuming that all of them hold at iteration n.
1. From 1, all the precision components are negative at stage n, hence \(Q_{ij}^{(n+1)} = \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}^{(n)}} = \frac{-S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}|}\). From 3 we have that \(\sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}| \le \sum _{t \in {\mathcal {N}}_{i}}|Q_{ti}^{(n)}| = \delta _{i}^{(n)} < 1 + \lambda \), so \(1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}| > 0\), from which 1 follows for iteration \(n+1\).
2. \(|Q_{ij}^{(n+1)}| = \frac{S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}|} \ge \frac{S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n-1)}|} = |Q_{ij}^{(n)}|\), since \(|Q_{ti}^{(n)}| \ge |Q_{ti}^{(n-1)}|, t \in {\mathcal {N}}_{i}\), by 2 for iteration n; hence 2 is also true for \(n+1\).
3. \(\delta _{i}^{(n+1)} = \sum _{t \in {\mathcal {N}}_{i}} |Q_{ti}^{(n+1)}| = \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t}^{(n)} + |Q_{it}^{(n)}|} \le \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n)}|} \le \delta _{i} < 1 + \lambda \) by 4, and therefore 3 is true for \(n+1\).
4. From the above we have \(\sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n+1)}|} \le \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n)}|} \le \delta _{i}\), hence 4 holds for \(n+1\).
\(\square \)
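As an illustration only (our own sketch, not part of the paper; we assume a fully connected graph and messages initialized at zero), the precision-message update quoted in step 1 can be iterated numerically as follows:

    import numpy as np

    rng = np.random.default_rng(0)
    p, lam = 6, 5.0                              # lambda chosen comfortably large
    X = rng.normal(size=(2 * p, p))
    S = np.corrcoef(X, rowvar=False)             # unit-diagonal precision matrix
    Q = np.zeros((p, p))                         # Q[i, j] holds the message i -> j
    for n in range(100):
        Q_new = np.zeros_like(Q)
        for i in range(p):
            for j in range(p):
                if i != j:
                    # Q_ij <- -S_ij^2 / (1 + lambda + sum_{t in N_i \ j} Q_ti)
                    denom = 1.0 + lam + Q[:, i].sum() - Q[j, i]
                    Q_new[i, j] = -S[i, j] ** 2 / denom
        Q = Q_new

For \(\lambda \) chosen large enough (cf. Theorem 2), the magnitudes \(|Q_{ij}^{(n)}|\) increase monotonically towards a fixed point, in line with the statements above.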
1.2 Proof of Theorem 2
Let \({\mathbf {S}}\) be a symmetric, positive definite matrix with diagonal entries equal to 1, and let its entries be denoted by \(S_{ij}\). The values \(Q_{ij}(\lambda )\) are characterized by the stationary system \(Q_{ij}(\lambda ) = \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}(\lambda )}\), for all i and \(j \in {\mathcal {N}}_{i}\).
We are particularly interested in the behaviour as \(\lambda \rightarrow \infty \). A consequence of Theorem 1 is that \(\lim _{\lambda \rightarrow \infty } Q_{ij}(\lambda ) = 0\). For convenience, set \(\delta = \lambda ^{-1}\), so that \(\delta \rightarrow 0\). The system can be rewritten as
Note that
As \(\delta \rightarrow 0\), we see that the Jacobian of the system tends to a negative identity matrix, so in particular it is invertible. This means that the \(Q_{ij}\) are analytic functions of \(\delta \) if \(\delta \) is in a suitable neighbourhood of 0. Consequently, the \(Q_{ij}\) have power series expansions in \(\delta \):
Plugging this back into the system, we see that in fact \(a_{ij} = -S_{ij}^2\), so we have \(Q_{ij}(\lambda ) = -S_{ij}^{2}\,\delta + {\mathcal {O}}(\delta ^{2})\).
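As a brief added sketch of this substitution (using the stationary system stated at the start of the proof, \(\lambda = \delta ^{-1}\), and the fact that \(Q_{ti} = {\mathcal {O}}(\delta )\)):
\[
Q_{ij} \;=\; \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}}
\;=\; \frac{-\delta S_{ij}^{2}}{1 + \delta + \delta \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}}
\;=\; -S_{ij}^{2}\,\delta + {\mathcal {O}}(\delta ^{2}),
\]
confirming that the leading coefficient is \(a_{ij} = -S_{ij}^{2}\).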
Consider again the matrix \({\mathbf {L}}\) given in (20). Let \(l = p^{2} - p\), we now define a \(l \times p\) matrix \({\mathbf {G}} = \begin{bmatrix} {\mathbf {g}}_{1}&{\mathbf {g}}_{2}&\ldots&{\mathbf {g}}_{p} \end{bmatrix} \). The vector \({\mathbf {g}}_{i}\) has entries 1 in positions \((p-1)(i-1)+1,\ldots ,(p-1)i\). It can be shown that
with the understanding that \(\lambda > 0\). Let,
and set \(\tilde{{\mathbf {L}}} = {\mathbf {D}}{\mathbf {L}}{\mathbf {D}}^{-1}\). It is easy to see that,
and that \({\mathbf {L}}\) and \(\tilde{{\mathbf {L}}}\) will have the same eigenvalues. As a first step, we show that the eigenvalues of \(\tilde{{\mathbf {L}}}\) are all clustered around 0 and 1 as \(\delta \rightarrow 0\). We have already discussed the construction of \({\mathbf {L}}_{11}\) from the elements \(C_{ij}\). Using the fact that \(C_{ij} = -\delta S_{ij} + {\mathcal {O}}(\delta ^{2})\), we see that \({\mathbf {L}}_{11} = \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})\), where \({\mathbf {A}}\) does not depend on \(\lambda \) and \({\mathcal {O}}(\delta ^{2})\) denotes a matrix of suitable dimension with each entry being \({\mathcal {O}}(\delta ^{2})\). The matrix \({\mathbf {A}}\) is constructed exactly as \({\mathbf {L}}_{11}\); however, the \(-S_{ij}\)'s are used instead of the \(C_{ij}\)'s. As discussed, the matrix \({\mathbf {L}}_{22}\) is diagonal with entries \(\frac{\lambda }{1 + \lambda + \sum _{t\ne i}Q_{ti}} = \frac{\lambda }{1 + \lambda + {\mathcal {O}}(\delta )} = 1 - \delta + {\mathcal {O}}(\delta ^{2})\), and therefore \({\mathbf {L}}_{22} = {\mathbf {I}} - \delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})\). We now consider the matrix,
and the following Lemma.
Lemma 1
Let \({\mathbf {M}}\) be a square matrix, and let c be a positive constant that satisfies \(c > \Vert {\mathbf {M}}\Vert _{\infty }\) (\(\Vert {\mathbf {M}}\Vert _{\infty }\) is the \(\infty \)-norm of \({\mathbf {M}}\), which can be obtained by calculating the row sums of the absolute values of entries in \({\mathbf {M}}\) and taking the maximum of these sums). For every x with \(|x| \ge c\), the matrices \(x{\mathbf {I}} - {\mathbf {M}}\) and \({\mathbf {I}} - \frac{1}{x}{\mathbf {M}}\) are invertible, and the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {M}})^{-1}\) are bounded by constants that only depend on c and \({\mathbf {M}}\).
Proof
The invertibility follows directly from the fact that the matrix \(x {\mathbf {I}} - {\mathbf {M}}\) is strictly diagonally dominant by our assumptions. For the second statement, let \(|{\mathbf {M}}|\) be obtained from \({\mathbf {M}}\) by replacing all entries by their absolute values. Note that \(|{\mathbf {M}}|\) has the same \(\infty \)-norm as \({\mathbf {M}}\). Clearly, the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {M}})^{-1} = \sum _{j=0}^{\infty } x^{-j}{\mathbf {M}}^{j}\) are bounded in absolute value by the entries of \(({\mathbf {I}} - \frac{1}{c} |{\mathbf {M}}|)^{-1} = \sum _{j=0}^{\infty } c^{-j}|{\mathbf {M}}|^{j}\),
which readily proves the desired statement. \(\square \)
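As an aside (a numerical check we add, for real x only; the names are ours), the entrywise bound just used can be verified directly:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    M = rng.normal(size=(n, n))
    c = np.abs(M).sum(axis=1).max() + 0.5              # any c > ||M||_inf
    bound = np.linalg.inv(np.eye(n) - np.abs(M) / c)   # depends only on c and M
    for x in [c, -c, 2 * c, -5 * c]:                   # |x| >= c
        inv = np.linalg.inv(np.eye(n) - M / x)
        assert np.all(np.abs(inv) <= bound + 1e-9)     # entrywise bound holds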
Lemma 2
There exists a constant \(K > 0\) such that for sufficiently small \(\delta \), each eigenvalue x of \(\tilde{{\mathbf {L}}}\) either satisfies \(|x| \le K\delta \) or \(|x-1| \le K\delta \).
Proof
We reason by contradiction and assume that there is an eigenvalue for which \(|x| > K\delta \) and \(|x-1| > K\delta \). Consider first \(||{\mathbf {L}}_{11}||_{\infty }= ||\delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})||_{\infty } \le \delta ||{\mathbf {A}}||_{\infty } + {\mathcal {O}}(\delta ^{2})\). If we choose K large enough (e.g. \(K \ge \Vert {\mathbf {A}}\Vert _{\infty } + 1\)), then the matrix \(x{\mathbf {I}} - {\mathbf {L}}_{11} = x {\mathbf {I}} - \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^2)\) is invertible by the previous lemma for sufficiently small \(\delta \), and the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1}\) are bounded by absolute constants. Now, we use the Schur complement on (29):
It remains to show that the second determinant is not equal to 0. We rewrite the matrix as follows:
Consider \(({\mathbf {L}}_{22} - {\mathbf {I}}) = -\delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})\) and \(||{\mathbf {L}}_{22} - {\mathbf {I}}||_{\infty } = ||-\delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})||_{\infty } \le \delta || {\mathbf {I}} ||_{\infty } + ||{\mathcal {O}}(\delta ^{2})||_{\infty } = \delta + {\mathcal {O}}(\delta ^{2})\). Therefore, \(||{\mathbf {H}}_{1}||_{\infty } \le \kappa _{1}\delta \) for a constant \(\kappa _{1}\) and sufficiently small \(\delta \). By Lemma 1, for sufficiently small \(\delta \) the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1}\) are bounded by the entries of \(({\mathbf {I}} - \frac{1}{K\delta } |{\mathbf {L}}_{11}|)^{-1} = {\mathbf {I}} + \frac{|{\mathbf {L}}_{11}|}{K \delta } + \sum _{j=2}^{\infty }\frac{|{\mathbf {L}}_{11}|^{j}}{(K \delta )^{j}}\). Since \({\mathbf {L}}_{11} = \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})\), this bound equals \(({\mathbf {I}} - \frac{1}{K}|{\mathbf {A}}| + {\mathcal {O}}(\delta ))^{-1} = {\mathcal {O}}(1)\), so that \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1} = {\mathcal {O}}(1)\). Furthermore, \({\mathbf {L}}_{22} = {\mathcal {O}}(1)\) and \({\mathbf {L}}_{11} = {\mathcal {O}}(\delta )\), from which we have that \({\mathbf {H}}_{2} = {\mathcal {O}}(\delta ) + {\mathcal {O}}(\delta ^{2})\) and \(||{\mathbf {H}}_{2}||_{\infty } \le \kappa _{2}\delta \) for a constant \(\kappa _{2}\) and sufficiently small \(\delta \). If \(|x| \ge \frac{1}{2}\), we find that
if K is chosen large enough (greater than \(\kappa _{1} + 2\kappa _{2}\)). If \(|x| \le \frac{1}{2}\), we get
if K is chosen large enough and \(\delta \) is sufficiently small. In either case, we can apply the previous lemma to see that the matrix in (31) is in fact invertible. \(\square \)
Now, we focus on the eigenvalues that are close to 1, setting \(x = 1 - \delta t\) for some t with \(|t| \le K\). Returning to (30), we observe that \(x {\mathbf {I}} - {\mathbf {L}}_{11}\) is invertible for sufficiently small \(\delta \), again by Lemma 1. Hence, we consider the second matrix:
The entries of the matrix hidden by the \({\mathcal {O}}(\delta ^2)\) term are in fact analytic in \(\delta \) and t, since we proved earlier that the entries of \(\tilde{{\mathbf {L}}}\) are analytic functions. We can take out a factor \(\delta \) to be left with the equation
where the matrix \({\mathbf {M}}\) has entries that are analytic functions of \(\delta \) and t (if \(\delta \) is restricted to a sufficiently small neighbourhood of 0 and \(|t| \le K\)). Moreover, \({\mathbf {M}} = {\mathcal {O}}(\delta )\). As \(\delta \rightarrow 0\), we obtain (up to a change of variable \(1-t = u\)) the characteristic equation of the matrix \(\frac{1}{p-2} {\mathbf {G}}'{\mathbf {A}}{\mathbf {G}}\) (we will show later that this matrix is in fact equal to \({\mathbf {I}} - {\mathbf {S}}\)). Its p solutions (counted with multiplicity) give rise to p branches \(t_1(\delta ),t_2(\delta ),\ldots ,t_p(\delta )\) that solve the implicit equation (32). In the same way, we can treat the “small” eigenvalues that are close to 0. We set \(x = \delta t\) for some t with \(|t| \le K\), and use the Schur complement with respect to the other diagonal block:
Since
this matrix is invertible for sufficiently small \(\delta \), again by Lemma 1. Moreover, we have
so we can repeat the argument for the “large” eigenvalues. We obtain \(p^2-p\) branches \(\bar{t}_1(\delta ),\bar{t}_2(\delta ),\ldots ,\bar{t}_{p^2-p}(\delta )\) that correspond to the eigenvalues of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {AGG}}'\).
Returning to the large eigenvalues, we consider the product \({\mathbf {G}}'{\mathbf {A}}{\mathbf {G}}\). The matrix \({\mathbf {A}}\) is constructed by taking the first l rows and columns of \({\mathbf {L}}\) and replacing the \(C_{ij}\) elements with \(-S_{ij}\). The rows \((p-1)(j-1)+1,\ldots ,(p-1)j\) correspond to messages received by node j (in order), and hence \({\mathbf {g}}_{j}\) contains ones at the rows corresponding to messages received by node j and zeros otherwise. Consider a row corresponding to a message from node i to node j; this message requires communication from the other nodes (excluding j) to node i, so the row contains \(-S_{ij}\) in the positions where \({\mathbf {g}}_{i}\) is equal to 1, except the position corresponding to the message from j to i. Consequently, \({\mathbf {A}}{\mathbf {g}}_{i}\) equals \(-(p-2)S_{im}\) in the row corresponding to the message from i to m (for each \(m \ne i\)) and zero in all other rows. The vector \({\mathbf {g}}_{j}\) references the rows corresponding to messages received by node j, and since there is only one message from i to j, the nonzero elements of \({\mathbf {g}}_{j}\) overlap with the nonzero elements of \({\mathbf {A}}{\mathbf {g}}_{i}\) in exactly one element; hence \({\mathbf {g}}_{j}'{\mathbf {A}}{\mathbf {g}}_{i} = -(p-2)S_{ij}\) for \(j \ne i\). Furthermore, since there is no message from node i to node i, we have that \({\mathbf {g}}_{i}'{\mathbf {A}}{\mathbf {g}}_{i} = 0\). We see that \(\frac{1}{p-2}{\mathbf {G}}'{\mathbf {AG}} = {\mathbf {I}} - {\mathbf {S}}\). Equation (32) becomes
Since \({\mathbf {S}}\) is symmetric, it is diagonalizable. There exists an orthogonal matrix \({\mathbf {U}}\) such that \({\mathbf {U}}^{-1} {\mathbf {S}} {\mathbf {U}} = {\mathbf {D}}\) is a diagonal matrix. We have
Recall that \({\mathbf {H}} = {\mathcal {O}}(\delta )\), uniformly in t (for \(|t| \le K\)), so we also have \({\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}} = {\mathcal {O}}(\delta )\). Let \(\kappa \) be a constant such that \(\Vert {\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}}\Vert _{\infty } \le \kappa \delta \) (for sufficiently small \(\delta \) and \(|t| \le K\)). If
then we must have \(|t - d_{ii}| \le \kappa \delta \) for one of the diagonal entries \(d_{ii}\) of \({\mathbf {D}}\), for otherwise the matrix \({\mathbf {D}} - t {\mathbf {I}} + {\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}}\) will be strictly diagonally dominant and thus invertible. The diagonal entries of \({\mathbf {D}}\) are the eigenvalues \(\sigma _1,\sigma _2,\ldots ,\sigma _p\) of \({\mathbf {S}}\), so it follows that \(t = \sigma _i + {\mathcal {O}}(\delta )\).
We can deal with the small eigenvalues in the same way; it only remains to determine the entries of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) (thereby verifying that this matrix is also symmetric and thus diagonalizable). It is easy to verify that \({\mathbf {G}}{\mathbf {G}}'\) is a block diagonal matrix whose blocks are of dimension \((p-1)\times (p-1)\) with all entries equal to one; in fact \({\mathbf {G}}{\mathbf {G}}'= [{\mathbf {B}}_{1},{\mathbf {B}}_{2},\ldots ,{\mathbf {B}}_{p}]\), where \({\mathbf {B}}_{i}\) is \({\mathbf {g}}_{i}\) appended \(p-1\) times as columns. Consider a row in \({\mathbf {A}}\) corresponding to a message from i to j, say \({\mathbf {a}}_{ij}'\); we have already verified that this row contains \(-S_{ij}\) where \({\mathbf {g}}_{i}\) is one, except for the 1 corresponding to the message from j to i. Now, \({\mathbf {a}}_{ij}'{\mathbf {G}}{\mathbf {G}}'\) will contain nonzero elements in \({\mathbf {a}}_{ij}'{\mathbf {B}}_{i}\) and these will all equal \(-(p-2)S_{ij}\). A row, \({\mathbf {b}}_{ij}\), of \(\frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) corresponding to a message from i to j will therefore contain \(-S_{ij}\) where \({\mathbf {g}}_{i}\) equals one (even for the message from j to i). Hence, \({\mathbf {a}}_{ij}\) and \({\mathbf {b}}_{ij}\) are identical except for the element corresponding to the message from j to i, where \({\mathbf {a}}_{ij}\) is zero and \({\mathbf {b}}_{ij}\) is \(-S_{ij}\). Hence, a row of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) corresponding to a message from i to j has one element (at the message from j to i) equal to \(S_{ij}\) and the rest are zero. Furthermore, the row corresponding to the message from j to i will have \(S_{ji} = S_{ij}\) as its element in the position corresponding to the message from i to j. Hence, \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) is symmetric.
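To make the index bookkeeping above easy to check, the following numerical sketch (our own construction, following the row and column ordering described in the proof) verifies both \(\frac{1}{p-2}{\mathbf {G}}'{\mathbf {AG}} = {\mathbf {I}} - {\mathbf {S}}\) and the structure of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\):

    import numpy as np

    def message_index(i, j, p):
        # 0-based position of the message i -> j; block j holds the messages
        # received by node j, ordered by sender, as in the proof
        senders = [t for t in range(p) if t != j]
        return (p - 1) * j + senders.index(i)

    rng = np.random.default_rng(1)
    p = 5
    l = p * (p - 1)
    S = np.corrcoef(rng.normal(size=(p + 5, p)), rowvar=False)   # unit diagonal

    G = np.zeros((l, p))
    for j in range(p):
        G[(p - 1) * j:(p - 1) * (j + 1), j] = 1.0                # g_j

    A = np.zeros((l, l))
    for i in range(p):
        for j in range(p):
            if i != j:
                r = message_index(i, j, p)
                for t in range(p):
                    if t != i and t != j:
                        A[r, message_index(t, i, p)] = -S[i, j]

    assert np.allclose(G.T @ A @ G / (p - 2), np.eye(p) - S)

    B = A - (A @ G @ G.T) / (p - 2)
    assert np.allclose(B, B.T)                                   # symmetric
    offdiag = np.abs(S[np.triu_indices(p, k=1)])
    assert np.allclose(np.sort(np.abs(np.linalg.eigvalsh(B))),
                       np.sort(np.repeat(offdiag, 2)))           # eigenvalues +/- S_ij

After a simultaneous permutation of rows and columns that places each pair of messages \(i \rightarrow j\) and \(j \rightarrow i\) next to each other, \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) is a direct sum of \(2 \times 2\) blocks with zero diagonal and off-diagonal entry \(S_{ij}\), which explains the \(\pm S_{ij}\) eigenvalues appearing in the list below.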
In conclusion, the eigenvalues of \(\tilde{{\mathbf {L}}}\) are
- \(1 - \sigma _i\delta + {\mathcal {O}}(\delta ^2)\), where \(\sigma _1,\sigma _2,\ldots ,\sigma _p\) are the eigenvalues of \({\mathbf {S}}\), and
- \(\pm S_{ij} \delta + {\mathcal {O}}(\delta ^2)\), \(1 \le i < j \le p\).
In particular, the largest eigenvalue of \(\tilde{{\mathbf {L}}}\) is connected to the least eigenvalue \(\sigma _{\text {min}}\) of \({\mathbf {S}}\) through \(1 - \sigma _{\text {min}}\delta + {\mathcal {O}}(\delta ^{2})\). Since \({\mathbf {S}}\) is a positive definite matrix, we know that \(\sigma _{\text {min}} > 0\). It follows that the spectral radius of \(\tilde{{\mathbf {L}}}\) (and hence that of \({\mathbf {L}}\)) is strictly less than 1 for sufficiently small \(\delta \).
1.3 Proof of Theorem 3
Theorem 3
Under the assumption that sGaBP converges, with precision matrix \({\mathbf {S}}\) and potential vector \({\varvec{b}}\) as inputs, and setting \({\varvec{\mu }}\) equal to the converged posterior means, we have that \({\mathbf {S}}{\varvec{\mu }} = {\varvec{b}}\).
Proof
In Theorems 1 and 2, we proved convergence to the following stationary equations:
for all i and \(j \in {\mathcal {N}}_{i}\). Furthermore, \(q_{i} = 1 + \sum _{t \in {\mathcal {N}}_{i}}Q_{ti}\) and \(z_{i} = b_{i} + \sum _{t \in {\mathcal {N}}_{i}}V_{ti}\). Using (34):
for all i and \(j \in {\mathcal {N}}_{i}\). For any \(k \in {\mathcal {N}}_{i}\) we can write
Furthermore, since \(\frac{S_{ik}^{2}}{Q_{ik}} = Q_{ki} - (q_{i} + \lambda _{i})\), we have
Dividing (35) by \(q_{i} + \lambda \) gives
Further simplification can be done by noting that \(S_{ki}V_{ki} = Q_{ki}(\lambda u_{k} + z_{k} - V_{ik})\), and substituting into (36):
Summing \(S_{ki}\mu _{i}\) over i, substituting (37) for \(i \in {\mathcal {N}}_{k}\), gives
Since \(Q_{ik}(\lambda + q_{i} - Q_{ki}) = -S_{ik}^{2} = -S_{ki}^{2} = Q_{ki}(\lambda + q_{k} - Q_{ik})\),
Finally, since \(\mu _{k} = \frac{\lambda \mu _{k} + z_{k}}{\lambda + q_{k}}\), we have that \( \mu _{k}(\lambda + q_{k}) = \lambda \mu _{k} + z_{k}\) and
Substituting (41) into (40) completes the proof. \(\square \)
Appendix 2: Simulation information
1.1 Simulation scheme
We briefly describe the simulation scheme (Bach et al. 2011) used in our empirical work. This simulation scheme also relates GaBP to least squares estimation under the linear model. In order to apply sGaBP, we need to generate a positive definite symmetric precision matrix (\({\mathbf {S}}\)) and a potential vector (\({\varvec{b}}\)). One way to do this is to generate a data structure according to the linear model with n observations and p inputs. This yields a design matrix \({\mathbf {X}}: n \times p\) and a response vector \({\mathbf {y}}: n \times 1\). We then form the sample correlation matrix \({\mathbf {S}} = {\mathbf {X}}'{\mathbf {X}}\), where we assume the columns of \({\mathbf {X}}\) are standardized to have zero mean and unit \(L_{2}\) norm. We assume the same for \({\mathbf {y}}\) and form the sample correlation vector \({\varvec{b}} = {\mathbf {X}}'{\mathbf {y}}\). As long as \(n > p\), \({\mathbf {S}}\) will be positive definite and can be used as a valid precision matrix for the application of sGaBP. Explanatory variables are generated from \(N(\mathbf {0},\frac{1}{n}{\mathbf {I}}_{p})\), where n is the number of observations and p the number of explanatory variables, and stored in \({\mathbf {X}}\). Coefficients are generated as \(\beta _{i} \sim \text {iid } N(0,1)\), and sparsity is introduced by randomly selecting half of the \(\beta _{i}\)'s and setting these equal to zero. Observations of the response are generated as \({\mathbf {y}} = {\mathbf {X}}\varvec{\beta } + \varvec{\epsilon }\), where \(\epsilon _{i} \sim \text {iid } N(0,\sigma ^{2})\) and \(\sigma ^{2} = 0.01 \times \frac{||{\mathbf {X}}\varvec{\beta }||^{2}}{n}\). All variables were standardized to have zero mean and unit \(L_{2}\) norm, and we form \({\mathbf {S}} = {\mathbf {X}}'{\mathbf {X}}\) and \({\varvec{b}} = {\mathbf {X}}'{\mathbf {y}}\). We used \(n = p\) throughout the empirical section. The matrix \({\mathbf {S}}\) was ensured to be positive definite by regulating its zero-diagonal spectral radius, using the method discussed in the next subsection. We then apply sGaBP to a multivariate Gaussian with precision matrix \({\mathbf {S}}\) and potential vector \({\varvec{b}}\).
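A minimal sketch of this scheme (our own code; the function and variable names are not from the paper, and we leave n and p free rather than fixing \(n = p\)):

    import numpy as np

    def simulate_S_b(n, p, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))      # rows ~ N(0, (1/n) I_p)
        beta = rng.normal(size=p)                                # beta_i ~ iid N(0, 1)
        beta[rng.choice(p, size=p // 2, replace=False)] = 0.0    # half of the betas set to zero
        signal = X @ beta
        sigma2 = 0.01 * np.sum(signal ** 2) / n
        y = signal + rng.normal(scale=np.sqrt(sigma2), size=n)
        Xc = X - X.mean(axis=0)                                  # standardize columns of X and y
        X = Xc / np.linalg.norm(Xc, axis=0)
        yc = y - y.mean()
        y = yc / np.linalg.norm(yc)
        return X.T @ X, X.T @ y                                  # precision matrix S and potential b

    S, b = simulate_S_b(n=200, p=100)

With \(n = p\), as used in the empirical section, positive definiteness of \({\mathbf {S}}\) is then enforced by the zero-diagonal spectral radius adjustment of the next subsection.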
1.2 Regulating the zero-diagonal spectral radius
Suppose we have a precision matrix \({\mathbf {S}}: p \times p\) normalized to have ones along its diagonal. Set \({\mathbf {R}} = {\mathbf {I}}_{p} - {\mathbf {S}}\) and let \(\tau _{i}: i = 1,2,\ldots ,p\) be the eigenvalues of \({\mathbf {R}}\). Suppose we wish to find a new precision matrix, \({\mathbf {S}}^{*}\), with zero-diagonal spectral radius set to a specified value (say \(\alpha \)). First, we compute the eigendecomposition of \({\mathbf {R}}\), \({\mathbf {R}} = {\mathbf {V}}{\mathbf {D}}{\mathbf {V}}'\),
where \({\mathbf {V}}'{\mathbf {V}} = {\mathbf {V}}{\mathbf {V}}' = {\mathbf {I}}\) and \({\mathbf {D}} = \text {diag}(\tau _{1},\ldots ,\tau _{p})\). We form a new diagonal matrix, \({\mathbf {D}}^{*} = \frac{\alpha }{\tilde{\rho }({\mathbf {S}})}{\mathbf {D}}\), where \(\tilde{\rho }({\mathbf {S}}) = \max _{i}|\tau _{i}|\) is the zero-diagonal spectral radius of \({\mathbf {S}}\), and set \({\mathbf {R}}^{*} = {\mathbf {V}}{\mathbf {D}}^{*}{\mathbf {V}}'\), \({\mathbf {S}}^{*} = {\mathbf {I}} - {\mathbf {R}}^{*}\). We now show that \({\mathbf {S}}^{*}\) is a valid precision matrix with diagonal entries equal to one if \(\alpha < 1\). Since \({\mathbf {S}}\) is a normalized precision matrix, the diagonal of \({\mathbf {R}}\) will contain only zeros; the same is true for \({\mathbf {R}}^{*}\) (being a scalar multiple of \({\mathbf {R}}\)), and therefore the diagonal of \({\mathbf {S}}^{*}\) will contain only ones. Suppose \(\lambda _{i}, i = 1,2,\ldots ,p\) and \(\lambda ^{*}_{i}, i = 1,2,\ldots ,p\) represent the eigenvalues of \({\mathbf {S}}\) and \({\mathbf {S}}^{*}\), respectively. The following holds: \(\lambda _{i}^{*} = 1 - \frac{\alpha (1-\lambda _{i})}{\text {max}_{j}\{ |1-\lambda _{j}| \}}\).
Since \(\frac{|1-\lambda _{i}|}{\text {max}_{j}\{ |1-\lambda _{j}| \}} \le 1\), we have that \(1-\alpha \le \lambda _{i}^{*} \le 1 + \alpha \). If \(0 \le \alpha < 1\), then \({\mathbf {S}}^{*}\) will be positive definite. In our simulations, when \(\alpha > 1\), we used a check to ensure that \({\mathbf {S}}^{*}\) is positive definite.
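A sketch of this adjustment (our own code and naming):

    import numpy as np

    def regulate_zero_diag_radius(S, alpha):
        p = S.shape[0]
        R = np.eye(p) - S                            # zero-diagonal part of S
        tau, V = np.linalg.eigh(R)                   # R = V diag(tau) V'
        rho = np.abs(tau).max()                      # zero-diagonal spectral radius of S
        R_star = V @ np.diag((alpha / rho) * tau) @ V.T
        return np.eye(p) - R_star                    # unit diagonal; positive definite if alpha < 1

For example, regulate_zero_diag_radius(S, 0.9) rescales the off-diagonal structure of \({\mathbf {S}}\) so that its zero-diagonal spectral radius becomes 0.9.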