Abstract
Belief propagation (BP) has been applied as an approximation tool in a variety of inference problems. BP does not necessarily converge on loopy graphs and, even when it does, it is not guaranteed to provide exact inference. Even so, BP is useful in many applications because of its computational tractability. In this article, we investigate a regularized BP scheme by focusing on loopy Markov graphs (MGs) induced by a multivariate Gaussian distribution in canonical form. There is a rich literature surrounding BP on Gaussian MGs (labelled Gaussian belief propagation, or GaBP), and this is known to experience the same problems as general BP on loopy graphs. GaBP is known to provide the correct marginal means if it converges (which is not guaranteed), but it does not provide the exact marginal precisions. We show that, with sufficient tuning, our adjusted BP always converges while maintaining the exact marginal means. As a further contribution we show, in an empirical study, that our GaBP variant can accelerate GaBP and compares well with other GaBP-type competitors in terms of convergence speed and the accuracy of the approximate marginal precisions. These improvements suggest that the principle of regularized BP should be investigated in other inference problems. The selection of the degree of regularization is addressed through two heuristics. A by-product of GaBP is that it can be used to solve linear systems of equations; the same is true for our variant, and we make an empirical comparison with the conjugate gradient method.
References
Aji, S., McEliece, R.: The generalized distributive law. IEEE Trans. Inform. Theory 46, 325–343 (2000)
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Convex optimization with sparsity-inducing norms. In: Sra, S., Nowozin, S., Wright, J. (eds.) Optimization for Machine Learning. MIT Press, Cambridge (2011)
Bickson, D.: Gaussian Belief Propagation: Theory and Application, PhD thesis. The Hebrew University of Jerusalem (2008)
Chandrasekaran, V., Johnson, J.K., Willsky, A.S.: Estimation in Gaussian graphical models using tractable subgraphs: a walk-sum analysis. IEEE Trans. Signal Process. 56, 1916–1930 (2008)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
El-Kurdi, Y., Giannacopoulos, D., Gross, W.J.: Relaxed Gaussian belief propagation. In: Proceedings of the 2012 IEEE International Symposium on Information Theory (2012a)
El-Kurdi, Y., Gross, W.J., Giannacopoulos, D.: Efficient implementation of Gaussian belief propagation solver for large sparse diagonally dominant linear systems. IEEE Trans. Magn. 48, 471–474 (2012b)
Frey, B., Kschischang, F.: Probability propagation and iterative decoding. In: Proceedings of the 34th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Monticello (1996)
Gallager, R.G.: Low-Density Parity-Check Codes. MIT Press, Cambridge (1963)
Guo, Q., Huang, D.: EM-based joint channel estimation and detection for frequency selective channels using Gaussian message passing. IEEE Trans. Signal Process. 59, 4030–4035 (2011)
Guo, Q., Li, P.: LMMSE turbo equalization based on factor graphs. IEEE J. Sel. Areas Commun. 26, 311–319 (2008)
Johnson, J.K., Bickson, D., Dolev, D.: Fixing convergence of Gaussian belief propagation. In: International Symposium on Information Theory (ISIT), Seoul (2009)
Koller, D., Friedman, N.: Probabilistic Graphical Models Principles and Techniques. MIT Press, Cambridge (2009)
Lauritzen, S., Spiegelhalter, D.: Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. B 50, 157–224 (1988)
Liu, Y., Chandrasekaran, V., Anandkumar, A., Willsky, A.S.: Feedback message passing for inference in Gaussian graphical models. IEEE Trans. Signal Process. 60(8), 4135–4150 (2012)
Malioutov, D.M., Johnson, J.K., Willsky, A.S.: Walk-sums and belief propagation in Gaussian graphical models. J. Mach. Learn. Res. 7, 2031–2064 (2006)
Montanari, A., Prabhakar, B., Tse, D.: Belief propagation based multi-user detection. In: IEEE Information Theory Workshop, Punta del Este, Uruguay (2006)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
Seeger, M.W., Wipf, D.P.: Variational Bayesian inference techniques. IEEE Signal Process. Mag. 27, 81–91 (2010)
Shachter, R.: Probabilistic inference and influence diagrams. Oper. Res. 36, 589–605 (1988)
Shafer, G., Shenoy, P.: Probability propagation. Ann. Math. Artif. Intell. 2, 327–352 (1990)
Shental, O., Siegel, P.H., Wolf, J.K., Bickson, D., Dolev, D.: Gaussian belief propagation solver for systems of linear equations. In: IEEE International Symposium on Information Theory (ISIT), pp. 1863–1867 (2008)
Shewchuk, J.R.: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. School of Computer Science, Carnegie Mellon University, Pittsburgh (1994)
Su, Q., Wu, Y.: On convergence conditions of Gaussian belief propagation. IEEE Trans. Signal Process. 63, 1144–1155 (2015)
Weiss, Y.: Correctness of local probability propagation in graphical models with loops. Neural Comput. 12, 1–41 (2000)
Weiss, Y., Freeman, W.T.: Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Comput. 13(10), 2173–2200 (2001)
Acknowledgements
The financial assistance of the National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the author and are not necessarily to be attributed to the NRF.
Appendices
Appendix 1: Proofs
1.1 Proof of Theorem 1
Proof
The proof verifies each of the four statements of the theorem at iteration \(n+1\), assuming that all of them hold at iteration n.
1. From 1, all the precision components are negative at stage n, hence \(Q_{ij}^{(n+1)} = \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}^{(n)}} = \frac{-S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}|}\). From 3 we have that \(\sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}| \le \sum _{t \in {\mathcal {N}}_{i}}|Q_{ti}^{(n)}| = \delta _{i}^{(n)} < 1 + \lambda \), so \(1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}| > 0\), from which 1 follows for iteration \(n+1\).
2. \(|Q_{ij}^{(n+1)}| = \frac{S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n)}|} \ge \frac{S_{ij}^{2}}{1 + \lambda - \sum _{t \in {\mathcal {N}}_{i}/j}|Q_{ti}^{(n-1)}|} = |Q_{ij}^{(n)}|\), since \(|Q_{ti}^{(n)}| \ge |Q_{ti}^{(n-1)}|, t \in {\mathcal {N}}_{i}\), by 2 for iteration n; hence 2 is also true for \(n+1\).
3. \(\delta _{i}^{(n+1)} = \sum _{t \in {\mathcal {N}}_{i}} |Q_{ti}^{(n+1)}| = \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t}^{(n)} + |Q_{it}^{(n)}|} \le \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n)}|} \le \delta _{i} < 1 + \lambda \) by 4, and therefore 3 is true for \(n+1\).
4. From the above we have \(\sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n+1)}|} \le \sum _{t \in {\mathcal {N}}_{i}} \frac{S^{2}_{ti}}{1 + \lambda - \delta _{t} + |Q_{it}^{(n)}|} \le \delta _{i}\), hence 4 holds for \(n+1\).
\(\square \)
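As an illustration only (our own sketch, not part of the paper; we assume a fully connected graph and messages initialized at zero), the precision-message update quoted in step 1 can be iterated numerically as follows:

    import numpy as np

    rng = np.random.default_rng(0)
    p, lam = 6, 5.0                              # lambda chosen comfortably large
    X = rng.normal(size=(2 * p, p))
    S = np.corrcoef(X, rowvar=False)             # unit-diagonal precision matrix
    Q = np.zeros((p, p))                         # Q[i, j] holds the message i -> j
    for n in range(100):
        Q_new = np.zeros_like(Q)
        for i in range(p):
            for j in range(p):
                if i != j:
                    # Q_ij <- -S_ij^2 / (1 + lambda + sum_{t in N_i \ j} Q_ti)
                    denom = 1.0 + lam + Q[:, i].sum() - Q[j, i]
                    Q_new[i, j] = -S[i, j] ** 2 / denom
        Q = Q_new

For \(\lambda \) chosen large enough (cf. Theorem 2), the magnitudes \(|Q_{ij}^{(n)}|\) increase monotonically towards a fixed point, in line with the statements above.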
1.2 Proof of Theorem 2
Let \({\mathbf {S}}\) be a symmetric, positive definite matrix with diagonal entries equal to 1, and let its entries be denoted by \(S_{ij}\). The values \(Q_{ij}(\lambda )\) are characterized by the stationary system \(Q_{ij}(\lambda ) = \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}(\lambda )}\), for all i and \(j \in {\mathcal {N}}_{i}\).
We are particularly interested in the behaviour as \(\lambda \rightarrow \infty \). A consequence of Theorem 1 is that \(\lim _{\lambda \rightarrow \infty } Q_{ij}(\lambda ) = 0\). For convenience, set \(\delta = \lambda ^{-1}\), so that \(\delta \rightarrow 0\). The system can be rewritten as
Note that
As \(\delta \rightarrow 0\), we see that the Jacobian of the system tends to a negative identity matrix, so in particular it is invertible. This means that the \(Q_{ij}\) are analytic functions of \(\delta \) if \(\delta \) is in a suitable neighbourhood of 0. Consequently, the \(Q_{ij}\) have power series expansions in \(\delta \):
Plugging this back into the system, we see that in fact \(a_{ij} = -S_{ij}^2\), so we have \(Q_{ij}(\lambda ) = -S_{ij}^{2}\,\delta + {\mathcal {O}}(\delta ^{2})\).
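As a brief added sketch of this substitution (using the stationary system stated at the start of the proof, \(\lambda = \delta ^{-1}\), and the fact that \(Q_{ti} = {\mathcal {O}}(\delta )\)):
\[
Q_{ij} \;=\; \frac{-S_{ij}^{2}}{1 + \lambda + \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}}
\;=\; \frac{-\delta S_{ij}^{2}}{1 + \delta + \delta \sum _{t \in {\mathcal {N}}_{i}/j}Q_{ti}}
\;=\; -S_{ij}^{2}\,\delta + {\mathcal {O}}(\delta ^{2}),
\]
confirming that the leading coefficient is \(a_{ij} = -S_{ij}^{2}\).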
Consider again the matrix \({\mathbf {L}}\) given in (20). Let \(l = p^{2} - p\), we now define a \(l \times p\) matrix \({\mathbf {G}} = \begin{bmatrix} {\mathbf {g}}_{1}&{\mathbf {g}}_{2}&\ldots&{\mathbf {g}}_{p} \end{bmatrix} \). The vector \({\mathbf {g}}_{i}\) has entries 1 in positions \((p-1)(i-1)+1,\ldots ,(p-1)i\). It can be shown that
with the understanding that \(\lambda > 0\). Let,
and set \(\tilde{{\mathbf {L}}} = {\mathbf {D}}{\mathbf {L}}{\mathbf {D}}^{-1}\). It is easy to see that,
and that \({\mathbf {L}}\) and \(\tilde{{\mathbf {L}}}\) will have the same eigenvalues. As a first step, we show that the eigenvalues of \(\tilde{{\mathbf {L}}}\) are all clustered around 0 and 1 as \(\delta \rightarrow 0\). We have already discussed the construction of \({\mathbf {L}}_{11}\) from the elements \(C_{ij}\). Using the fact that \(C_{ij} = -\delta S_{ij} + {\mathcal {O}}(\delta ^{2})\), we see that \({\mathbf {L}}_{11} = \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})\), where \({\mathbf {A}}\) does not depend on \(\lambda \) and \({\mathcal {O}}(\delta ^{2})\) denotes a matrix of suitable dimension with each entry being \({\mathcal {O}}(\delta ^{2})\). The matrix \({\mathbf {A}}\) is constructed exactly as \({\mathbf {L}}_{11}\); however, the \(-S_{ij}\)'s are used instead of the \(C_{ij}\)'s. As discussed, the matrix \({\mathbf {L}}_{22}\) is diagonal with entries \(\frac{\lambda }{1 + \lambda + \sum _{t\ne i}Q_{ti}} = \frac{\lambda }{1 + \lambda + {\mathcal {O}}(\delta )} = 1 - \delta + {\mathcal {O}}(\delta ^{2})\), and therefore \({\mathbf {L}}_{22} = {\mathbf {I}} - \delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})\). We now consider the matrix,
and the following Lemma.
Lemma 1
Let \({\mathbf {M}}\) be a square matrix, and let c be a positive constant that satisfies \(c > \Vert {\mathbf {M}}\Vert _{\infty }\) (\(\Vert {\mathbf {M}}\Vert _{\infty }\) is the \(\infty \)-norm of \({\mathbf {M}}\), which can be obtained by calculating the row sums of the absolute values of entries in \({\mathbf {M}}\) and taking the maximum of these sums). For every x with \(|x| \ge c\), the matrices \(x{\mathbf {I}} - {\mathbf {M}}\) and \({\mathbf {I}} - \frac{1}{x}{\mathbf {M}}\) are invertible, and the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {M}})^{-1}\) are bounded by constants that only depend on c and \({\mathbf {M}}\).
Proof
The invertibility follows directly from the fact that the matrix \(x {\mathbf {I}} - {\mathbf {M}}\) is strictly diagonally dominant by our assumptions. For the second statement, let \(|{\mathbf {M}}|\) be obtained from \({\mathbf {M}}\) by replacing all entries by their absolute values. Note that \(|{\mathbf {M}}|\) has the same \(\infty \)-norm as \({\mathbf {M}}\). Clearly, the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {M}})^{-1} = \sum _{j=0}^{\infty } x^{-j}{\mathbf {M}}^{j}\) are bounded in absolute value by the entries of \(({\mathbf {I}} - \frac{1}{c} |{\mathbf {M}}|)^{-1} = \sum _{j=0}^{\infty } c^{-j}|{\mathbf {M}}|^{j}\),
which readily proves the desired statement. \(\square \)
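As an aside (a numerical check we add, for real x only; the names are ours), the entrywise bound just used can be verified directly:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    M = rng.normal(size=(n, n))
    c = np.abs(M).sum(axis=1).max() + 0.5              # any c > ||M||_inf
    bound = np.linalg.inv(np.eye(n) - np.abs(M) / c)   # depends only on c and M
    for x in [c, -c, 2 * c, -5 * c]:                   # |x| >= c
        inv = np.linalg.inv(np.eye(n) - M / x)
        assert np.all(np.abs(inv) <= bound + 1e-9)     # entrywise bound holds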
Lemma 2
There exists a constant \(K > 0\) such that for sufficiently small \(\delta \), each eigenvalue x of \(\tilde{{\mathbf {L}}}\) either satisfies \(|x| \le K\delta \) or \(|x-1| \le K\delta \).
Proof
We reason by contradiction and assume that there is an eigenvalue for which \(|x| > K\delta \) and \(|x-1| > K\delta \). Consider first \(||{\mathbf {L}}_{11}||_{\infty }= ||\delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})||_{\infty } \le \delta ||{\mathbf {A}}||_{\infty } + {\mathcal {O}}(\delta ^{2})\). If we choose K large enough (e.g. \(K \ge \Vert {\mathbf {A}}\Vert _{\infty } + 1\)), then the matrix \(x{\mathbf {I}} - {\mathbf {L}}_{11} = x {\mathbf {I}} - \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^2)\) is invertible by the previous lemma for sufficiently small \(\delta \), and the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1}\) are bounded by absolute constants. Now, we use the Schur complement on (29):
It remains to show that the second determinant is not equal to 0. We rewrite the matrix as follows:
Consider \(({\mathbf {L}}_{22} - {\mathbf {I}}) = -\delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})\) and \(||{\mathbf {L}}_{22} - {\mathbf {I}}||_{\infty } = ||-\delta {\mathbf {I}} + {\mathcal {O}}(\delta ^{2})||_{\infty } \le \delta || {\mathbf {I}} ||_{\infty } + ||{\mathcal {O}}(\delta ^{2})||_{\infty } = \delta + {\mathcal {O}}(\delta ^{2})\). Therefore, \(||{\mathbf {H}}_{1}||_{\infty } \le \kappa _{1}\delta \) for a constant \(\kappa _{1}\) and sufficiently small \(\delta \). By Lemma 1, for sufficiently small \(\delta \) the entries of \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1}\) are bounded by the entries of \(({\mathbf {I}} - \frac{1}{K\delta } |{\mathbf {L}}_{11}|)^{-1} = {\mathbf {I}} + \frac{|{\mathbf {L}}_{11}|}{K \delta } + \sum _{j=2}^{\infty }\frac{|{\mathbf {L}}_{11}|^{j}}{(K \delta )^{j}}\). Since \({\mathbf {L}}_{11} = \delta {\mathbf {A}} + {\mathcal {O}}(\delta ^{2})\), this bound equals \(({\mathbf {I}} - \frac{1}{K}|{\mathbf {A}}| + {\mathcal {O}}(\delta ))^{-1} = {\mathcal {O}}(1)\), so that \(({\mathbf {I}} - \frac{1}{x} {\mathbf {L}}_{11})^{-1} = {\mathcal {O}}(1)\). Furthermore, \({\mathbf {L}}_{22} = {\mathcal {O}}(1)\) and \({\mathbf {L}}_{11} = {\mathcal {O}}(\delta )\), from which we have that \({\mathbf {H}}_{2} = {\mathcal {O}}(\delta ) + {\mathcal {O}}(\delta ^{2})\) and \(||{\mathbf {H}}_{2}||_{\infty } \le \kappa _{2}\delta \) for a constant \(\kappa _{2}\) and sufficiently small \(\delta \). If \(|x| \ge \frac{1}{2}\), we find that
if K is chosen large enough (greater than \(\kappa _{1} + 2\kappa _{2}\)). If \(|x| \le \frac{1}{2}\), we get
if K is chosen large enough and \(\delta \) is sufficiently small. In either case, we can apply the previous lemma to see that the matrix in (31) is in fact invertible. \(\square \)
Now, we focus on the eigenvalues that are close to 1, setting \(x = 1 - \delta t\) for some t with \(|t| \le K\). Returning to (30), we observe that \(x {\mathbf {I}} - {\mathbf {L}}_{11}\) is invertible for sufficiently small \(\delta \), again by Lemma 1. Hence, we consider the second matrix:
The entries of the matrix hidden by the \({\mathcal {O}}(\delta ^2)\) term are in fact analytic in \(\delta \) and t, since we proved earlier that the entries of \(\tilde{{\mathbf {L}}}\) are analytic functions. We can take out a factor \(\delta \) to be left with the equation
where the matrix \({\mathbf {M}}\) has entries that are analytic functions of \(\delta \) and t (if \(\delta \) is restricted to a sufficiently small neighbourhood of 0 and \(|t| \le K\)). Moreover, \({\mathbf {M}} = {\mathcal {O}}(\delta )\). As \(\delta \rightarrow 0\), we obtain (up to a change of variable \(1-t = u\)) the characteristic equation of the matrix \(\frac{1}{p-2} {\mathbf {G}}'{\mathbf {A}}{\mathbf {G}}\) (we will show later that this matrix is in fact equal to \({\mathbf {I}} - {\mathbf {S}}\)). Its p solutions (counted with multiplicity) give rise to p branches \(t_1(\delta ),t_2(\delta ),\ldots ,t_p(\delta )\) that solve the implicit equation (32). In the same way, we can treat the “small” eigenvalues that are close to 0. We set \(x = \delta t\) for some t with \(|t| \le K\), and use the Schur complement with respect to the other diagonal block:
Since
this matrix is invertible for sufficiently small \(\delta \), again by Lemma 1. Moreover, we have
so we can repeat the argument for the “large” eigenvalues. We obtain \(p^2-p\) branches \(\bar{t}_1(\delta ),\bar{t}_2(\delta ),\ldots ,\bar{t}_{p^2-p}(\delta )\) that correspond to the eigenvalues of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {AGG}}'\).
Returning to the large eigenvalues, we consider the product \({\mathbf {G}}'{\mathbf {A}}{\mathbf {G}}\). The matrix \({\mathbf {A}}\) is constructed by taking the first l rows and columns of \({\mathbf {L}}\) and replacing the \(C_{ij}\) elements with \(-S_{ij}\). The rows \((p-1)(j-1)+1,\ldots ,(p-1)j\) correspond to messages received by node j (in order), and hence \({\mathbf {g}}_{j}\) contains ones at the rows corresponding to messages received by node j and zeros otherwise. Consider a row corresponding to a message from node i to node j; this message requires communication from the other nodes (excluding j) to node i, so the row contains \(-S_{ij}\) in the positions where \({\mathbf {g}}_{i}\) is equal to 1, except the position corresponding to the message from j to i. Consequently, \({\mathbf {A}}{\mathbf {g}}_{i}\) equals \(-(p-2)S_{im}\) in the row corresponding to the message from i to m (for each \(m \ne i\)) and zero in all other rows. The vector \({\mathbf {g}}_{j}\) references the rows corresponding to messages received by node j, and since there is only one message from i to j, the nonzero elements of \({\mathbf {g}}_{j}\) overlap with the nonzero elements of \({\mathbf {A}}{\mathbf {g}}_{i}\) in exactly one element; hence \({\mathbf {g}}_{j}'{\mathbf {A}}{\mathbf {g}}_{i} = -(p-2)S_{ij}\) for \(j \ne i\). Furthermore, since there is no message from node i to node i, we have that \({\mathbf {g}}_{i}'{\mathbf {A}}{\mathbf {g}}_{i} = 0\). We see that \(\frac{1}{p-2}{\mathbf {G}}'{\mathbf {AG}} = {\mathbf {I}} - {\mathbf {S}}\). Equation (32) becomes
Since \({\mathbf {S}}\) is symmetric, it is diagonalizable. There exists an orthogonal matrix \({\mathbf {U}}\) such that \({\mathbf {U}}^{-1} {\mathbf {S}} {\mathbf {U}} = {\mathbf {D}}\) is a diagonal matrix. We have
Recall that \({\mathbf {H}} = {\mathcal {O}}(\delta )\), uniformly in t (for \(|t| \le K\)), so we also have \({\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}} = {\mathcal {O}}(\delta )\). Let \(\kappa \) be a constant such that \(\Vert {\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}}\Vert _{\infty } \le \kappa \delta \) (for sufficiently small \(\delta \) and \(|t| \le K\)). If
then we must have \(|t - d_{ii}| \le \kappa \delta \) for one of the diagonal entries \(d_{ii}\) of \({\mathbf {D}}\), for otherwise the matrix \({\mathbf {D}} - t {\mathbf {I}} + {\mathbf {U}}^{-1} {\mathbf {H}} {\mathbf {U}}\) will be strictly diagonally dominant and thus invertible. The diagonal entries of \({\mathbf {D}}\) are the eigenvalues \(\sigma _1,\sigma _2,\ldots ,\sigma _p\) of \({\mathbf {S}}\), so it follows that \(t = \sigma _i + {\mathcal {O}}(\delta )\).
We can deal with the small eigenvalues in the same way; it only remains to determine the entries of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) (thereby verifying that this matrix is also symmetric and thus diagonalizable). It is easy to verify that \({\mathbf {G}}{\mathbf {G}}'\) is a block diagonal matrix whose blocks are of dimension \((p-1)\times (p-1)\) with all entries equal to one; in fact \({\mathbf {G}}{\mathbf {G}}'= [{\mathbf {B}}_{1},{\mathbf {B}}_{2},\ldots ,{\mathbf {B}}_{p}]\), where \({\mathbf {B}}_{i}\) is \({\mathbf {g}}_{i}\) appended \(p-1\) times as columns. Consider a row in \({\mathbf {A}}\) corresponding to a message from i to j, say \({\mathbf {a}}_{ij}'\); we have already verified that this row contains \(-S_{ij}\) where \({\mathbf {g}}_{i}\) is one, except for the 1 corresponding to the message from j to i. Now, \({\mathbf {a}}_{ij}'{\mathbf {G}}{\mathbf {G}}'\) will contain nonzero elements in \({\mathbf {a}}_{ij}'{\mathbf {B}}_{i}\) and these will all equal \(-(p-2)S_{ij}\). A row, \({\mathbf {b}}_{ij}\), of \(\frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) corresponding to a message from i to j will therefore contain \(-S_{ij}\) where \({\mathbf {g}}_{i}\) equals one (even for the message from j to i). Hence, \({\mathbf {a}}_{ij}\) and \({\mathbf {b}}_{ij}\) are identical except for the element corresponding to the message from j to i, where \({\mathbf {a}}_{ij}\) is zero and \({\mathbf {b}}_{ij}\) is \(-S_{ij}\). Hence, a row of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) corresponding to a message from i to j has one element (at the message from j to i) equal to \(S_{ij}\) and the rest are zero. Furthermore, the row corresponding to the message from j to i will have \(S_{ji} = S_{ij}\) as its element in the position corresponding to the message from i to j. Hence, \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) is symmetric.
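To make the index bookkeeping above easy to check, the following numerical sketch (our own construction, following the row and column ordering described in the proof) verifies both \(\frac{1}{p-2}{\mathbf {G}}'{\mathbf {AG}} = {\mathbf {I}} - {\mathbf {S}}\) and the structure of \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\):

    import numpy as np

    def message_index(i, j, p):
        # 0-based position of the message i -> j; block j holds the messages
        # received by node j, ordered by sender, as in the proof
        senders = [t for t in range(p) if t != j]
        return (p - 1) * j + senders.index(i)

    rng = np.random.default_rng(1)
    p = 5
    l = p * (p - 1)
    S = np.corrcoef(rng.normal(size=(p + 5, p)), rowvar=False)   # unit diagonal

    G = np.zeros((l, p))
    for j in range(p):
        G[(p - 1) * j:(p - 1) * (j + 1), j] = 1.0                # g_j

    A = np.zeros((l, l))
    for i in range(p):
        for j in range(p):
            if i != j:
                r = message_index(i, j, p)
                for t in range(p):
                    if t != i and t != j:
                        A[r, message_index(t, i, p)] = -S[i, j]

    assert np.allclose(G.T @ A @ G / (p - 2), np.eye(p) - S)

    B = A - (A @ G @ G.T) / (p - 2)
    assert np.allclose(B, B.T)                                   # symmetric
    offdiag = np.abs(S[np.triu_indices(p, k=1)])
    assert np.allclose(np.sort(np.abs(np.linalg.eigvalsh(B))),
                       np.sort(np.repeat(offdiag, 2)))           # eigenvalues +/- S_ij

After a simultaneous permutation of rows and columns that places each pair of messages \(i \rightarrow j\) and \(j \rightarrow i\) next to each other, \({\mathbf {A}} - \frac{1}{p-2}{\mathbf {A}}{\mathbf {G}}{\mathbf {G}}'\) is a direct sum of \(2 \times 2\) blocks with zero diagonal and off-diagonal entry \(S_{ij}\), which explains the \(\pm S_{ij}\) eigenvalues appearing in the list below.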
In conclusion, the eigenvalues of \(\tilde{{\mathbf {L}}}\) are
- \(1 - \sigma _i\delta + {\mathcal {O}}(\delta ^2)\), where \(\sigma _1,\sigma _2,\ldots ,\sigma _p\) are the eigenvalues of \({\mathbf {S}}\), and
- \(\pm S_{ij} \delta + {\mathcal {O}}(\delta ^2)\), \(1 \le i < j \le p\).
In particular, the largest eigenvalue of \(\tilde{{\mathbf {L}}}\) is connected to the least eigenvalue \(\sigma _{\text {min}}\) of \({\mathbf {S}}\) through \(1 - \sigma _{\text {min}}\delta + {\mathcal {O}}(\delta ^{2})\). Since \({\mathbf {S}}\) is a positive definite matrix, we know that \(\sigma _{\text {min}} > 0\). It follows that the spectral radius of \(\tilde{{\mathbf {L}}}\) (and hence that of \({\mathbf {L}}\)) is strictly less than 1 for sufficiently small \(\delta \).
1.3 Proof of Theorem 3
Theorem 3
Under the assumption that sGaBP converges, with precision matrix \({\mathbf {S}}\) and potential vector \({\varvec{b}}\) as inputs, and setting \({\varvec{\mu }}\) equal to the converged posterior means, we have that \({\mathbf {S}}{\varvec{\mu }} = {\varvec{b}}\).
Proof
In Theorems 1 and 2, we proved convergence to the following stationary equations:
for all i and \(j \in {\mathcal {N}}_{i}\). Furthermore, \(q_{i} = 1 + \sum _{t \in {\mathcal {N}}_{i}}Q_{ti}\) and \(z_{i} = b_{i} + \sum _{t \in {\mathcal {N}}_{i}}V_{ti}\). Using (34):
for all i and \(j \in {\mathcal {N}}_{i}\). For any \(k \in {\mathcal {N}}_{i}\) we can write
Furthermore, since \(\frac{S_{ik}^{2}}{Q_{ik}} = Q_{ki} - (q_{i} + \lambda _{i})\), we have
Dividing (35) by \(q_{i} + \lambda \) gives
Further simplification can be done by noting that \(S_{ki}V_{ki} = Q_{ki}(\lambda u_{k} + z_{k} - V_{ik})\), and substituting into (36):
Summing \(S_{ki}\mu _{i}\) over i, substituting (37) for \(i \in {\mathcal {N}}_{k}\), gives
Since \(Q_{ik}(\lambda + q_{i} - Q_{ki}) = -S_{ik}^{2} = -S_{ki}^{2} = Q_{ki}(\lambda + q_{k} - Q_{ik})\),
Finally, since \(\mu _{k} = \frac{\lambda \mu _{k} + z_{k}}{\lambda + q_{k}}\), we have that \( \mu _{k}(\lambda + q_{k}) = \lambda \mu _{k} + z_{k}\) and
Substituting (41) into (40) completes the proof. \(\square \)
Appendix 2: Simulation information
1.1 Simulation scheme
We briefly describe the simulation scheme (Bach et al. 2011) used in our empirical work. This simulation scheme also relates GaBP to least squares estimation under the linear model. In order to apply sGaBP, we need to generate a positive definite symmetric precision matrix (\({\mathbf {S}}\)) and a potential vector (\({\varvec{b}}\)). One way to do this is to generate a data structure according to the linear model with n observations and p inputs. This yields a design matrix \({\mathbf {X}}: n \times p\) and a response vector \({\mathbf {y}}: n \times 1\). We then form the sample correlation matrix \({\mathbf {S}} = {\mathbf {X}}'{\mathbf {X}}\), where we assume the columns of \({\mathbf {X}}\) are standardized to have zero mean and unit \(L_{2}\) norm. We assume the same for \({\mathbf {y}}\) and form the sample correlation vector \({\varvec{b}} = {\mathbf {X}}'{\mathbf {y}}\). As long as \(n > p\), \({\mathbf {S}}\) will be positive definite and can be used as a valid precision matrix for the application of sGaBP. Explanatory variables are generated from \(N(\mathbf {0},\frac{1}{n}{\mathbf {I}}_{p})\), where n is the number of observations and p the number of explanatory variables, and stored in \({\mathbf {X}}\). Coefficients are generated as \(\beta _{i} \sim \text {iid } N(0,1)\), and sparsity is introduced by randomly selecting half of the \(\beta _{i}\)'s and setting these equal to zero. Observations of the response are generated as \({\mathbf {y}} = {\mathbf {X}}\varvec{\beta } + \varvec{\epsilon }\), where \(\epsilon _{i} \sim \text {iid } N(0,\sigma ^{2})\) and \(\sigma ^{2} = 0.01 \times \frac{||{\mathbf {X}}\varvec{\beta }||^{2}}{n}\). All variables were standardized to have zero mean and unit \(L_{2}\) norm, and we form \({\mathbf {S}} = {\mathbf {X}}'{\mathbf {X}}\) and \({\varvec{b}} = {\mathbf {X}}'{\mathbf {y}}\). We used \(n = p\) throughout the empirical section. The matrix \({\mathbf {S}}\) was ensured to be positive definite by regulating its zero-diagonal spectral radius, using the method discussed in the next subsection. We then apply sGaBP to a multivariate Gaussian with precision matrix \({\mathbf {S}}\) and potential vector \({\varvec{b}}\).
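A minimal sketch of this scheme (our own code; the function and variable names are not from the paper, and we leave n and p free rather than fixing \(n = p\)):

    import numpy as np

    def simulate_S_b(n, p, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))      # rows ~ N(0, (1/n) I_p)
        beta = rng.normal(size=p)                                # beta_i ~ iid N(0, 1)
        beta[rng.choice(p, size=p // 2, replace=False)] = 0.0    # half of the betas set to zero
        signal = X @ beta
        sigma2 = 0.01 * np.sum(signal ** 2) / n
        y = signal + rng.normal(scale=np.sqrt(sigma2), size=n)
        Xc = X - X.mean(axis=0)                                  # standardize columns of X and y
        X = Xc / np.linalg.norm(Xc, axis=0)
        yc = y - y.mean()
        y = yc / np.linalg.norm(yc)
        return X.T @ X, X.T @ y                                  # precision matrix S and potential b

    S, b = simulate_S_b(n=200, p=100)

With \(n = p\), as used in the empirical section, positive definiteness of \({\mathbf {S}}\) is then enforced by the zero-diagonal spectral radius adjustment of the next subsection.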
1.2 Regulating the zero-diagonal spectral radius
Suppose we have a precision matrix \({\mathbf {S}}: p \times p\) normalized to have ones along its diagonal. Set \({\mathbf {R}} = {\mathbf {I}}_{p} - {\mathbf {S}}\) and let \(\tau _{i}: i = 1,2,\ldots ,p\) be the eigenvalues of \({\mathbf {R}}\). Suppose we wish to find a new precision matrix, \({\mathbf {S}}^{*}\), with zero-diagonal spectral radius set to a specified value (say \(\alpha \)). First, we compute the eigendecomposition of \({\mathbf {R}}\), \({\mathbf {R}} = {\mathbf {V}}{\mathbf {D}}{\mathbf {V}}'\),
where \({\mathbf {V}}'{\mathbf {V}} = {\mathbf {V}}{\mathbf {V}}' = {\mathbf {I}}\) and \({\mathbf {D}} = \text {diag}(\tau _{1},\ldots ,\tau _{p})\). We form a new diagonal matrix, \({\mathbf {D}}^{*} = \frac{\alpha }{\tilde{\rho }({\mathbf {S}})}{\mathbf {D}}\), where \(\tilde{\rho }({\mathbf {S}}) = \max _{i}|\tau _{i}|\) is the zero-diagonal spectral radius of \({\mathbf {S}}\), and set \({\mathbf {R}}^{*} = {\mathbf {V}}{\mathbf {D}}^{*}{\mathbf {V}}'\), \({\mathbf {S}}^{*} = {\mathbf {I}} - {\mathbf {R}}^{*}\). We now show that \({\mathbf {S}}^{*}\) is a valid precision matrix with diagonal entries equal to one if \(\alpha < 1\). Since \({\mathbf {S}}\) is a normalized precision matrix, the diagonal of \({\mathbf {R}}\) will contain only zeros; the same is true for \({\mathbf {R}}^{*}\) (being a scalar multiple of \({\mathbf {R}}\)), and therefore the diagonal of \({\mathbf {S}}^{*}\) will contain only ones. Suppose \(\lambda _{i}, i = 1,2,\ldots ,p\) and \(\lambda ^{*}_{i}, i = 1,2,\ldots ,p\) represent the eigenvalues of \({\mathbf {S}}\) and \({\mathbf {S}}^{*}\), respectively. The following holds: \(\lambda _{i}^{*} = 1 - \frac{\alpha (1-\lambda _{i})}{\text {max}_{j}\{ |1-\lambda _{j}| \}}\).
Since \(\frac{|1-\lambda _{i}|}{\text {max}_{j}\{ |1-\lambda _{j}| \}} \le 1\), we have that \(1-\alpha \le \lambda _{i}^{*} \le 1 + \alpha \). If \(0 \le \alpha < 1\), then \({\mathbf {S}}^{*}\) will be positive definite. In our simulations, when \(\alpha > 1\), we used a check to ensure that \({\mathbf {S}}^{*}\) is positive definite.
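A sketch of this adjustment (our own code and naming):

    import numpy as np

    def regulate_zero_diag_radius(S, alpha):
        p = S.shape[0]
        R = np.eye(p) - S                            # zero-diagonal part of S
        tau, V = np.linalg.eigh(R)                   # R = V diag(tau) V'
        rho = np.abs(tau).max()                      # zero-diagonal spectral radius of S
        R_star = V @ np.diag((alpha / rho) * tau) @ V.T
        return np.eye(p) - R_star                    # unit diagonal; positive definite if alpha < 1

For example, regulate_zero_diag_radius(S, 0.9) rescales the off-diagonal structure of \({\mathbf {S}}\) so that its zero-diagonal spectral radius becomes 0.9.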