Appendix
Proof of Theorem 1:
Computing the denominator of WBIC for GP
In the following, we have simplified notation by setting \(\sigma ^2_\mathrm{n}=\sigma ^2\) as variable \(\sigma ^2_\mathrm{s}\) is inside matrix \({{\mathrm{\mathbf {K}}}}\) and therefore it does not emerge in the proof. Let us start to compute \(\beta \)-weighted marginal likelihood:
$$\begin{aligned}&\int \limits _{-\infty }^{\infty }p({{\mathrm{\mathbf {y}}}}|{{\mathrm{\mathbf {f}}}},{{\mathrm{D}}})^{\beta }p({{\mathrm{\mathbf {f}}}}|D){{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\&\quad = \int \limits _{-\infty }^{\infty }{{\mathrm{\mathcal {N}}}}({{\mathrm{\mathbf {y}}}};{{\mathrm{\mathbf {f}}}},\sigma ^2{{\mathrm{\mathbf {I}}}})^{\beta }{{\mathrm{\mathcal {N}}}}({{\mathrm{\mathbf {f}}}};0,{{\mathrm{\mathbf {K}}}}) {{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\&\quad = \int \limits _{-\infty }^{\infty }\frac{1}{\sqrt{(2\pi )^{\beta n}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^\beta }}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})}\\&\qquad \times \frac{1}{\sqrt{(2\pi )^n|{{\mathrm{\mathbf {K}}}}|}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {f}}}}^\intercal {{\mathrm{\mathbf {K}}}}^{-1}{{\mathrm{\mathbf {f}}}}}{{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\&\quad = (2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}\\&\qquad \times \int \limits _{-\infty }^{\infty }{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})-\frac{1}{2}{{\mathrm{\mathbf {f}}}}^\intercal {{\mathrm{\mathbf {K}}}}^{-1}{{\mathrm{\mathbf {f}}}}}{{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}. \end{aligned}$$
Then complete the sum of squared forms to square (Petersen and Pedersen 2008, 8.1.7 Sum of two squared forms)
$$\begin{aligned}&-\frac{1}{2}({{\mathrm{\mathbf {f}}}}-{{\mathrm{\mathbf {y}}}})^\intercal \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}({{\mathrm{\mathbf {f}}}}-{{\mathrm{\mathbf {y}}}})-\frac{1}{2}({{\mathrm{\mathbf {f}}}}-0)^\intercal {{\mathrm{\mathbf {K}}}}^{-1}({{\mathrm{\mathbf {f}}}}-0)\\&\quad =-\frac{1}{2}\left( {{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}(\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}{{\mathrm{\mathbf {y}}}}\right) ^\intercal \\&\qquad \times \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) \\&\qquad \times \left( {{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\right) \\&\qquad + \frac{1}{2}\left( {{\mathrm{\mathbf {y}}}}^\intercal \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}\right) \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\\&\qquad \times \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\right) -\frac{1}{2}\left( {{\mathrm{\mathbf {y}}}}^\intercal \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\right) \end{aligned}$$
By using the matrix inversion lemma (Rasmussen and Williams 2006, A.3 Matrix Identities) to the second term we get
$$\begin{aligned}&-\frac{1}{2}\left( {{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\right) ^\intercal \\&\quad \times \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) \biggl ({{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\\&\quad \times \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\biggr )-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\end{aligned}$$
Now we continue with the integral after substituting the previous results and using after that an integration rule of a Gaussian integral (Petersen and Pedersen 2008, 8.1.1 Density and normalization)
$$\begin{aligned}&(2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\qquad \times \int \limits _{-\infty }^{\infty }\exp \Biggl (-\frac{1}{2}\biggl ({{\mathrm{\mathbf {f}}}}-\biggl (\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\biggr )^{-1}\\&\qquad \times \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\biggr )^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) \\&\qquad \times \biggl ({{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\biggr )\Biggr ){{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\&\quad = (2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\qquad \times (2\pi )^\frac{n}{2}|(\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}|^{-\frac{1}{2}}\\&\quad = (2\pi )^{-\frac{\beta n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\qquad \times \left( |\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}|~|\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}|~|{{\mathrm{\mathbf {K}}}}^{-1}|\right) ^{-\frac{1}{2}}\\&\quad = (2\pi )^{-\frac{\beta n}{2}}|(\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}|^{\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}^{-1}|^{\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\qquad \times \left( |\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}|~|\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}|~|{{\mathrm{\mathbf {K}}}}^{-1}|\right) ^{-\frac{1}{2}}\\&\quad = \frac{1}{\sqrt{(2\pi )^{\beta n}|{{\mathrm{\mathbf {K}}}}+(\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})|}}\sqrt{\frac{|(\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}|^\beta }{|(\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}|}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( {{\mathrm{\mathbf {K}}}}+(\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\quad =\mathrm{DENOM} \end{aligned}$$
Notice that by setting \(\beta = 1\) and taking the logarithm, we end up to the well-known log marginal likelihood formula.
Computing the numerator of WBIC for GP
$$\begin{aligned}&\int \limits _{-\infty }^{\infty }\log p({{\mathrm{\mathbf {y}}}}|{{\mathrm{\mathbf {f}}}},{{\mathrm{D}}}) p({{\mathrm{\mathbf {y}}}}|{{\mathrm{\mathbf {f}}}},{{\mathrm{D}}})^{\beta }p({{\mathrm{\mathbf {f}}}}|D){{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\ =&\int \limits _{-\infty }^{\infty }\log {{\mathrm{\mathcal {N}}}}({{\mathrm{\mathbf {y}}}};{{\mathrm{\mathbf {f}}}},\sigma ^2{{\mathrm{\mathbf {I}}}}){{\mathrm{\mathcal {N}}}}({{\mathrm{\mathbf {y}}}};{{\mathrm{\mathbf {f}}}},\sigma ^2{{\mathrm{\mathbf {I}}}})^{\beta }{{\mathrm{\mathcal {N}}}}({{\mathrm{\mathbf {f}}}};0,{{\mathrm{\mathbf {K}}}}) {{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\ =&\int \limits _{-\infty }^{\infty }\biggl (-\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\\&\quad -\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})\biggr )\\&\quad \times \frac{1}{\sqrt{(2\pi )^{\beta n}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^\beta }}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})}\\&\quad \times \frac{1}{\sqrt{(2\pi )^n|{{\mathrm{\mathbf {K}}}}|}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {f}}}}^\intercal {{\mathrm{\mathbf {K}}}}^{-1}{{\mathrm{\mathbf {f}}}}}{{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\ =&\left( -\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\right) (2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}\\&\quad \times |{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}\int \limits _{-\infty }^{\infty }{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})-\frac{1}{2}{{\mathrm{\mathbf {f}}}}^\intercal {{\mathrm{\mathbf {K}}}}^{-1}{{\mathrm{\mathbf {f}}}}}{{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\&\quad -\frac{1}{2}\int \limits _{-\infty }^{\infty }({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})(2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}\\&\quad \times |{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})-\frac{1}{2}{{\mathrm{\mathbf {f}}}}^\intercal {{\mathrm{\mathbf {K}}}}^{-1}{{\mathrm{\mathbf {f}}}}}{{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\\ =&\left( -\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\right) \times \mathrm{DENOM}\\&\quad -\frac{1}{2}(2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\quad \times \int \limits _{-\infty }^{\infty }({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}({{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {f}}}})\\&\quad \times \exp \Biggl (-\frac{1}{2}\biggl ({{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\\&\quad \times \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\biggr )^\intercal \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) \\&\quad \times \biggl ({{\mathrm{\mathbf {f}}}}-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\biggr )\Biggr ){{\mathrm{d}}}{{\mathrm{\mathbf {f}}}}\end{aligned}$$
By using Matrix Cookbook formula 357 (Petersen and Pedersen 2008, 8.2.2 Mean and variance of square forms) and by compensating missing constants we will have
$$\begin{aligned}&\left( -\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\right) \times \mathrm{DENOM}\\&\quad -\frac{1}{2}(2\pi )^{-\frac{(\beta +1) n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}{{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\quad \times (2\pi )^\frac{n}{2}|\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}|^{-\frac{1}{2}}\\&\quad \times \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) ^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\\&\quad \times \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) \\&\quad +\mathrm{tr}\left( (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\right) \\ =&\left( -\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\right) \times \mathrm{DENOM}\\&\quad -\frac{1}{2}(2\pi )^{-\frac{\beta n}{2}}|\sigma ^2{{\mathrm{\mathbf {I}}}}|^{-\frac{\beta }{2}}|\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}|^{-\frac{1}{2}}|\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}|^{-\frac{1}{2}}\\&\quad \times {{\mathrm{\mathbf {e}}}}^{-\frac{1}{2}{{\mathrm{\mathbf {y}}}}^\intercal \left( (\frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}})+{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}}\\&\quad \times \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) ^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\\&\quad \times \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) \\&\quad +\mathrm{tr}\left( (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\right) \\ =&~\mathrm{DENOM}\times \Biggl (-\frac{n}{2}\log (2\pi )-\frac{1}{2}\log |\sigma ^2{{\mathrm{\mathbf {I}}}}|\\&\quad -\frac{1}{2}\Biggl [\mathrm{tr}\left( (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\right) \\&\quad + \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) ^\intercal (\sigma ^2{{\mathrm{\mathbf {I}}}})^{-1}\\&\quad \times \left( \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}+{{\mathrm{\mathbf {K}}}}^{-1}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}-{{\mathrm{\mathbf {y}}}}\right) \Biggr ]\Biggr )\\ =&~\mathrm{DENOM}\times (-\mathbf{WBIC}) \end{aligned}$$
In the last line, we observe that as the numerator is divided by the denominator and the numerator has the denominator as a multiplier, then the rest of the numerator must be (negative) WBIC itself. After a manipulation (using the matrix inversion lemma to the all three appropriate terms) we achieve a numerically stable version
$$\begin{aligned}&-\mathbf{WBIC}=-\frac{n}{2}\log \left( 2\pi \sigma ^2\right) \\&\quad -\frac{1}{2\beta }\Biggl [\mathrm{tr}\left( 1-\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}\right) ^{-1}\left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) \right) \\&\quad +\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\right) ^\intercal \left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) \!+\!{{\mathrm{\mathbf {K}}}}\right) ^{-1}{{\mathrm{\mathbf {y}}}}\Biggr ], \end{aligned}$$
which can be further manipulated by taking stuff outside of the trace and interpreting the latter term as a matrix norm. As a result we obtain a neat form
$$\begin{aligned} \mathbf{WBIC}=&\frac{n}{2}\left( \log (2\pi \sigma ^2)+\frac{1}{\beta }\right) \\&\quad +\frac{\sigma ^2}{2\beta ^2}\biggl (||\mathbf {A{{\mathrm{\mathbf {y}}}}}||^2-\mathrm{tr}(\mathbf {A})\biggr )\text {, where}\\&\quad \mathbf {A}=\left( \left( \frac{\sigma ^2}{\beta }{{\mathrm{\mathbf {I}}}}\right) +{{\mathrm{\mathbf {K}}}}\right) ^{-1} \end{aligned}$$
\(\square \)
The exact solution to the pruned GP case
The following exact solution is computed with Mathematica by equating the results of Lemmas 1and 2
Lemma 3
The optimal temperature for the pruned GP regression case with all the terms included is
$$\begin{aligned} \hat{\beta }^\mathrm{opt}=&-\Biggr (-2{{\mathrm{\mathbf {y}}}}^\intercal {{\mathrm{\mathbf {y}}}}\sigma ^2_\mathrm{n}+\sigma ^2_\mathrm{s}nW+2n\sigma ^2_\mathrm{n}W\log \frac{\sigma ^2_\mathrm{n}}{W}+\sqrt{W}\\&\quad \times \sqrt{4({{\mathrm{\mathbf {y}}}}^\intercal {{\mathrm{\mathbf {y}}}})^2\sigma ^2_\mathrm{n}\!+\!\sigma ^4_\mathrm{s}n^2W\!+\!4{{\mathrm{\mathbf {y}}}}^\intercal {{\mathrm{\mathbf {y}}}}n\sigma ^2_\mathrm{n}W\log \frac{W}{\sigma ^2_\mathrm{n}}}\Biggr )\\&\quad \bigg /\left( 2\sigma ^2_\mathrm{s}\left( -{{\mathrm{\mathbf {y}}}}^\intercal {{\mathrm{\mathbf {y}}}}+nW\log \frac{\sigma ^2_\mathrm{n}}{W}\right) \right) , \end{aligned}$$
where \(W=(\sigma ^2_\mathrm{s}+\sigma ^2_\mathrm{n})\).