Abstract
Shape modelling describes methods aimed at capturing the natural variability of shapes and commonly relies on probabilistic interpretations of dimensionality reduction techniques such as principal component analysis. Due to their computational complexity when dealing with dense deformation models such as diffeomorphisms, previous attempts have focused on explicitly reducing their dimension, diminishing de facto their flexibility and ability to model complex shapes such as brains. In this paper, we present a generative model of shape that allows the covariance structure of deformations to be captured without squashing their domain, resulting in better normalisation. An efficient inference scheme based on Gauss-Newton optimisation is used, which enables processing of 3D neuroimaging data. We trained this algorithm on segmented brains from the OASIS database, generating physiologically meaningful deformation trajectories. To prove the model’s robustness, we applied it to unseen data, which resulted in equivalent fitting scores.
1 Introduction
In neuroimaging studies, or more generally in shape analysis, normalising a set of subjects consists in deforming them towards a common space that allows one-to-one correspondence. Finding this common space usually reduces to finding an optimal shape in terms of distance to all subjects in the space of deformations. However, the covariance structure of these deformations is not known a priori and the deformation metric generally involves penalising roughness. Yet, in a Bayesian setting, a prior that is informative of population variance would make the registration process, which relies on a posteriori estimates, more robust. Shape models aim to learn this covariance structure from the data. As deformations are parameterised in a very high-dimensional space, learning their covariance is computationally intractable. Dimensionality reduction techniques are therefore commonly used, even though some have tackled this problem by parameterising deformations using location-adaptive control points [8].
For a long time, due to their computational complexity, shape modelling approaches had only been applied to simple models of deformations [7] or simple data sets [6, 9]. Recently, Zhang and Fletcher applied a probabilistic shape model, named principal geodesic analysis, to densely sampled diffeomorphisms and 3D MR images of the brain [14]. Diffeomorphisms correspond to a particular family of deformations that are ensured to be invertible, allowing for very large deformations. Geodesic shooting of diffeomorphisms involves specifying a Riemannian metric on their tangent space and allows a diffeomorphism to be entirely parameterised by its initial velocity [3, 11]. However, because Zhang and Fletcher's optimisation scheme relies either on gradient descent or on Monte Carlo sampling of the posterior, they have focused on effectively reducing the dimensionality of velocity fields. This results in an effective approach for studying the principal modes of variation, but may give less accurate alignment than classical approaches. In particular, they do not explicitly model "anatomical noise", i.e., deformations that are not captured by the principal modes.
Here, we propose a generative shape model, whose posterior is inferred using variational inference and Laplace approximations. A residual velocity field capturing anatomical noise is explicitly defined and its magnitude is inferred from the data. An efficient Gauss-Newton optimisation is used to obtain the maximum a posteriori latent subspace as well as individual coordinates, minimising the chances of falling into local minima, and making the registration more robust.
2 Methods
2.1 Generative Shape Model
First, let us define a generative model of brain shape. Here, the observed variables are supposed to be categorical images (i.e., segmentations) comprising K classes—e.g. grey matter, white matter, background—stemming from a categorical distribution. This kind of data term has proved very effective for driving registration [2] and is compatible with unified models of registration and segmentation. If needed, it is straightforward to replace this term with a stationary Gaussian noise model. The template \(\varvec{\mu }\) encodes prior probabilities of finding each of the K classes in a given location, and is deformed towards the n-th subject according to the spatial transform \(\varvec{\phi }_n\). In practice, its log-representation \(\varvec{a}\) is encoded by trilinear basis functions, and the deformed template is recovered by softmax interpolation [2]:

\(\varvec{\mu }_n = \mathrm {softmax}\left( \varvec{a}\circ \varvec{\phi }_n\right) , \qquad \mathrm {softmax}_k\left( \varvec{a}_i\right) = \frac{\exp \left( a_{ik}\right) }{\sum _{l=1}^{K}\exp \left( a_{il}\right) }.\)
Note that the discrete operation \(\varvec{a}\circ \varvec{\phi }_n\) can be equivalently written as the matrix multiplication, \(\varvec{\varPhi }_n\varvec{a}\), where \(\varvec{\varPhi }_n\) is a large and sparse matrix that depends on \(\varvec{\phi }_n\) and performs the combined “sample and interpolate” operation. We will name this operation pulling, while the multiplication by its transpose, \(\varvec{\varPhi }_n^{T}\), will be named pushing.
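To make the pulling and pushing operations concrete, the sketch below builds such a sparse matrix \(\varvec{\varPhi }\) with trilinear weights. It is a minimal illustration in Python/SciPy; the function name, the flattening convention and the boundary handling (out-of-bounds voxels are simply dropped) are our own assumptions, not the authors' implementation:

```python
import numpy as np
from scipy import sparse

def pull_matrix(phi, shape):
    """Sparse 'sample and interpolate' matrix Phi for a deformation phi
    given as (I, 3) voxel coordinates into a lattice of size `shape`.
    Pulling a flattened scalar image a is then `Phi @ a`; pushing an
    image b back onto the template lattice is `Phi.T @ b`."""
    nx, ny, nz = shape
    n_out = phi.shape[0]
    corner = np.floor(phi).astype(int)     # lower corner of enclosing cell
    frac = phi - corner                    # trilinear fractions in [0, 1)
    rows, cols, vals = [], [], []
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                cx = corner[:, 0] + dx
                cy = corner[:, 1] + dy
                cz = corner[:, 2] + dz
                # weight of this corner: w or (1 - w) along each axis
                wt = ((frac[:, 0] if dx else 1 - frac[:, 0])
                      * (frac[:, 1] if dy else 1 - frac[:, 1])
                      * (frac[:, 2] if dz else 1 - frac[:, 2]))
                ok = ((cx >= 0) & (cx < nx) & (cy >= 0) & (cy < ny)
                      & (cz >= 0) & (cz < nz))   # drop out-of-bounds voxels
                rows.append(np.nonzero(ok)[0])
                cols.append(((cx * ny + cy) * nz + cz)[ok])
                vals.append(wt[ok])
    rows, cols, vals = map(np.concatenate, (rows, cols, vals))
    return sparse.csr_matrix((vals, (rows, cols)),
                             shape=(n_out, nx * ny * nz))
```

Pulling the K log-template classes is then one sparse product per class, and the transpose of the same matrix gives the adjoint pushing operation used when accumulating gradients in template space.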
Let \(\left\{ \varvec{f}_n \in \mathbb {R}^{I \times K} ;~ 1 \leqslant n \leqslant N\right\} \) be the set of observed imagesFootnote 1. For each subject, let \(\varvec{\phi }_n \in \mathbb {R}^{I \times 3}\) be the diffeomorphic transformation, and let \(\varvec{\mu }_n = \mathrm {softmax}\left( \varvec{a}\circ \varvec{\phi }_n\right) \) be the deformed template. The likelihood of observed voxel values at locations \(\left\{ \varvec{x}_i ~;~ 1 \leqslant i \leqslant I\right\} \) is:

\(p\left( \varvec{f}_n ~\middle |~ \varvec{\mu }_n\right) = \prod _{i=1}^{I}\prod _{k=1}^{K} \mu _{n,ik}^{f_{n,ik}}.\)
In this work, diffeomorphisms are defined as geodesics, according to a Riemannian metricFootnote 2 defined by a positive definite operator named L, and are thus entirely parameterised by their initial velocity [11]. A complete transformation \(\varvec{\phi }\) is recovered by integrating the velocity in time, knowing that the momentum \(\varvec{u}_t = L \varvec{v}_t\) is conserved at any t:

\(\frac{\partial \varvec{\phi }_t}{\partial t} = \varvec{v}_t \circ \varvec{\phi }_t, \qquad \varvec{u}_t = \left| \varvec{D}\varvec{\phi }_t^{-1}\right| \left( \varvec{D}\varvec{\phi }_t^{-1}\right) ^{T}\left( \varvec{u}_0 \circ \varvec{\phi }_t^{-1}\right) .\)
The velocity field can be retrieved from the momentum field by performing the inverse operation \(\varvec{v}_t = K \varvec{u}_t\), where K is L's Green's function. Because we want to model inter-individual variability, we need all initial velocities to be defined in the same (template) space, which is achieved by using the initial velocity of their inverseFootnote 3, which we denote \(\left\{ \varvec{v}_n \in \mathbb {R}^{I \times 3} ~;~ 1 \leqslant n \leqslant N\right\} \). Following [13], we use the probabilistic principal component analysis (PPCA) framework to regularise initial velocity fields, which leads to writing them as a linear combination of principal modes plus a residual field. Let us assume that we want to model them with \(M \ll 3I\) principal components. Then, let \(\varvec{W}\) be a \(3I \times M\) matrix (called the principal subspace), each column being one principal component, let \(\varvec{z}_n\) be the latent representation of a given velocity field in the principal subspace and let \(\varvec{r}_n\) be the corresponding residual field. This yields \(\varvec{v}_n = \varvec{W}\varvec{z}_n + \varvec{r}_n\). In [13], latent coordinates \(\varvec{z}_n\) stem from a standard Gaussian and \(\varvec{r}_n\) is i.i.d. Gaussian noise, and a maximum-likelihood estimate of the principal subspace is retrieved. A Bayesian version can be designed by placing a Gaussian prior on each principal component [5]. Smooth velocities can be enforced with a smooth prior over each principal component and over the residual field, and a Gaussian prior over the latent coordinates, yielding:

\(p\left( \varvec{w}_m\right) = \mathcal {N}\left( \varvec{w}_m ~\middle |~ \varvec{0}, \varvec{L}^{-1}\right) , \quad p\left( \varvec{r}_n ~\middle |~ \lambda \right) = \mathcal {N}\left( \varvec{r}_n ~\middle |~ \varvec{0}, \left( \lambda \varvec{L}\right) ^{-1}\right) , \quad p\left( \varvec{z}_n ~\middle |~ \varvec{A}\right) = \mathcal {N}\left( \varvec{z}_n ~\middle |~ \varvec{0}, \varvec{A}^{-1}\right) , \qquad (4\text {--}6)\)
where \(\varvec{L}\) is the discretisation of L and \(\lambda \) is the anatomical noise precision. However, this approach is often not regularised enough. Zhang and Fletcher [14] proposed a different prior, which can be seen as being set over the reconstructed velocities. In practice, it takes the form of a joint distribution over all latent coordinates, residual fields and the principal subspace:

\(p\left( \varvec{z}_{1 \dots N}, \varvec{r}_{1 \dots N}, \varvec{W}\right) \propto \prod _{n=1}^{N} \exp \left( -\tfrac{1}{2}\left( \varvec{W}\varvec{z}_n + \varvec{r}_n\right) ^{T}\varvec{L}\left( \varvec{W}\varvec{z}_n + \varvec{r}_n\right) \right) . \qquad (7)\)
The advantage of the first formulation (4–6) is that it explicates the covariance matrix of the latent coordinates and the noise precision, which can then be inferred from the data. Additionally, it could be extended to multimodal latent distributions such as Gaussian mixtures. However, the second formulation (7) is better at effectively regularising the principal subspace and, in general, the reconstructed velocities. In practice, we use a weighted combination of these two formulations, and call the weights \(\gamma _1\) and \(\gamma _2\).
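As an illustration of how the two formulations combine, the following sketch evaluates the resulting negative log-prior for one subject. The callable `apply_L`, the argument names and the per-subject scope (the shared \(\varvec{w}_m^{T}\varvec{L}\varvec{w}_m\) terms are omitted) are assumptions for exposition, not the paper's code:

```python
import numpy as np

def prior_energy(W, z, r, A, lam, apply_L, gamma1=1.0, gamma2=1.0):
    """Negative log-prior (up to a constant) for one subject's velocity
    v = W z + r: the PPCA-style latent and residual terms from (4-6),
    weighted by gamma1, plus the Zhang & Fletcher prior (7) on the
    reconstructed velocity, weighted by gamma2.  `apply_L` applies the
    discretised metric L to a flattened velocity field."""
    v = W @ z + r
    ppca = z @ (A @ z) + lam * (r @ apply_L(r))   # latent + residual terms
    zf = v @ apply_L(v)                           # prior on reconstructed v
    return 0.5 * (gamma1 * ppca + gamma2 * zf)
```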
The noise precision, \(\lambda \), can be inferred in a Bayesian fashion by introducing a conjugate Gamma priorFootnote 4 with \(\alpha = \frac{\nu _0 \times 3I}{2}\) and \(\beta = \frac{\nu _0 \times 3I}{2\lambda _0}\), as shown in [12]. Similarly, the latent covariance matrix is given a conjugate Wishart prior, which is made as non-informative as possible by setting its degrees of freedom to M, and whose expected value is the identity matrix, i.e., \(p\left( \varvec{A}\right) = \mathcal {W}\left( \varvec{A} ~\middle |~ M^{-1}\varvec{I}, M\right) \), so that \(\mathbb {E}\left[ \varvec{A}\right] = \varvec{I}\). This prior has the opposite effect of an automatic relevance determination prior, since it prevents principal components from collapsing during the first iterations by promoting non-null variances.
Finally, we look for a maximum a posteriori estimate of the template, \(\varvec{a}\), with a very uninformative log-Dirichlet prior that prevents null probabilities (a smooth prior could also be used following [2]). The complete model is depicted in the form of a graphical model in Fig. 1.
2.2 Inference
A basic inference scheme would be to search for a mode of the model likelihood, by optimising in turn each parameter of the model. It is however more consistent to tackle this problem as one of missing data, which is dealt with by computing the posterior distribution over all latent variables. Unfortunately, the posterior does not possess a tractable form. A solution is to use variational inference to describe an approximate posterior qFootnote 5 that can be more easily computed, by restricting the search space to distributions that factorise over a subset of variables [4]. This method allows the uncertainty about parameter estimates to be accounted for when inferring other parameters. Here, for computational reasons, we do not perform a fully Bayesian treatment of the problem and look for mode estimates of the principal subspace and template. We still marginalise over all subject-specific parameters (latent coordinates and residual field), as recommended by [1]. The set of marginalised latent variables is \(\varvec{\varUpsilon } = \left\{ \varvec{z}_{1 \dots N}, \varvec{r}_{1 \dots N}, \varvec{A}, \lambda \right\} \), and we make the (mean field) approximation that the posterior factorises over \(\varvec{z}_{1 \dots N}\), \(\varvec{r}_{1 \dots N}\), \(\varvec{A}\) and \(\lambda \).
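Written out, this mean-field assumption gives the factorised approximate posterior:

\(q\left( \varvec{\varUpsilon }\right) = q\left( \lambda \right) \, q\left( \varvec{A}\right) \prod _{n=1}^{N} q\left( \varvec{z}_n\right) \, q\left( \varvec{r}_n\right) .\)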
Since we used conjugate priors for the latent precision matrix and the anatomical noise precision, their posteriors have the same form as their priors, and their update equations are equivalent to computing a weighted average between the prior expected value and the maximum-likelihood estimator. In contrast, the posterior distributions of \(\varvec{z}_n\) and \(\varvec{r}_n\) have no simple form. We thus make a Laplace approximation and estimate their means (\(\varvec{z}_n^\star \), \(\varvec{r}_n^\star \)) and covariances (\(\varvec{S}_{\mathrm {z},n}^\star \), \(\varvec{S}_{\mathrm {r},n}^\star \)) from their mode and second derivatives about this mode. They are obtained by Gauss-Newton optimisation and the corresponding derivations are provided in Sect. 2.3. Because of the non-linearity induced by geodesic shooting and template interpolation, we first make the approximation that:

\(\mathbb {E}_q\left[ \ln p\left( \varvec{f}_n ~\middle |~ \varvec{W}, \varvec{z}_n, \varvec{r}_n, \varvec{a}\right) \right] \approx \ln p\left( \varvec{f}_n ~\middle |~ \varvec{\mu }_n^\star \right) ,\)

where \(\mathbb {E}_q\) means the posterior expected value and \(\varvec{\mu }_n^\star \) is the template deformed according to the above parameter estimates. Consequently, the updates of \(q\left( \lambda \right) \) and \(q\left( \varvec{A}\right) \) take the usual conjugate Gamma and Wishart forms, driven by the sufficient statistics \(\mathbb {E}_q\left[ \varvec{r}_n^{T}\varvec{L}\varvec{r}_n\right] = \varvec{r}_n^{\star T}\varvec{L}\varvec{r}_n^{\star } + \mathrm {tr}\left( \varvec{S}_{\mathrm {r},n}^{\star }\varvec{L}\right) \) and \(\mathbb {E}_q\left[ \varvec{z}_n\varvec{z}_n^{T}\right] = \varvec{z}_n^{\star }\varvec{z}_n^{\star T} + \varvec{S}_{\mathrm {z},n}^{\star }\).
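A sketch of these closed-form updates, assuming the standard conjugate Gamma/Wishart algebra (the function and argument names are ours, not the authors' code):

```python
import numpy as np

def update_posteriors(nu0, lam0, M, N, I3, Err, Ezz):
    """Conjugate updates for q(lambda) and q(A), returning posterior
    expectations.  Err = sum_n E[r_n^T L r_n] (a scalar), Ezz =
    sum_n E[z_n z_n^T] (an M x M matrix), I3 = 3I is the number of
    velocity parameters per subject."""
    # Gamma update: prior (alpha_0, beta_0) plus half the residual energy.
    alpha0 = nu0 * I3 / 2.0
    beta0 = nu0 * I3 / (2.0 * lam0)
    E_lam = (alpha0 + N * I3 / 2.0) / (beta0 + 0.5 * Err)
    # Wishart update: degrees of freedom M + N, inverse scale M*I + Ezz.
    E_A = (M + N) * np.linalg.inv(M * np.eye(M) + Ezz)
    return E_lam, E_A
```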
2.3 Gauss-Newton Optimisation
Gauss-Newton (GN) optimisation of an objective function \(\mathcal {E}\) with respect to a vector of parameters \(\varvec{x}\) consists of iteratively improving the objective function by making, locally, a second-order approximation. The gradient, \(\varvec{g}\), and Hessian matrix, \(\varvec{H}\), are computed about the current best estimate of the optimal parameters, \(\varvec{x}_i\), and the new optimum is found according to \(\varvec{x}_{i+1} = \varvec{x}_i - \varvec{H}^{-1} \varvec{g}\). In practice, this update scheme sometimes overshoots; it is therefore common to perform a backtracking line search along the direction \(-\varvec{H}^{-1} \varvec{g}\).
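The update and line search are generic; below is a minimal sketch assuming callables for the objective and its derivatives (names are ours, not the paper's solver):

```python
import numpy as np

def gauss_newton_step(x, gradient, hessian, objective, max_halvings=12):
    """One Gauss-Newton update with backtracking line search along
    -H^{-1} g: accept the first step length that improves the
    objective, halving it otherwise."""
    g = gradient(x)
    H = hessian(x)
    direction = np.linalg.solve(H, g)      # H^{-1} g
    f0, t = objective(x), 1.0
    for _ in range(max_halvings):
        x_new = x - t * direction
        if objective(x_new) < f0:          # any improvement is accepted
            return x_new
        t *= 0.5                           # overshoot: halve the step
    return x                               # no improvement found
```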
Differentiating the Data Term: Let \(\mathcal {E}\) denote the (negative) data term for an arbitrary subject:

\(\mathcal {E} = -\sum _{i=1}^{I}\sum _{k=1}^{K} f_{ik} \ln \mu _{ik}, \qquad \varvec{\mu } = \mathrm {softmax}\left( \varvec{\varPhi }\varvec{a}\right) .\)
Following [3], differentiating \(\mathcal {E}\) with respect to \(\varvec{v}\) can be approximated by differentiating with respect to \(\varvec{\varPhi }\) and applying the chain rule, which yields, in each voxel i:

\(\varvec{g}_i = \sum _{k=1}^{K}\left[ \varvec{\nabla }\mathcal {C}_{\varvec{f}}\right] _{ik}\, \varvec{\nabla }a_k\left( \varvec{\phi }_i\right) ,\)
where \(\varvec{\nabla } \mathcal {C}_{\varvec{f}}\) is the gradient of the log-Categorical distribution and takes the form of an \(I \times K\) vector field, and \(\varvec{\nabla }\varvec{a}\) contains spatial gradients of the log-template and takes the form of an \(I \times 3\) vector field for each of the K classes. Second derivatives can be approximated by:

\(\varvec{H}_i = \sum _{k=1}^{K}\sum _{l=1}^{K}\left[ \varvec{\nabla }^2\mathcal {C}_{\varvec{f}}\right] _{ikl}\, \varvec{\nabla }a_k\left( \varvec{\phi }_i\right) \varvec{\nabla }a_l\left( \varvec{\phi }_i\right) ^{T},\)
where \(\varvec{\nabla }^2 \mathcal {C}_{\varvec{f}}\) is the Hessian of the log-Categorical distribution and takes the form of an \(I \times K \times K\) symmetric tensor field. The gradient and Hessian of \(\mathcal {C}_{\varvec{f}}\) were derived in [2] and can be computed in each voxel according to:

\(\left[ \varvec{\nabla }\mathcal {C}_{\varvec{f}}\right] _i = \varvec{\mu }_i - \varvec{f}_i, \qquad \left[ \varvec{\nabla }^2\mathcal {C}_{\varvec{f}}\right] _i = \mathrm {diag}\left( \varvec{\mu }_i\right) - \varvec{\mu }_i\varvec{\mu }_i^{T}.\)
Since \(\varvec{v} = \varvec{W}\varvec{z} + \varvec{r}\), derivatives with respect to \(\varvec{r}\), \(\varvec{z}\) and \(\varvec{W}\) are obtained by applying the chain rule.
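For concreteness, these voxel-wise derivatives can be computed in a vectorised way; a small sketch (ours, not the authors' code):

```python
import numpy as np

def categorical_grad_hess(mu, f):
    """Voxel-wise gradient and Hessian of the negative log-Categorical
    term with respect to the log-template values (softmax inputs):
    grad = mu - f and hess = diag(mu) - mu mu^T.  mu, f: (I, K)."""
    g = mu - f                                        # (I, K)
    H = (np.einsum('ik,kl->ikl', mu, np.eye(mu.shape[1]))
         - np.einsum('ik,il->ikl', mu, mu))           # (I, K, K)
    return g, H
```

Mapping these derivatives from \(\varvec{v}\) to \(\varvec{z}\) and \(\varvec{r}\) then only involves products with \(\varvec{W}^{T}\), e.g. \(\varvec{g}_{\varvec{z}} = \varvec{W}^{T}\varvec{g}_{\varvec{v}}\) and \(\varvec{H}_{\varvec{z}} = \varvec{W}^{T}\varvec{H}_{\varvec{v}}\varvec{W}\).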
Orthogonalisation: The PPCA formulation is invariant to rotations inside the latent space [13]. Consequently, it allows finding an optimal subspace but does not enforce the individual bases \(\varvec{w}_{1 \dots M}\) to be the eigenmodes of the complete covariance matrix. It makes sense, however, to transform the subspace so that it corresponds to the first eigenmodes, as this eases interpretation. Also, in order to enforce a sparse Hessian matrix over \(\varvec{W}\), we require \(\varvec{Z}\varvec{Z}^{T}\) to be diagonal, where the columns of \(\varvec{Z}\) are the individual \(\varvec{z}_n\). This leads us to look for an \(M \times M\) transformation matrix \(\varvec{T}\) that keeps the actual diffeomorphisms untouched (\(\varvec{W}\varvec{z} = \varvec{W}\varvec{T}^{-1}\varvec{T}\varvec{z}\)), while diagonalising both \(\varvec{Z}\varvec{Z}^{T}\) and \(\varvec{W}^{T}\varvec{L}\varvec{W}\). This is done by a series of singular value decompositions that ensure that \(\varvec{T}\varvec{Z}\varvec{Z}^{T}\varvec{T}^{T}\) is diagonal and \(\varvec{T}^{-T}\varvec{W}^{T}\varvec{L}\varvec{W}\varvec{T}^{-1}\) is the identity. However, the distribution of diagonal weights between \(\varvec{W}^{T}\varvec{L}\varvec{W}\) and \(\varvec{Z}\varvec{Z}^{T}\) is not optimal, so we optimise an additional diagonal scaling matrix \(\varvec{Q}\) by alternating between updating \(\varvec{A}\) from the rotated \(\mathbb {E}\left[ \varvec{Z}\varvec{Z}^{T}\right] \) and updating the scaling weights by Gauss-Newton optimisation of the remaining terms of the lower bound that depend on them.
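The following sketch illustrates the two-step diagonalisation with eigendecompositions (equivalent to the SVDs mentioned above). It omits the extra diagonal scaling \(\varvec{Q}\) and any ordering or sign conventions; `apply_L` and the function name are our assumptions:

```python
import numpy as np

def orthogonalise(W, Z, apply_L):
    """Find T such that T Z Z^T T^T is diagonal and T^{-T} W^T L W T^{-1}
    is the identity, leaving the velocities W z unchanged
    (W <- W T^{-1}, Z <- T Z).  `apply_L` applies the metric L to each
    column of W."""
    WLW = W.T @ apply_L(W)                 # M x M Gram matrix in the metric
    # Step 1: whiten W in the metric.  With WLW = U diag(s) U^T, the map
    # T1 = diag(sqrt(s)) U^T satisfies T1^{-T} WLW T1^{-1} = I.
    s, U = np.linalg.eigh(WLW)
    T1 = np.diag(np.sqrt(s)) @ U.T
    Zt = T1 @ Z                            # latent codes in whitened basis
    # Step 2: an orthogonal rotation diagonalises Z Z^T without breaking
    # the whitening of step 1.
    _, V = np.linalg.eigh(Zt @ Zt.T)
    T = V.T @ T1
    return W @ np.linalg.inv(T), T @ Z
```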
3 Experiments and Results
We ran the algorithm on a training set consisting of the first 38 subjects of the OASIS cross-sectional database [10]. We used the provided FSL segmentations, which we transformed into tissue probability maps by extracting the grey and white matter classes and smoothing them with a 1-voxel FWHM Gaussian kernel. We set the number of principal components to 32; the parameters of the membrane, bending and linear-elastic energies were set to 0.001, 0.02 and (0.0025, 0.005), respectively; and we used \(\gamma _1 = \gamma _2 = 1\). We set the prior parameters of the residual precision magnitude based on tests conducted on 2D axial slices (\(\lambda _0 = 17\), \(\nu _0 = 10\)). Templates reconstructed with a varying number of principal modes, and with or without the residual field, are presented in Fig. 2, while Fig. 3 shows the template deformed along the first two principal modes, the first of which is typical of brain ageing. This pattern is validated by plotting coordinates along the first dimension against actual ages. Finally, the learnt model was tested by registering the template towards 38 unseen images from the OASIS database. The distributions of categorical log-likelihood values for the training and testing sets are depicted in Fig. 4, along with two randomly selected example fits. Both sets have similar distributions (mean ± std \(\times 10^5\): train \(-7.23 \pm 1.26\), test \(-7.55 \pm 0.88\)), showing the model's robustness.
4 Conclusion
We presented a generative model of brain shape that does not limit the space of diffeomorphisms, allowing regularisation to be learned while preserving enough flexibility for accurate normalisation. We showed how principal modes of variation correlate with known factors of brain shape variability, hinting that this low-dimensional representation might allow discrimination between physiological states. Future research will focus on applying this framework to very large databases, and on combining it with segmentation models in order to work directly with raw data. The latent distribution may be improved by using multimodal priors such as Gaussian mixtures. Our main limitation is the small number of principal bases that can be learned, due to their large size. This could be overcome by explicitly modelling sparse and local covariance patterns.
Notes
- 1.
We assume that they all have the same lattice, but this condition can be waived by composing each diffeomorphic transform with a fixed “change of lattice” transform, which can even embed a rigid-body alignment.
- 2.
In this work, it is a combination of membrane, bending and linear-elastic energies.
- 3.
The initial velocity of \(\varvec{\phi }\) is the opposite of the final velocity of \(\varvec{\phi }^{-1}\), and vice versa.
- 4.
The Gamma prior is parameterised by a shape \(\alpha \) and rate \(\beta \) such that \(\mathbb {E}\left[ \lambda \right] = \alpha /\beta = \lambda _0\).
- 5.
q is used for approximate posteriors and \(\mathbb {E}_q\) for posterior expected values; superscript stars denote optimal approximations. The modified equality sign used in the derivations means “equal up to an additive constant”.
References
Allassonnière, S., Amit, Y., Trouvé, A.: Towards a coherent statistical framework for dense deformable template estimation. J. R. Stat. Soc. Ser. B Stat. Methodol. 69(1), 3–29 (2007)
Ashburner, J., Friston, K.J.: Computing average shaped tissue probability templates. NeuroImage 45(2), 333–341 (2009)
Ashburner, J., Friston, K.J.: Diffeomorphic registration using geodesic shooting and Gauss-Newton optimisation. NeuroImage 55(3), 954–967 (2011)
Bishop, C.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
Bishop, C.M.: Bayesian PCA. In: Kearns, M.J., Solla, S.A., Cohn, D.A. (eds.) NIPS, vol. 11, pp. 382–388. MIT Press (1999)
Cootes, T.F., Twining, C.J., Babalola, K.O., Taylor, C.J.: Diffeomorphic statistical shape models. Image Vis. Comput. 26(3), 326–332 (2008)
Cootes, T., Hill, A., Taylor, C., Haslam, J.: Use of active shape models for locating structures in medical images. Image Vis. Comput. 12(6), 355–365 (1994)
Durrleman, S., Allassonnière, S., Joshi, S.: Sparse adaptive parameterization of variability in image ensembles. Int. J. Comput. Vis. 101(1), 161–183 (2013)
Fletcher, P.T., Lu, C., Pizer, S.M., Joshi, S.: Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)
Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cognit. Neurosci. 19(9), 1498–1507 (2007)
Miller, M.I., Trouvé, A., Younes, L.: Geodesic shooting for computational anatomy. J. Math. Imaging Vis. 24(2), 209–228 (2006)
Simpson, I.J.A., Schnabel, J.A., Groves, A.R., Andersson, J.L.R., Woolrich, M.W.: Probabilistic inference of regularisation in non-rigid registration. NeuroImage 59(3), 2438–2451 (2012)
Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 61(3), 611–622 (1999)
Zhang, M., Fletcher, P.T.: Bayesian principal geodesic analysis for estimating intrinsic diffeomorphic image variability. Med. Image Anal. 25(1), 37–44 (2015)