Abstract
We propose a novel way of estimating fiber orientation distribution functions (fODFs) from diffusion MRI. Our method combines convex optimization with unsupervised learning in a way that preserves the relative benefits of both. In particular, we regularize constrained spherical deconvolution (CSD) with a prior that is derived from an fODF autoencoder, effectively encouraging solutions that are similar to fODFs observed in high-quality training data. Our method improves results on independent test data, especially when only a few measurements or relatively weak diffusion weighting (low b values) are available.
This work was supported by the DFG under grant SCHU 3040/1-1. We would like to thank Michael Möller (University of Siegen, Germany) for inspiring discussions. Kanil Patel is now with BCAI - Bosch Center for AI.
1 Introduction
Estimating fiber orientation distribution functions (fODFs) is a traditional step in analyzing data from diffusion MRI (dMRI). It is usually performed by spherical deconvolution, which solves an ill-conditioned optimization problem with suitable regularizers and constraints [1, 9]. Recently, supervised learning has been explored as an alternative strategy for closely related problems, such as estimating NODDI or diffusional kurtosis [3], microstructure imaging [6], or fiber tractography [7]. Such approaches have been shown to produce robust results, sometimes even from a greatly reduced number of measurements.
This paper aims to harness the power of machine learning also for fODF estimation. A challenge in doing so is the fact that spherical deconvolution involves a response function as a parameter, which is usually estimated individually for each subject [8, 10]. In order to ensure that the learned method will generalize to new subjects, we have to design it so that it does not capture specific assumptions about the response function.
We propose a hybrid solution for this problem that combines traditional convex optimization with learning in such a way that the relative strengths of both approaches are preserved. In particular, we build on an fODF autoencoder, which leverages the ability of unsupervised learning to capture distributions that are too complex to be modeled explicitly: In our case, the distribution of fODFs in high-quality real-world data. We use this autoencoder to regularize optimization-based spherical deconvolution, in which the response function can still be freely exchanged, and traditional constraints can be enforced.
2 Background and Related Work
Spherical deconvolution is the standard approach to fODF estimation. It can be formalized as

$$\begin{aligned} \hat{f}=\mathop {\mathrm {arg\,min}}\limits _{f}\ \Vert Mf-s\Vert ^2, \end{aligned}$$
(1)
where vector \(s\in \mathbb {R}^k\) contains k diffusion-weighted measurements, solution vector f contains spherical harmonics coefficients describing the fODF, matrix \(M=ER\) is the product of a diagonal convolution matrix R and a matrix E that evaluates the spherical harmonics basis functions in the directions of the measurements. Computing \(\hat{f}\) via Eq. (1) is ill-conditioned, and it is typically made more robust by imposing non-negativity constraints [1, 9].
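As a concrete illustration of the forward model in Eq. (1) — our own sketch, not the authors' implementation — the following NumPy snippet builds \(M=ER\) from random stand-ins for the SH basis \(E\) and the diagonal response matrix \(R\) (both of which depend on the actual protocol and subject), and computes the unconstrained least-squares estimate:

```python
import numpy as np

# Illustrative dimensions: k = 64 DWIs, n = 15 spherical harmonics
# coefficients (even orders up to 4). Random matrices stand in for the
# protocol- and subject-specific SH basis and response function.
rng = np.random.default_rng(0)
k, n = 64, 15
E = rng.standard_normal((k, n))              # SH basis at gradient directions
R = np.diag(rng.uniform(0.5, 1.5, size=n))   # diagonal convolution (response) matrix
M = E @ R                                    # forward model matrix of Eq. (1)

f_true = rng.standard_normal(n)              # hypothetical fODF coefficients
s = M @ f_true + 0.01 * rng.standard_normal(k)   # noisy measurements

# Unconstrained least-squares estimate; actual CSD additionally imposes
# non-negativity (or H-psd) constraints to stabilize the ill-conditioned fit.
f_hat, *_ = np.linalg.lstsq(M, s, rcond=None)
```

Real implementations solve this with the constraints discussed below; the unconstrained solve is only meant to make the roles of \(E\), \(R\), \(M\), \(s\), and \(\hat{f}\) explicit.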
We have been inspired by recent works which demonstrated that, when dealing with other models in diffusion MRI, replacing optimization-based fitting with supervised regression improves robustness and makes it possible to recover plausible estimates from a significantly reduced set of measurements [3, 6]. A straightforward transfer of this idea to Eq. (1) would imply learning a direct mapping from measurements s to model parameters f. We decided against this, since it would effectively hard-wire the convolution matrix R into the mapping. Since R is typically adjusted per-subject [8, 10], this would impair our ability to transfer the resulting method from one subject to another.
Instead, we pursue an alternative strategy that is based on unsupervised learning. It has been inspired by recent works on image restoration, which is a mathematically analogous problem to Eq. (1), with s taking the role of an image degraded by blur and noise, f a reconstructed image, and M a matrix that includes the characteristics of the blur, similar to how M depends on the response function in our case. Our approach follows [2, 5, 13] in using autoencoders to define a regularizer for f, maintaining full flexibility with respect to M. To our knowledge, we are the first to apply this strategy to fODF estimation, and we implement this idea differently than [2, 5, 13].
3 Hybrid fODF Estimation
Our primary goal in regularizing constrained spherical deconvolution is to improve results on suboptimal inputs, such as diffusion MRI data with few gradient directions or a low b value. Our proposed method can be written as

$$\begin{aligned} \tilde{f}=\mathop {\mathrm {arg\,min}}\limits _{f}\ H(f),\qquad H(f)=\Vert Mf-s\Vert ^2+\lambda \,\Vert f-AE(\hat{f};\varTheta )\Vert ^2, \end{aligned}$$
(2)
where H(f) is our novel objective function. We refer to it as a hybrid method, since it combines convex optimization with a learning-based regularizer, rather than replacing optimization with learning as in [3, 6, 7]. M, f, \(\hat{f}\), and s are defined as in Eq. (1), and \(AE(\hat{f};\varTheta )\) denotes application of an autoencoder with parameters \(\varTheta \) to the preliminary estimate \(\hat{f}\) from standard CSD. We will now justify this use of autoencoders and provide details on how they were trained, then explain the computational pipeline of our method and highlight two theoretical advantages of the hybrid approach.
3.1 Role and Training of Autoencoders in Our Method
Autoencoders are neural networks that are trained in an unsupervised fashion to reconstruct their own input. They are most frequently used to learn a latent representation of the training data, e.g., for dimensionality reduction. Denoising autoencoders are a variant of this idea, trained to reconstruct a clean output from a noisy input. Their use in Eq. (2) is motivated by the observation in [2] that the difference between the input and output of a denoising autoencoder can be interpreted as the gradient of a regularizer that represents a Gaussian-smoothed version of the data distribution. In other words, when dealing with input that is suboptimal for CSD, our regularizer should guide us to prefer fODFs that agree with those observed in more suitable data. Learning the distribution of fODFs, rather than the mapping from the corresponding measurements, increases our ability to generalize to new subjects and measurement protocols.
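To make this interpretation concrete, here is a toy NumPy sketch (our own illustration, not code from the paper) in which an idealized "denoising autoencoder" is simply a projection onto a low-dimensional subspace standing in for the manifold of plausible fODFs; treating \(f - AE(f)\) as the gradient of a prior, one gradient step pulls \(f\) toward that manifold:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 15, 4
# Orthonormal basis of a hypothetical low-dimensional "manifold" of fODFs.
B = np.linalg.qr(rng.standard_normal((n, d)))[0]

def ae(f):
    """Idealized denoising autoencoder: projects onto the data manifold."""
    return B @ (B.T @ f)

f = rng.standard_normal(n)
# Interpreting f - ae(f) as the gradient of a data-distribution prior [2],
# a gradient step moves f toward the learned distribution:
eta = 0.5
f_new = f - eta * (f - ae(f))

dist_before = np.linalg.norm(f - ae(f))        # distance to manifold before
dist_after = np.linalg.norm(f_new - ae(f_new)) # distance to manifold after
```

A trained DAE replaces the projection with a learned mapping, but the mechanism — the input-output difference acting as a restoring force toward the data distribution — is the same.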
All our autoencoders were trained with output fODFs that were estimated with standard CSD from 64 uniformly distributed diffusion-weighted images (DWIs) at \(b=2000\), which is clearly above the minimum number of 45 recommended in [10]. We experimented with three different types of autoencoders:
- A denoising autoencoder (DAE) was trained with input fODFs that have been estimated after perturbing the DWIs with additional Rician noise (SNR = 20).
- A subsampling autoencoder (SSAE) was trained with input fODFs that have been estimated from a subset of 32 DWIs at \(b=2000\).
- A quality transfer autoencoder (QTAE) was trained with input fODFs that have been estimated from 30 DWIs at \(b=700\), i.e., a measurement setup commonly used for Diffusion Tensor Imaging, but not for CSD.
All autoencoders were trained on 122,324 fODFs from the white matter masks of three subjects; a fourth subject served as validation data. Hyperparameters were tuned individually for each type of autoencoder. Reported results used network architecture \((64-32-64)\) (i.e., three hidden layers with 64, 32, and 64 neurons, respectively) for the DAE and QTAE, and \((128-128-128-128)\) for the SSAE. We obtained better results with sigmoid than with ReLU activation functions, which we attribute to the limited size and depth of our networks. Optimization used Adadelta [12] with initial learning rate 0.1, halved every 500 epochs, batch size 512, L2 regularization with trade-off parameter 0.01, and dropout with probability 0.01.
3.2 Computational Pipeline and Theoretical Guarantees
Our computational pipeline is illustrated in Fig. 1. The autoencoder is trained in a pre-process. Given new measurements, we first solve a standard CSD according to Eq. (1) to obtain a preliminary estimate \(\hat{f}\). After passing \(\hat{f}\) through the autoencoder, we integrate the regularizer in Eq. (2) into the least squares problem by amending the measurement vector s with \(\sqrt{\lambda }\,AE(\hat{f};\varTheta )\), and matrix M with \(\sqrt{\lambda }\,I\), where the dimension of identity matrix I matches vector f.
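The augmentation step can be sketched as follows — a minimal NumPy illustration under the assumption that the autoencoder output is given (here replaced by a placeholder vector `ae_f`), and ignoring the cone constraint of the full method:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 64, 15
M = rng.standard_normal((k, n))              # stand-in for M = ER
s = rng.standard_normal(k)                   # stand-in measurements
f_hat = np.linalg.lstsq(M, s, rcond=None)[0] # preliminary estimate, Eq. (1)
ae_f = 0.9 * f_hat                           # placeholder for AE(f_hat; Theta)
lam = 0.5                                    # illustrative regularization weight

# Amend the system: stack sqrt(lam)*I below M, and sqrt(lam)*AE(f_hat) below s.
M_aug = np.vstack([M, np.sqrt(lam) * np.eye(n)])
s_aug = np.concatenate([s, np.sqrt(lam) * ae_f])
f_tilde = np.linalg.lstsq(M_aug, s_aug, rcond=None)[0]
```

Solving the augmented least-squares problem minimizes \(\Vert Mf-s\Vert ^2+\lambda \Vert f-AE(\hat{f};\varTheta )\Vert ^2\), i.e., exactly the objective of Eq. (2); in the full method, the same augmented system is handed to the constrained cone solver.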
Unlike pure learning-based methods that directly map measurements to model parameters [3, 6, 7], our hybrid method allows us to formally guarantee a minimum degree of data fidelity, by maintaining the upper bound

$$\begin{aligned} \Vert M\tilde{f}-s\Vert ^2\le \Vert M\hat{f}-s\Vert ^2+k\sigma ^2 \end{aligned}$$
(3)
on the fitting error of the final result \(\tilde{f}\). Here, k is the number of diffusion-weighted measurements, and the user’s choice of \(\sigma \) determines by how much we are willing to increase average residual magnitudes per measurement with respect to the unregularized optimum \(\hat{f}\). In our experiments, we set \(\sigma \) to \(5\%\) of the average \(b=0\) signal intensity within a white matter mask.
The constraint in Eq. (3) is maintained by setting \(\lambda =k\sigma ^2/\Vert \hat{f}-AE(\hat{f};\varTheta )\Vert ^2\). To see why, assume that a solution \(\tilde{f}^\prime \) violates it. Since \(\Vert \tilde{f}^\prime -AE(\hat{f};\varTheta )\Vert ^2\) is non-negative, this implies that the value of the objective function from Eq. (2) must be \(H(\tilde{f}^\prime )>\Vert M\hat{f}-s\Vert ^2+k\sigma ^2\). However, according to our choice of \(\lambda \), \(H(\hat{f})=\Vert M\hat{f}-s\Vert ^2+k\sigma ^2\), so \(H(\tilde{f}^\prime )>H(\hat{f})\), and \(\tilde{f}^\prime \) cannot be optimal.
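The choice of \(\lambda \) and the resulting bound can be checked numerically — again a hedged NumPy sketch with a placeholder for the autoencoder output, solving Eq. (2) via its normal equations:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 64, 15
M = rng.standard_normal((k, n))
s = rng.standard_normal(k)
f_hat = np.linalg.lstsq(M, s, rcond=None)[0]      # unregularized optimum
ae_f = f_hat + 0.1 * rng.standard_normal(n)       # placeholder for AE(f_hat)

sigma = 0.05                                      # user-chosen tolerance
lam = k * sigma**2 / np.sum((f_hat - ae_f)**2)    # the choice that maintains Eq. (3)

# Regularized solution: minimizer of H(f) via the normal equations of Eq. (2).
f_tilde = np.linalg.solve(M.T @ M + lam * np.eye(n), M.T @ s + lam * ae_f)

lhs = np.sum((M @ f_tilde - s)**2)                # fitting error of f_tilde
rhs = np.sum((M @ f_hat - s)**2) + k * sigma**2   # bound of Eq. (3)
```

Since \(H(\tilde{f})\le H(\hat{f})\) and the regularization term is non-negative, `lhs` can never exceed `rhs`, mirroring the argument above.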
Our method is implemented within the framework of a recently proposed variant of CSD, which represents fODFs as fourth-order tensors with a positive definiteness constraint (H-psd) [1]. The problems in Eqs. (1) and (2) are expressed as quadratic cone programs, and solved using the publicly available package cvxopt [11], as described in more detail in [1]. Note that [1] does not consider regularization based on unsupervised learning, which is our main contribution.
A second theoretical guarantee that is afforded by our hybrid method, but would be non-trivial to integrate into a pure learning-based approach, is preservation of the H-psd constraint. This implies that, while setting \(\sigma =0\) reduces our method to [1], very high values of \(\sigma \) do not simply lead to \(\tilde{f}\approx AE(\hat{f};\varTheta )\), but rather project \(AE(\hat{f};\varTheta )\) onto the set of non-negative fODFs when needed.
4 Results and Discussion
4.1 Quantitative Results
We quantify the benefit from our method on an independent subject, unrelated to the training or validation data. Within a single session, 64 DWIs at \(b=2000\) and 30 DWIs at \(b=700\) were acquired on a 3T Skyra (Siemens), and corrected for head motion using FSL [4]. From these data, we created four test sets:
1. CSD protocol (64 DWIs at \(b=2000\))
2. A subsample of 32 DWIs from the CSD protocol
3. A subsample of 15 DWIs from the CSD protocol
4. A DTI protocol (30 DWIs at \(b=700\))
For each set, Table 1 lists results achieved with standard CSD, as well as with CSD regularized by our three different types of autoencoders. Errors are specified with respect to fODFs that have been estimated with standard CSD from an independent measurement of the same subject (64 DWIs at \(b=2000\), on a 3T Prisma) after spatial registration with the Skyra scan, and corresponding rotation of gradient vectors. Error measures are the L2 norm of differences between estimated and reference fODF coefficients, as well as angular differences between up to three strongest fiber directions, weighted by their corresponding volume fractions. All errors are averaged over the 35,470 voxels in the test subject’s white matter mask.
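One way such a fraction-weighted angular error can be computed is sketched below; this is our own illustration, which assumes the estimated and reference directions have already been paired up (the authors' exact matching procedure is not specified here):

```python
import numpy as np

def weighted_angular_error(dirs_est, dirs_ref, fractions):
    """Mean angular difference (degrees) between paired fiber directions,
    weighted by volume fractions. Directions are unit vectors; sign is
    ignored because fODF peaks are antipodally symmetric."""
    angles = []
    for d_est, d_ref in zip(dirs_est, dirs_ref):
        c = abs(float(np.dot(d_est, d_ref)))          # |cos(angle)|
        angles.append(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
    fractions = np.asarray(fractions, dtype=float)
    return float(np.sum(fractions * np.asarray(angles)) / np.sum(fractions))
```

For example, a single direction that is off by 90 degrees yields an error of 90, while antipodal pairs count as a perfect match.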
For all suboptimal inputs (sets 2–4), all autoencoders resulted in a clear improvement. The best result in each column is marked in boldface. Notably, our regularization allows us to achieve similar angular errors from half the measurements (set 2) as standard CSD on the full set of measurements (set 1). Moreover, we achieve similar angular errors from \(b=700\) data (set 4) as standard CSD does given a similar number of DWIs at \(b=2000\) (set 2). We believe that the fact that the reference dataset has been reconstructed using standard CSD from data similar to set 1 makes it difficult to beat standard CSD on that set. Despite this, the SSAE led to a slight improvement even in this case.
Unsurprisingly, best results were obtained when the protocol for generating training data matched the final use case, i.e., the SSAE worked best on set 2, while the QTAE gave best results on set 4. In all experiments, the SSAE and QTAE dominated the DAE. The fact that all autoencoders improved all challenging use cases 2–4, even ones that did not match their training paradigm, indicates that we succeeded in our attempt to construct a method that, in contrast to pure supervised methods such as [3], does not strictly require an expensive re-training for each new use case.
4.2 Practical Benefit from Theoretical Guarantees
We conducted experiments with a simplified variant of our method that directly uses \(AE(\hat{f};\varTheta )\) as the final fODF estimate rather than regularizing deconvolution with it. This simplification sacrifices the formal guarantees that were discussed in Sect. 3.2 and thus helps us understand their practical benefit.
We found that average L2 and angular errors of the two variants were very similar, and neither variant dominated the other. However, we strive not only for high accuracy on average, but also for an improvement over the baseline in as many individual voxels as possible. As can be seen in Table 2, regularized deconvolution makes a clear contribution towards this goal, since it consistently increases the fraction of voxels for which an improvement over standard CSD is obtained.
This implies that we can often benefit from regularizing CSD with an fODF that has a higher error than the unregularized result, which may seem counter-intuitive. We observed that autoencoders tend to oversmooth the fODFs, sometimes to an extent that increases the L2 error beyond the baseline. In this case, the regularization better balances data fidelity with the level of smoothing.
4.3 Iterating Our Method
Since the image restoration works that inspired our approach [2, 5, 13] apply the autoencoder repeatedly within an iterative optimization, it was natural to attempt iterating Eq. (2), with the result \(\tilde{f}\) from the previous iteration replacing the original estimate \(\hat{f}\) in the next one. However, this almost always increased L2 errors, which we found to be due to the onset of oversmoothing.
4.4 Qualitative Results
We also validated our method qualitatively by performing tractography on the resulting fODFs. Figure 2 provides an example from the brainstem region. A tractography on our reference dataset is in (a), while (b) and (c) are based on test set 3 with merely 15 DWIs. As expected from Table 1, our regularizer (c) cannot obtain a perfect reconstruction from this strong subsampling, but it does restore several fibers missing from the unregularized result, marked with ellipses in (b), and it eliminates some spurious fibers, marked with arrows.
5 Conclusion
We have presented a novel method for estimating improved fiber ODFs from suboptimal diffusion MRI data, by using fODF autoencoders to learn a suitable regularizer in an unsupervised manner. On independent test data, our regularizer allowed us to achieve similar angular accuracy from only 32 gradient directions as with the standard approach from 64 directions. Similarly, we achieve comparable angular accuracy from 30 directions at \(b=700\) as the standard approach from 32 directions at \(b=2000\). Our current regularizer only uses information from the same voxel; we hope to extend it to also account for spatial neighborhoods.
On a conceptual level, our work addresses more general challenges such as: How can we use machine learning to produce accurate results not just on average, but in as many individual cases as possible? (By imposing suitable constraints.) How can we enforce formal guarantees on the results of a learning based approach? (By combining it with constrained optimization.) How can we eliminate the need for frequent re-training? (By learning something more general, e.g., the natural distribution of fODFs rather than the mapping from subject- or scanner-specific data to fODFs.) In our future work, we plan to continue exploring strategies for these conceptual challenges, within and beyond this specific application.
References
Ankele, M., Lim, L.H., Gröschel, S., Schultz, T.: Versatile, robust, and efficient tractography with constrained higher-order tensor fODFs. Int. J. Comput. Assist. Radiol. Surg. 12(8), 1257–1270 (2017)
Bigdeli, S.A., Zwicker, M., Favaro, P., Jin, M.: Deep mean-shift priors for image restoration. In: Advances in Neural Information Processing Systems (NIPS), pp. 763–772 (2017)
Golkov, V., et al.: q-space deep learning: twelve-fold shorter and model-free diffusion MRI scans. IEEE Trans. Med. Imaging 35(5), 1344–1351 (2016)
Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL. NeuroImage 62(2), 782–790 (2012)
Meinhardt, T., Moeller, M., Hazirbas, C., Cremers, D.: Learning proximal operators: using denoising networks for regularizing inverse imaging problems. In: IEEE International Conference on Computer Vision, pp. 1799–1808 (2017)
Nedjati-Gilani, G.L., et al.: Machine learning based compartment models with permeability for white matter microstructure imaging. NeuroImage 150, 119–135 (2017)
Neher, P.F., Côté, M.A., Houde, J.C., Descoteaux, M., Maier-Hein, K.H.: Fiber tractography using machine learning. NeuroImage 158, 417–429 (2017)
Tax, C.M., Jeurissen, B., Vos, S.B., Viergever, M.A., Leemans, A.: Recursive calibration of the fiber response function for spherical deconvolution of diffusion MRI data. NeuroImage 86, 67–80 (2014)
Tournier, J.D., Calamante, F., Connelly, A.: Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution. NeuroImage 35, 1459–1472 (2007)
Tournier, J.D., Calamante, F., Connelly, A.: Determination of the appropriate b value and number of gradient directions for high-angular-resolution diffusion-weighted imaging. NMR Biomed. 26(12), 1775–1786 (2013)
Vandenberghe, L.: The CVXOPT linear and quadratic cone program solvers. Technical report, UCLA Electrical Engineering Department (2010). http://www.seas.ucla.edu/~vandenbe/publications/coneprog.pdf
Zeiler, M.D.: ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701 (2012). http://arxiv.org/abs/1212.5701
Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2808–2817 (2017)
© 2018 Springer Nature Switzerland AG
Patel, K., Groeschel, S., Schultz, T. (2018). Better Fiber ODFs from Suboptimal Data with Autoencoder Based Regularization. In: Frangi, A., Schnabel, J., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. MICCAI 2018. Lecture Notes in Computer Science(), vol 11072. Springer, Cham. https://doi.org/10.1007/978-3-030-00931-1_7