Keywords

1 Introduction

The hippocampus plays a key role in cognition and its compromise is a hallmark of several prevalent brain disorders, such as temporal lobe epilepsy (TLE) [1]. With the advent of large-scale neuroimaging data basing and analysis in health and disease, the development of accurate automated segmentation approaches becomes increasingly important.

The majority of automated hippocampal segmentation approaches have operated on a global scale. Recent methods rely on a multi-template framework to account for interindividual anatomical variability. While the majority of previous algorithms employed a purely voxel-based strategy, adopting a surface-based library has shown benefits by improving flexibility to model shape deformations often seen in disease, but also in 10–15 % of healthy subjects [2]. To improve label fusion and image matching, recent studies have adopted patch-based methods that compactly represent shape, anatomy, texture, and intensity [3]. Notably, these approaches have also been successful in non-segmentation tasks, such as image denoising [4] and supersampling [5].

Developments in MRI hardware have begun to generate images of brain anatomy with unprecedented details [6], fostering guidelines to manually delineate hippocampal subfields. Few automated methods have been proposed, each relying on multiple templates together with: (i) Bayesian inference and fusion of ex-vivo/in-vivo landmarks [7, 8], (ii) label propagation to intermediate templates [9], (iii) combinations of label fusion (taking inter-template similarity into account) and post-hoc segmentation correction [10]. These methods operate either on anisotropic T2-weighted images [10], or T1-weighted images with standard millimetric [8, 9], or submillimetric resolution [7]. Only one study so far [9] calculated Dice overlaps of manual and automated labels in millimetric T1-weighted images, with modest performance (Dice: 0.56–0.65).

We propose a novel approach for hippocampal subfield segmentation SurfPatch, which combines multi-template feature matching with deformable parametric surfaces and vertex-wise patch sampling, relying on point-wise correspondence across the template library. Validation was performed using a publically available 3T dataset of manual segmentations together with high- and standard-resolution MRI data of healthy controls [11]. We also applied SurfPatch to 17 TLE patients with hippocampal atrophy, testing its ability to lateralize the seizure focus and compared its performance to two ASHS (Automatic Segmentation of Hippocampal Subfields) [10] and FreeSurfer 5.3 [7].

2 Methodology

Figure 1 below summarizes the segmentation steps.

Fig. 1.
figure 1

Flowchart of the proposed algorithm SurfPatch.

For training, SurfPatch builds the mean patch surface \( S^{\mu } \) and standard deviation (SD) patch surface \( S^{\sigma } \) across the template library (Fig. 1A). For segmentation, it nonlinearly warps each template surface to the test case, re-computes patch features across the warped surface, and normalizes features using surface-based z-scoring (relative to \( S^{\mu } \) and \( S^{\sigma } \)). Based on vertex-wise z-score, it selects a subset of templates, builds an average surface, and performs a deformation for final segmentation (Fig. 1B).

2.1 Training Step (Fig. 1A)

Subfield labels are converted to surface meshes and parameterized using spherical harmonics and a point distribution model (SPHARM-PDM) [12] that guarantees correspondence of surface points (henceforth, vertices) across subjects (Fig. 1 A-1). For a given template t, its reconstructed surface \( S^{t} \) is mapped to its corresponding T1-MRI \( I^{t} \). Let \( x_{k}^{v} \) with \( k \in \left\{ {1, \ldots ,8} \right\} \) be the eight closest voxels of a given vertex \( v \), and let \( P_{x,k}^{t} \) be the corresponding local cubical neighborhoods (i.e., patches) centered around these voxels. We build a vertex patch \( P_{v}^{t} \) by computing a trilinear interpolation of these 8 patches (which is an extension of the trilinear interpolation of the 8 closest voxels). Patches are considered as vectors. By pooling corresponding vertex patches from each template surface, we derive the mean \( P_{v}^{\mu } \) and SD patch \( P_{v}^{\sigma } \) at vertex \( v \):

$$ P_{v}^{\mu } = \frac{{\sum_{t = 1}^{N} \,P_{v}^{t} }}{N}\,\text{and}\,P_{v}^{\sigma } = \sqrt {\frac{{\sum_{t = 1}^{N} \left( {(P_{v}^{t} )^{2} - (P_{v}^{\mu } )^{2} } \right)}}{N}} $$
(1)

where N is the number of templates (Fig. 1A-2).

2.2 Segmentation Step (Fig. 1B)

Registration and Subset Restriction. Each template MRI is nonlinearly registered to the test MRI to increase shape similarity (Fig. 1B-1). We used ANIMAL non-linear registration tool [13], enhanced with a boundary-based similarity measure [14]. Registration was based on a volume-of-interest that includes the labels of all hippocampi in the template library, plus a margin of 10 voxels in each direction to account for additional shape variability. Applying the registration to the library surface \( \widehat{{S^{t} }} \), it is placed on the test MRI. We then re-compute patch features across vertices and compare these patch features \( \widehat{{P_{v}^{t} }} \) with the template library patch distribution, using vertex-wise z-scoring:

$$ F_{v}^{t} = \frac{{\widehat{{P_{v}^{t} }} - P_{v}^{\mu } }}{{P_{v}^{\sigma } }} $$
(2)

This is an element-wise operation. The absolute deviation from the library can be quantified by summing the squared norm of each patch over all vertices through:

$$ F^{t} = \sum\nolimits_{v = 1}^{K} \parallel F_{v}^{t} \parallel_{2}^{2} $$
(3)

where K is the number of vertices. Figure 1B-2 shows vertex-wise deviation maps. Surfaces in the template library are ranked according to this measure, with smaller scores indicating better fit. To obtain an initial estimation, we a performed successive surface averaging (Fig. 1B-3) defined as:

$$ \overline{{S^{k} }} = \frac{{\sum_{l = 1}^{k} \,\frac{1}{{F^{l} }}\widehat{{S^{l} }}}}{{\sum_{l = 1}^{k} \frac{1}{{F^{l} }}}} $$
(4)

where l corresponds to the ranking index. To sum up, \( \overline{{S^{1} }} \) corresponds the best template, \( \overline{{S^{2} }} \) to the weighted surface average of the two best templates and \( \overline{{S^{k} }} \) to the weighted surface average of the k best templates. Corresponding deviation scores \( \overline{{F^{k} }} \) were computed as in (2) and (3), and the one resulting in the minimal measure is chosen as initialization for a deformable model. This selection of templates has the advantage to automatically adapt to the template library size.

Deformable Model.

To further increase segmentation accuracy and to account for potential errors in the preceding steps, we applied parametric deformable model of the surface average \( \overline{S} \) [15]. The use of an explicit parameterization of the surface ensures vertex-wise correspondence across the library that would be otherwise lost (e.g. when using level-sets [16]). The objective function to minimize is composed of a regularization term, based on mechanical properties of the surface (stretching and bending), and a data term, which is represented by the deviation score F. Surface deformation is performed using gradient descent search:

$$ \overrightarrow {{x_{v} }} = \overrightarrow {{x_{v} }} - \gamma \left\{ {\left( {\alpha S_{v}^{(2)} + \beta S_{v}^{(4)} } \right) + \Delta F_{v} } \right\} $$
(5)

where \( \overrightarrow {{x_{v} }} \) represents the spatial coordinates of voxel v, γ is the step size controlling the magnitude of the surface’s deformation; α and β are parameters controlling for surface stretching and bending. S (2) v and S (4) v are surface’s second and fourth order spatial derivative respectively at voxel v. ΔF v represents the gradient of the surface’s deviation score at vertex v. Figure 2B-4 illustrates a final segmentation.

Fig. 2.
figure 2

Overall Dice with respect to patch (A) and library size (B). CA1-3 (red), CA4-DG (green), SUB (blue).

3 Experiments and Results

3.1 Material

The training set includes 25 subjects from a public repositoryFootnote 1 (31 ± 7 yrs, 13 females) of MRI and manually-drawn labels (CA1-3, DG-CA4, SUB; average intra-/inter-rater Dice >90/87 %.) [11]. MRI data consist of isotropic T1-weighted millimetric (1 mm3) and submillimetric (0.6 mm3) 3D-MPRAGE and anisotropic 2D T2-weighted TSE (0.4 × 0.4 × 2 mm3). Images underwent automated correction for intensity non-uniformity [17] and intensity standardization. Submillimetric data was resampled to 0.4 mm3 resolution in MNI152 space. The patient cohort consists of 17 TLE patients. MRI post-processing followed the same steps as in [11]. TLE diagnosis and lateralization of the seizure focus was based on a multi-disciplinary evaluation. Hippocampal atrophy was determined as hippocampal volumes beyond 2SD of the corresponding mean of healthy controls [18].

3.2 Experiments

Parameter optimization and robustness to library size were performed on submillimetric T1-weighted images.

Parameter Optimization.

Parameters for the active contour are empirically set to α = 100 and β = 100. The step size parameter γ is set to 10−5. Performance with regards to patch sizes was evaluated using a leave-one-out (LOO) strategy, based on Dice overlap index between automated/manual segmentations.

Robustness to Template Library Variations and Image Resolution.

For each subject, we randomly decreased the library from the full size (n = 24 in LOO validation) to 1/2 (12), 1/3 (8) and 1/5 (5) of its original size. We repeated this process 5 times. We evaluated performance with smaller template libraries, based on Dice overlaps. We tested whether SurfPatch achieved adequate performance by operating solely on standard 1 mm3 MPRAGE data. In this evaluation, we first linearly upsampled images to 0.4 mm3, followed by the segmentation outlined above. This permitted the use of equivalent patch sizes. In addition to Dice, we computed correlation coefficients between automated as well as manual volumes, and generated Bland-Altman plots.

TLE Lateralization.

Direct dice overlap comparisons between SurfPatch and both ASHS and FreeSurfer are challenged by the absence of a unified subfield segmentation protocol and by the optimization of different algorithms to different MRI sequences. We thus assessed the clinical utility of the different approaches using a “TLE lateralization challenge” that assessed the accuracy of linear discriminant analysis (LDA) classifiers to lateralize the seizure focus in individual patients based on subfields volumes obtained with SurfPatch compared to those using volumes generated by FreeSurfer 5.3Footnote 2 [7] and ASHSFootnote 3 [10]. We ran both algorithms with their required modalities (FreeSurfer: 1 mm3 T1-weighted; ASHS: 0.4 × 0.4 × 2 mm3 T2-weighted) and default parameters. As both algorithms operate in native space, subfields volumes were corrected for intracranial volume by multiplying them by the Jacobian determinant of the corresponding linear transform to MNI152 space. Cross-validation was performed using a 5-Fold scheme, repeated 200 times.

ASHS Evaluation.

Given that it includes an atlas building tool, we also trained ASHS using our template library. Inputs are submillimetric T1-weighted and T2-weighted images, resampled to MNI152 space along with the corresponding labels. T1-weighted images are used for registration and T2-weighted images for segmentation.

3.3 Results

Parameter Optimization and Robustness to Template Library Size.

Maximum accuracy was achieved with a patch size of 13 × 13 × 13 voxels for CA1-3 (% Dice: 87.43 ± 2.47), 19 × 19 × 19 for CA4-DG (82.71 ± 2.85) and 11 × 11 × 11 for SUB (84.95 ± 2.45) (Fig. 2A). Mean Dice indices remained >80 % for all structures when using only 8 templates (Fig. 2B).

Robustness with Respect to Standard T1-Weighted Images.

Segmenting subfields using only standard millimetric T1-weighted images, we obtained accuracy of 85.71 ± 2.48 for CA1-3 (average decrease compared to submillimetric T1-MRI = −1.72 %), 81.10 ± 3.86 for DG (−1.61 %) and 82.21 ± 3.72 for SUB (−2.75 %). We obtained overall higher correlations between manual and automated volumes for submillimetric (Fig. 3A) than for standard images (Fig. 3B; submillimetric/millimetric CA1-3: 0.73/0.64, CA4-DG: 0.44/0.28, SUB: 0.56/0.63). Bland-Altman plots suggested lower bias in submillimetric than standard images (average shrinkage based on submillimetric/millimetric images for CA1-3: 58/131 mm3 (1.6/3.6 % from average manual volume), CA4-DG: 23/83 mm3 (3.4/8.3 %), SUB: 76/35 mm3 (4.2/1.9 %)). Segmentation examples with SurfPatch are shown in Fig. 4.

Fig. 3.
figure 3

Correlations and Bland-Altman plots between automated and manual volumes (in mm3) in submillimetric (A) and millimetric (B) T1-weighted images.

Fig. 4.
figure 4

Manual delineation and SurfPatch automated segmentations relying on a submillimetric T1-weighted image and a standard T1-weighted image.

TLE Lateralization.

Lateralization of the seizure focus in TLE patients was highly accurate when using SurfPatch, both based on submillimetric and millimetric T1-weighted images (>93 %; Table 1). For ASHS and FreeSurfer, we performed two experiments using: (i) single subfields as defined by the anatomical templates and (ii) subfields grouped into CA1-3, DG-CA4 and SUB, as in [11]. Although better results were obtained with the second option, overall performance was lower than with SurfPatch (Table 1).

Table 1. Average accuracy of seizure focus lateralization in TLE.

ASHS Evaluation.

Trained on our library, ASHS achieved similar performance as SurfPatch (CA1-3: 87.36 ± 1.97; CA4-DG: 82.54 ± 3.45; SUB: 85.48 ± 2.43).

4 Discussion and Conclusion

SurfPatch is a novel subfield segmentation algorithm combining surface-based processing with patch similarity measures. Its use of a population-based patch normalization relative to a template library has desirable run-time and space complexity properties. Moreover, it operates on T1-weighted images only, the currently preferred anatomical contrast of many big data MRI initiatives, and thus avoids T2-weighted MRI, a modality prone to motion and flow artifacts.

In controls, accuracy was excellent, with Dice overlap indices of >82 % when submillimetric images were used and only marginal performance drops when using millimetric data. Performance remained robust when reducing the size of the template library, an advantageous feature given high demands on expertise/time for the generation of subfield-specific atlases. While Dice indices across studies need to be cautiously interpreted given differences in protocols, our results compare favorably to the literature. Indeed, FreeSurfer achieved 62 % for CA1, 74 % in CA2-3 and 68 % in DG-CA4 when applied to high-resolution T1-MRI [7]. With respect to ASHS, slightly lower Dice indices than for our evaluations have been previously reported [10], particularly for CA (80 %) and SUB (75 %), whereas similarly high performance was achieved for DG (82 %). It is possible that the reliance of ASHS on anisotropic images presents a challenge to cover shape variability in antero-posterior direction. It has to be noted that ASHS achieved similar performance than SurfPatch, when trained on our library and dataset.

Although ASHS, FreeSurfer and SurfPatch consistently achieved high lateralization performance, learners based on volume measures derived from the latter lateralized the seizure focus more accurately than the other two. Robust performance on diseased hippocampi may stem from the combination of the patch-based framework, offering intrinsic modeling of multi-scale intensity features with surface-based feature sampling, which may more flexibly capture shape deformations and displacements seen in this condition.