An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement
Introduction
Magnetic resonance (MR) imaging is an important tool in the diagnosis and follow-up care of patients with brain diseases and disorders. More specifically, quantitative brain image analysis is playing an increasingly important role in the development and execution of clinical and research studies. Skull stripping, also known as brain extraction or brain segmentation, is the process of segmenting brain from non-brain tissue. In MR images, skull stripping is an initial step for many quantitative image analysis applications, such as multi-modal registration, cortical flattening procedures, and brain atrophy estimation (Smith, 2002). Brain extraction is an active research field (Avants et al., 2011, Iglesias et al., 2011, Beare et al., Eskildsen et al., 2012, Kleesiek et al., 2016). To date, four main classes of methods have been proposed for performing skull stripping: 1) manual segmentation, 2) intensity-based methods associated with morphology, 3) surface model-based methods, and 4) hybrid methods.
Manual segmentations are frequently considered to be the “gold-standard” for skull stripping and are often used to validate other automatic and semi-automatic methods. Manual segmentation, however, is labor intensive and therefore impractical for large datasets. Intensity-based methods, such as the ones that use the watershed transform (Beare et al., Hahn and Peitgen, 2000), require less computational time than other methods and are able to include the brain stem, spinal cord, and much of the brain gyral surface in their segmentation results. Unfortunately, intensity-based methods often produce over-segmented results, i.e. results where the structure of interest is split into two or more regions in the final segmentation mask. Model-based methods (Smith, 2002) use a balloon-like template, which is fit to the brain surface using gradient information and smoothing forces. Some model-based methods require registration to an atlas as a pre-processing step and therefore have longer processing times than intensity-based methods. In addition, due to smoothness constraints, they are not typically able to include the spinal cord and brain stem in the segmentation result. Because they generate a smoothed surface, they are also not able to properly segment the brain gyral surface. Hybrid methods attempt to combine the best features of intensity-based and model-based methods; they require longer processing times but achieve improved segmentation results (Ségonne et al.).
Validating automatic and semi-automatic brain extraction methods is a difficult task. It often requires comparison against manually segmented data, yet only a few public datasets include manual segmentation results. Simulated T1-weighted MR images can also be used for validating automatic methods (Lee et al., 2003). For validation of skull stripping methods, the commonly used datasets are those from the Open Access Series of Imaging Studies (OASIS) (Marcus et al., 2007) and the LONI Probabilistic Brain Atlas (LPBA40) (Shattuck et al., 2008). OASIS includes 77 subjects, of which 20 are classified as being cognitively impaired. For each subject in OASIS, three to four T1-weighted 3D MP-RAGE scans were acquired and co-registered. All of the images were collected on a 1.5 T Siemens scanner with a slice thickness of 1.25 mm. The LPBA40 dataset includes 40 coronal 3D T1-weighted spoiled gradient echo MR scans acquired on a 1.5 T General Electric (GE) scanner. The slice thickness of the images was 1.5 mm. These publicly available datasets are relatively small and typically do not allow for analysis of other important acquisition parameters, such as scanner vendor and/or magnetic field strength. Nevertheless, in the absence of manually segmented data, while it is not possible to fully characterize segmentation performance, it is possible to detect outliers and to evaluate the overall consistency and similarity between different techniques (Bouix et al., 2007).
This paper presents a multi-vendor, multi-field strength database. The utility of this database is demonstrated by evaluating the agreement of a series of eight publicly available skull stripping methods. In addition, consensus segmentation masks were generated for each subject using the Simultaneous Truth and Performance Level Estimation (STAPLE) method (Warfield et al., 2004). Manual segmentation was performed on a subset of twelve subjects and, together with a supervised classifier, was used to develop what we will call “silver standard” (SS) brain masks across all subjects in order to assess agreement in our study.
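To make the idea of a consensus mask concrete, the following is a minimal sketch of a binary STAPLE-style expectation-maximization loop, in which each method's sensitivity and specificity are estimated jointly with a per-voxel probability of being brain. It is an illustration under stated assumptions, not the implementation used in this study: the function name `staple_binary`, the fixed global prior, the initial value of 0.9, and the clipping constant are all hypothetical choices, and the sketch assumes the input masks are neither all brain nor all background.

```python
import numpy as np


def staple_binary(decisions, n_iter=50, init=0.9, eps=1e-6):
    """Simplified binary STAPLE-style EM (illustrative sketch only).

    decisions: array of shape (n_voxels, n_methods) holding 0/1 labels,
               one column per skull stripping method.
    Returns (w, p, q): per-voxel probability of brain, and the estimated
    sensitivity p and specificity q of each method.
    """
    d = np.asarray(decisions, dtype=float)
    n_methods = d.shape[1]
    prior = d.mean()  # assumed global prior probability of brain
    p = np.full(n_methods, init)  # initial sensitivities (assumption)
    q = np.full(n_methods, init)  # initial specificities (assumption)

    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is brain,
        # given the current performance estimates of every method.
        a = prior * np.prod(np.where(d == 1, p, 1 - p), axis=1)
        b = (1 - prior) * np.prod(np.where(d == 1, 1 - q, q), axis=1)
        w = a / (a + b)

        # M-step: re-estimate sensitivity and specificity per method.
        # Clipping keeps the estimates away from exactly 0 or 1.
        p = np.clip((w @ d) / w.sum(), eps, 1 - eps)
        q = np.clip(((1 - w) @ (1 - d)) / (1 - w).sum(), eps, 1 - eps)

    return w, p, q


# Toy example: two methods agree, a third makes two errors.
masks = np.stack(
    [
        np.array([1, 1, 1, 1, 0, 0, 0, 0]),
        np.array([1, 1, 1, 1, 0, 0, 0, 0]),
        np.array([1, 1, 1, 0, 1, 0, 0, 0]),
    ],
    axis=1,
)
w, p, q = staple_binary(masks)
consensus = (w >= 0.5).astype(int)
```

In this toy example the thresholded consensus follows the two agreeing methods, and the third method receives a lower estimated sensitivity and specificity. On real volumes the masks would be flattened to one column per method before being passed in.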
Our results show that scanner vendor and magnetic field strength significantly influence the skull stripping results. To the best of our knowledge, this effort is the first work that analyzes the influences of both scanner vendor and magnetic field strength on skull stripping. Previous work has assessed skull stripping performance in data acquired on different scanners at different institutions (Boesen et al., 2004), but used private data, therefore preventing full assessment of the robustness of these studies with respect to vendor and magnetic field strength. Our dataset is publicly accessible, and can be used to optimize skull stripping parameters depending on scanner vendor and magnetic field strength. Also, the dataset can be used to increase the amount of data necessary to train approaches based on deep learning (Kleesiek et al., 2016).
Public dataset - the Calgary-Campinas-359
The public dataset we have developed consists of T1 volumes acquired in 359 subjects on scanners from three different vendors (GE, Philips, and Siemens) and at two magnetic field strengths (1.5 T and 3 T). Data was obtained using T1-weighted 3D imaging sequences (3D MP-RAGE (Philips, Siemens), and a comparable T1-weighted spoiled gradient echo sequence (GE)) designed to produce high-quality anatomical data with 1 mm³ voxels. Older adult subjects were scanned between 2009 and 2016.
Smaller,
Public dataset - the Calgary-Campinas-359 dataset characteristics
Average age of the subjects in the CC-359 database was years (mean standard deviation) with an age range from 29 to 80 years. General demographic information on the dataset is summarized in Table 1. The database included 183 () female subjects ( years; range: 36–80 years) and 176 () male subjects ( years; range: 29–71 years). A significant difference in age distribution (, ANOVA) was found. Post-hoc testing with Bonferroni correction
Discussion
The overall analysis of the skull stripping techniques against the “silver-standard” consensus showed that STAPLE, ANTs and BEaST achieved the highest Dice coefficients and the smallest variance (standard deviation). They therefore have high agreement with the consensus mask, and their performance was consistent. STAPLE's Dice coefficient was significantly different from that of all the other methods, except for ANTs and MBWSS.
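As a concrete illustration of the agreement metrics discussed here, the sketch below computes the Dice coefficient, sensitivity, and specificity of a binary brain mask against a reference mask such as a consensus. The function name `agreement_metrics` and the convention of returning 1.0 when a denominator is zero are assumptions made for this example, not details taken from the study.

```python
import numpy as np


def agreement_metrics(mask, reference):
    """Dice, sensitivity and specificity of `mask` against `reference`.

    Both inputs are arrays of the same shape; non-zero means brain.
    Returning 1.0 for an empty denominator is an illustrative convention.
    """
    mask = np.asarray(mask, dtype=bool)
    reference = np.asarray(reference, dtype=bool)

    tp = np.count_nonzero(mask & reference)    # brain in both
    fp = np.count_nonzero(mask & ~reference)   # brain only in mask
    fn = np.count_nonzero(~mask & reference)   # brain only in reference
    tn = np.count_nonzero(~mask & ~reference)  # background in both

    dice_den = 2 * tp + fp + fn
    return {
        "dice": 2 * tp / dice_den if dice_den else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
    }


# Toy example: one false negative and one false positive.
reference = np.array([1, 1, 1, 1, 0, 0, 0, 0])
candidate = np.array([1, 1, 1, 0, 1, 0, 0, 0])
metrics = agreement_metrics(candidate, reference)
```

Because the Dice coefficient penalizes both false positives and false negatives, a method that over-includes non-brain tissue and one that erodes the brain surface can obtain similar Dice values while differing in sensitivity and specificity, which is why reporting all three is informative.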
The Dice coefficient metric represents a compromise between
Conclusions
We have proposed and developed a public, multi-centre, multi-field strength T1 3D brain MR dataset and used it to evaluate agreement between eight publicly available skull stripping techniques plus the STAPLE algorithm. The overall analysis indicated that STAPLE, ANTs and BEaST achieve the best Dice coefficients, which reflects a compromise between sensitivity and specificity. Also, although not as robust as ANTs and BEaST, MBWSS obtains comparable results and is capable of correctly segmenting
Acknowledgments
This project was supported by FAPESP CEPID-BRAINN (2013/07559-3) and CAPES PVE (88881.062158/2014-01). Roberto A. Lotufo thanks CNPq (311228/2014-3), Simone Appenzeller thanks CNPq (157534/2015-4), Roberto Souza thanks FAPESP (2013/23514-0) and the NSERC CREATE I3T foundation, Oeslle Lucena thanks FAPESP (2016/18332-8). Richard Frayne is supported by the Canadian Institutes of Health Research (CIHR, MOP-333931) and the Hopewell Professorship in Brain Imaging. Infrastructure in the Calgary Image
References (36)
- et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage (2011)
- et al. Quantitative comparison of four brain extraction algorithms. NeuroImage (2004)
- et al. On evaluating brain tissue classifiers without a ground truth. NeuroImage (2007)
- et al. Cortical surface-based analysis. I. Segmentation and surface reconstruction. NeuroImage (1999)
- et al. BEaST: brain extraction based on non-local segmentation technique. NeuroImage (2012)
- et al. FSL. NeuroImage (2012)
- et al. Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage (2016)
- et al. Evaluation of automated and semi-automated skull-stripping algorithms using similarity index and segmentation error. Comput. Biol. Med. (2003)
- et al. Putting our heads together: a consensus approach to brain/non-brain segmentation in T1-weighted MR volumes. NeuroImage (2004)
- et al. A meta-algorithm for brain extraction in MRI. NeuroImage (2004)
- Magnetic resonance image tissue classification using a partial volume model. NeuroImage
- Construction of a 3D probabilistic atlas of human cortical structures. NeuroImage
- User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage
- Robust statistical label fusion through consensus level, labeler accuracy, and truth estimation (COLLATE). IEEE Trans. Med. Imaging
- Fitting linear mixed-effects models using lme4. J. Stat. Softw.
- Random forests. Mach. Learn.
- Classification and Regression Trees