Elsevier

NeuroImage

Volume 170, 15 April 2018, Pages 482-494

An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement

https://doi.org/10.1016/j.neuroimage.2017.08.021

Highlights

  • A public multi-vendor, multi-field-strength brain MR dataset is proposed and it is now available for download at http://miclab.fee.unicamp.br/tools.

  • Consensus masks are used as “silver-standards” to assess agreement between different skull stripping methods.

  • Influences of scanner magnetic field strength and scanner vendor on skull stripping results are analyzed.

Abstract

This paper presents an open, multi-vendor, multi-field strength magnetic resonance (MR) T1-weighted volumetric brain imaging dataset, named Calgary-Campinas-359 (CC-359). The dataset is composed of images of older healthy adults (29–80 years) acquired on scanners from three vendors (Siemens, Philips and General Electric) at both 1.5 T and 3 T. CC-359 comprises 359 datasets, approximately 60 subjects per vendor and magnetic field strength. The dataset is approximately age and gender balanced, subject to the constraints of the available images. It provides consensus brain extraction masks for all volumes generated using supervised classification. Manual segmentation results for twelve randomly selected subjects performed by an expert are also provided. The CC-359 dataset allows investigation of 1) the influences of both vendor and magnetic field strength on quantitative analysis of brain MR; 2) parameter optimization for automatic segmentation methods; and potentially 3) machine learning classifiers with big data, specifically those based on deep learning methods, as these approaches require a large amount of data. To illustrate the utility of this dataset, we compared the results of eight publicly available skull stripping methods and one publicly available consensus algorithm with the results of a supervised classifier. A linear mixed effects model analysis indicated that vendor (p-value < 0.001) and magnetic field strength (p-value < 0.001) have statistically significant impacts on skull stripping results.

Introduction

Magnetic resonance (MR) imaging is an important tool in the diagnosis and follow-up care of patients with brain diseases and disorders. More specifically, quantitative brain image analysis is playing an increasingly important role in the development and execution of clinical and research studies. Skull stripping, also known as brain extraction or brain segmentation, is the process of segmenting brain from non-brain tissue. In MR images, skull stripping is an initial step for many quantitative image analysis applications, such as multi-modal registration, cortical flattening procedures, and brain atrophy estimation (Smith, 2002). Brain extraction is an active research field (Avants et al., 2011; Iglesias et al., 2011; Beare et al.; Eskildsen et al., 2012; Kleesiek et al., 2016). To date, four main classes of methods have been proposed for performing skull stripping: 1) manual segmentation, 2) intensity-based models associated with morphology, 3) surface model-based methods, and 4) hybrid methods.

Manual segmentations are frequently considered to be the “gold-standard” for skull stripping and are often used to validate other automatic and semi-automatic methods. Manual segmentation, however, is labor intensive and therefore impractical for large datasets. Intensity-based methods, such as the ones that use the watershed transform (Beare et al.; Hahn and Peitgen, 2000), require less computational time than other methods and are able to include the brain stem, spinal cord, and much of the brain gyral surface in their segmentation results. Unfortunately, intensity-based methods often produce over-segmented results, i.e., results where the structure of interest is split into two or more regions in the final segmentation mask. Model-based methods (Smith, 2002) use a balloon-like template, which is fit to the brain surface using gradient information and smoothing forces. Some model-based methods require registration to an atlas as a pre-processing step and therefore have longer processing times than intensity-based methods. In addition, due to smoothness constraints, they are not typically able to include the spinal cord and brain stem in the segmentation result. As they generate a smoothed surface, they are also not able to properly segment the brain gyral surface. Hybrid methods attempt to combine the best features of intensity-based and model-based methods; they require longer processing times but achieve improved segmentation results (Ségonne et al.).

Validating automatic and semi-automatic brain extraction methods is a difficult task that often requires comparison against manually segmented data, yet only a few public datasets include manual segmentation results. Simulated T1-weighted MR images can also be used for validating automatic methods (Lee et al., 2003). For validation of skull stripping methods, the commonly used datasets are those from the Open Access Series of Imaging Studies (OASIS) (Marcus et al., 2007) and the LONI Probabilistic Brain Atlas (LPBA40) (Shattuck et al., 2008). OASIS includes 77 subjects, of which 20 are classified as being cognitively impaired. For each subject in OASIS, three to four T1-weighted 3D MP-RAGE scans were acquired and co-registered. All of the images were collected on a 1.5 T Siemens scanner with a slice thickness of 1.25 mm. The LPBA40 dataset includes 40 coronal 3D T1-weighted spoiled gradient echo MR scans acquired on a 1.5 T General Electric (GE) scanner. The slice thickness of the images was 1.5 mm. These publicly available datasets are relatively small and typically do not allow for analysis of other important acquisition parameters, such as scanner vendor and/or magnetic field strength. Nevertheless, although segmentation performance cannot be fully characterized in the absence of manually segmented data, it is still possible to detect outliers and to evaluate the overall consistency and similarity between different techniques (Bouix et al., 2007).

This paper presents a multi-vendor, multi-field strength database. The utility of this database is demonstrated by evaluating the agreement of a series of eight publicly available skull stripping methods. In addition, consensus segmentation masks were generated for each subject using the Simultaneous Truth and Performance Level Estimation (STAPLE) method (Warfield et al., 2004). Manual segmentation was performed on a subset of twelve subjects and used, together with a supervised classifier, to develop what we will call “silver standard” (SS) brain masks across all subjects in order to assess agreement in our study.
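To make the idea of a consensus mask concrete, the sketch below combines several binary brain masks by simple majority voting. This is deliberately not STAPLE, which instead estimates a sensitivity and specificity for each input method with an expectation-maximization procedure and weights the votes accordingly, and it is not the supervised classifier used for the silver-standard masks. The function name and the toy masks are illustrative assumptions, and the masks are assumed to be flattened to equal-length sequences of 0/1 voxels.

```python
from typing import List, Sequence


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Label a voxel as brain (1) when more than half of the input masks do.

    A simplified stand-in for a consensus algorithm; every method gets
    the same weight, unlike STAPLE.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all masks must have the same number of voxels")

    threshold = len(masks) / 2  # strict majority: votes must exceed this
    consensus = []
    for i in range(n_voxels):
        votes = sum(1 for m in masks if m[i])
        consensus.append(1 if votes > threshold else 0)
    return consensus


# Toy example: three hypothetical skull stripping outputs for five voxels.
method_a = [1, 1, 0, 0, 1]
method_b = [1, 0, 0, 1, 1]
method_c = [1, 1, 0, 0, 0]

print(majority_vote([method_a, method_b, method_c]))  # [1, 1, 0, 0, 1]
```

With an even number of input masks, a tie counts as non-brain under this strict-majority rule; a different tie-breaking choice would be equally defensible.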

Our results show that scanner vendor and magnetic field strength significantly influence the skull stripping results. To the best of our knowledge, this effort is the first work that analyzes the influences of both scanner vendor and magnetic field strength on skull stripping. Previous work has assessed skull stripping performance in data acquired on different scanners at different institutions (Boesen et al., 2004), but used private data, therefore preventing full assessment of the robustness of these studies with respect to vendor and magnetic field strength. Our dataset is publicly accessible, and can be used to optimize skull stripping parameters depending on scanner vendor and magnetic field strength. Also, the dataset can be used to increase the amount of data necessary to train approaches based on deep learning (Kleesiek et al., 2016).

Section snippets

Public dataset - the Calgary-Campinas-359

The public dataset we have developed consists of T1 volumes acquired in 359 subjects on scanners from three different vendors (GE, Philips, and Siemens) and at two magnetic field strengths (1.5 T and 3 T). Data were obtained using T1-weighted 3D imaging sequences (3D MP-RAGE (Philips, Siemens), and a comparable T1-weighted spoiled gradient echo sequence (GE)) designed to produce high-quality anatomical data with 1 mm³ voxels. Older adult subjects were scanned between 2009 and 2016.

Smaller,

Public dataset - the Calgary-Campinas-359 dataset characteristics

The average age of the subjects in the CC-359 database was 53.5±7.8 years (mean ± standard deviation) with an age range from 29 to 80 years. General demographic information on the dataset is summarized in Table 1. The database included 183 (50.97%) female subjects (55.5±7.0 years; range: 36–80 years) and 176 (49.03%) male subjects (51.4±8.1 years; range: 29–71 years). A significant difference in age distribution (p-value < 0.001, ANOVA) was found. Post-hoc testing with Bonferroni correction

Discussion

The overall analysis of the skull stripping techniques against the “silver-standard” consensus showed that STAPLE, ANTs and BEaST achieved the highest Dice coefficients and the smallest variance (standard deviation); they therefore agree closely with the consensus mask and perform consistently. STAPLE's Dice coefficient was significantly different from that of all the other methods, except for ANTs and MBWSS.
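As a minimal sketch of the agreement metrics discussed here, the following pure-Python functions compute the Dice coefficient, sensitivity, and specificity of a method's binary mask against a reference mask (for example, a silver-standard mask). The function names and the toy masks are illustrative assumptions rather than part of the CC-359 tooling; real volumes would normally be loaded as arrays and flattened first.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], ref: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) voxel counts for two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    tp = fp = fn = tn = 0
    for p, r in zip(pred, ref):
        if p and r:
            tp += 1      # brain in both masks
        elif p:
            fp += 1      # brain only in the method's mask
        elif r:
            fn += 1      # brain only in the reference mask
        else:
            tn += 1      # background in both masks
    return tp, fp, fn, tn


def dice(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); two empty masks agree perfectly."""
    tp, fp, fn, _ = confusion_counts(pred, ref)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def sensitivity(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Fraction of reference brain voxels that the method also labels as brain."""
    tp, _, fn, _ = confusion_counts(pred, ref)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Fraction of reference background voxels that the method also labels as background."""
    _, fp, _, tn = confusion_counts(pred, ref)
    return tn / (tn + fp) if (tn + fp) else float("nan")


# Toy example with eight voxels: TP = 2, FP = 1, FN = 1, TN = 4.
method_mask = [1, 1, 1, 0, 0, 0, 0, 0]
silver_mask = [1, 1, 0, 1, 0, 0, 0, 0]

print(round(dice(method_mask, silver_mask), 3))         # 0.667
print(round(sensitivity(method_mask, silver_mask), 3))  # 0.667
print(round(specificity(method_mask, silver_mask), 3))  # 0.8
```

Because true negatives do not enter the Dice formula, the score is not inflated by the large background region that surrounds the brain in a typical volume.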

The Dice coefficient metric represents a compromise between

Conclusions

We have proposed and developed a public, multi-centre, multi-field strength T1 3D brain MR dataset and used it to evaluate agreement between eight publicly available skull stripping techniques plus the STAPLE algorithm. The overall analysis indicated that STAPLE, ANTs and BEaST achieve the best Dice coefficients, which reflects a compromise between sensitivity and specificity. Also, although not as robust as ANTs and BEaST, MBWSS obtains comparable results and is capable of correctly segmenting

Acknowledgments

This project was supported by FAPESP CEPID-BRAINN (2013/07559-3) and CAPES PVE (88881.062158/2014-01). Roberto A. Lotufo thanks CNPq (311228/2014-3), Simone Appenzeller thanks CNPq (157534/2015-4), Roberto Souza thanks FAPESP (2013/23514-0) and the NSERC CREATE I3T foundation, Oeslle Lucena thanks FAPESP (2016/18332-8). Richard Frayne is supported by the Canadian Institutes of Health Research (CIHR, MOP-333931) and the Hopewell Professorship in Brain Imaging. Infrastructure in the Calgary Image

References (36)

  • D.W. Shattuck et al. Magnetic resonance image tissue classification using a partial volume model. NeuroImage (2001)
  • D.W. Shattuck et al. Construction of a 3D probabilistic atlas of human cortical structures. NeuroImage (2008)
  • P.A. Yushkevich et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage (2006)
  • A.J. Asman et al. Robust statistical label fusion through consensus level, labeler accuracy, and truth estimation (COLLATE). IEEE Trans. Med. Imaging (2011)
  • D. Bates et al. Fitting linear mixed-effects models using lme4. J. Stat. Softw. (2015)
  • R. Beare, J. Chen, C. Adamson, T. Silk, D. Thompson, J. Yang, V. Anderson, M. Seal, A. Wood, Brain extraction using the...
  • L. Breiman. Random forests. Mach. Learn. (2001)
  • L. Breiman et al. Classification and Regression Trees (1984)