Elsevier

NeuroImage

Volume 154, 1 July 2017, Pages 219-229
Reprint of ‘A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?’

https://doi.org/10.1016/j.neuroimage.2017.05.032

Highlights

  • In fMRI data analysis, outliers can inflate false positive rates and reduce power.

  • We study a wide array of analysis approaches and heteroscedasticity settings.

  • No model designed to deal with outliers performs perfectly.

  • Visual inspection of data can help avoid the scenarios found to have invalid results.

Abstract

Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifacts and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or the group level residual magnitude to differentially weight subjects. The most commonly used robust regression, which implements a robust estimator of the regression slope, has previously been studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression, using a heteroscedastic autocorrelation consistent (HAC) estimator, produces robust slope and variance estimates and has been shown to perform well, with better Type I error control, but only with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied for this model, nor has it been compared to other modeling approaches that handle outliers, such as FSL's Flame 1 and FSL's outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL's Flame 1, Flame 1 with its outlier de-weighting algorithm and Kendall's Tau. Additionally, subject omission using the Cook's Distance measure with OLS and nonparametric inference with the OLS statistic are studied.
Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed.

Introduction

When analyzing fMRI data, even with thorough preprocessing, it is likely that artifacts will remain in some subjects' data, causing outlying blood oxygen level dependent (BOLD) contrast estimates in the group level analyses. This can be a concern when the group level model involves a continuous covariate, since outliers can easily influence the fit of a regression line. It can also be an issue with categorical covariates, although mean estimates are often less impacted by outliers than regression slopes. A drawback of the most common analysis strategy for imaging data is that it involves blindly applying a model in a voxelwise fashion, inspecting only the p-value maps. Comparatively, in a standard single regression analysis, say using behavioral data only, multiple plotting strategies and statistical assessments are used to study heteroscedasticity and other violations of regression assumptions. This practice is somewhat difficult in voxelwise analyses and so a common goal is to find a model, such as robust regression, that aims to detect and downweight the contribution of outliers in a regression analysis. Although there have been studies that focus on models that are robust to outliers (Wager et al., 2005, Fritsch et al., 2015, Woolrich, 2008), in each case only subsets of robust models were compared over a somewhat limited set of heteroscedasticity scenarios, and not all focused on performance with continuous regressors, some focusing instead on group-level 1-sample t-tests. The purpose of this work is to examine the Type I error rate across a wide selection of regression models, including some that have not been considered in the context of fMRI analysis. Also, a larger set of heteroscedasticity settings is considered, varying both the type and degree of heteroscedasticity.

The most commonly used robust regression approaches rely on estimators of the regression slope that are robust to outliers. Another class of robust regression approaches, utilizing heteroscedastic autocorrelation consistent (HAC) estimators, also provides robust variance estimates. These will be referred to as "doubly robust" since both the slope and variance estimators are robust to outliers. In this work, the models compared are two types of robust regression (singly and doubly robust), FSL's Flame 1 (similar to AFNI's MEMA), FSL's Flame 1 with outlier de-weighting, Ordinary Least Squares (OLS), which is equivalent to the most commonly used model in SPM and AFNI, and Kendall's rank correlation. Improvements to OLS are also considered: removing subjects according to the Cook's D metric and using nonparametric inference, which has fewer assumptions than parametric inference. All other approaches rely on parametric inference.
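Kendall's rank correlation merits a brief illustration of why it belongs in a comparison of outlier-resistant approaches: because it depends only on the ordering of the data, a single extreme contrast estimate changes at most its own pairwise orderings, never the magnitude of any term in the statistic. A minimal pure-Python sketch (the function name is illustrative; this is the tau-a variant, with no tie correction):

```python
def kendalls_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Rank-based, so an extreme value affects only its own pairwise
    orderings, never the magnitude of any term in the sum.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotone relationship yields tau = 1 even with one extreme value:
print(kendalls_tau([1, 2, 3, 4], [10, 20, 30, 4000]))  # -> 1.0
```

An OLS slope fit to the same four points would be dominated by the value 4000; the rank statistic is unchanged.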

Due to the repeated measures nature of fMRI data, the variance structure has both a within-subject and a between-subject variance component and the outliers can be driven by heteroscedasticity in either of these variances. Past works only consider model comparisons with heteroscedasticity within one of these variance types, whereas here the comparison is across all models with heteroscedasticity in either variance component. Lastly, a wider selection of heteroscedastic variance patterns are considered, including univariate outliers, multivariate outliers and heteroscedasticity that correlates with the group model covariate (e.g. variance in BOLD contrast increases with an impulsivity measure of interest). Also, instead of only considering one level of outlying variance, a continuum of outlier degree is studied, illustrating how models perform with weak and strong outliers.

The residual plots (residual versus explanatory variable) in Fig. 1 illustrate the heteroscedasticity settings considered here. In the univariate outlier case (top row), the outlier is either in the explanatory variable or in the explained variable, while in the multivariate case (bottom left) both the explanatory and explained values are outlying. The final case, which has never been considered in robust regression studies of neuroimaging data, is when the variance increases along with the explanatory variable (bottom right). This will be referred to as heteroscedasticity without outliers, since there are no clear outlying values, but the variance is still heterogeneous.
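To make the four settings concrete, each can be generated under the null hypothesis with a few lines of simulation code. This is an illustrative sketch, not the paper's simulation code; the sample size, shift magnitude, and setting labels are assumptions made here:

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(setting, n=50, gamma=0.0, shift=6.0):
    """Draw (W, beta_hat) for one heteroscedasticity setting.

    gamma=0 corresponds to the null hypothesis; `shift` controls how
    extreme the outlier is. All numeric choices are illustrative.
    """
    W = rng.normal(size=n)
    e = rng.normal(size=n)
    if setting == "univariate_y":        # outlier in the explained variable
        e[0] += shift
    elif setting == "univariate_x":      # outlier in the explanatory variable
        W[0] += shift
    elif setting == "multivariate":      # outlying in both variables
        W[0] += shift
        e[0] += shift
    elif setting == "no_outliers":       # variance increases with W
        e *= 1.0 + np.abs(W)
    return W, gamma * W + e
```

Plotting residuals against W for each setting reproduces the qualitative patterns of Fig. 1.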

Here it is assumed that each subject has a single functional run of data and in this case the standard modeling approach is the two-stage summary statistics model (Mumford and Nichols, 2006). The first stage models the time series data and, for subject $i$, results in a within-subject estimate of the BOLD contrast, $\hat\beta_i$, as well as the within-subject variance of the contrast, which will be denoted $\sigma^2_{w,i}$. The second stage model combines the within-subject contrast estimates and their variances in a group model. This model results in a group contrast estimate, $\gamma$, as well as a between-subject variance, $\sigma^2_b$, which is combined with the within-subject variance to form the mixed effects variance, $\sigma^2_{w,i} + \sigma^2_b$. Specifically, for subject $i$, let $\hat\beta_i$ be the level 1 contrast estimate, $W_i$ the group level covariate value (assumed to be a scalar), and $\gamma$ the group-level parameter (regression slope); then

$$\hat\beta_i \sim N(W_i \gamma,\ \sigma^2_{w,i} + \sigma^2_b). \tag{1}$$
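Eq. (1) translates directly into a simulation of the group level data. The sketch below uses assumed values (30 subjects, one subject with a 5-fold inflated within-subject standard deviation) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n, gamma, sigma_b = 30, 0.5, 1.0            # illustrative values
W = rng.normal(size=n)                      # group level covariate
sigma_w = np.full(n, 1.0)                   # within-subject standard deviations
sigma_w[0] = 5.0                            # one subject with inflated level 1 variance

# Mixed effects variance from Eq. (1): sigma_{w,i}^2 + sigma_b^2
mixed_sd = np.sqrt(sigma_w**2 + sigma_b**2)
beta_hat = rng.normal(W * gamma, mixed_sd)  # level 1 contrast estimates
```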

Given this structure, it is clear that outliers in the β^i can be driven either by inflated within- or between-subject variance. To be clear, the focus here is on outliers in the first level parameter estimates (β^i) and not in the time series data, which are not directly studied in this work. Of course it could be the case that a subject with multiple outliers in their time series data, say due to motion, may have an inflated value for σw,i2. The following section describes the various estimation strategies for this group model and their corresponding assumptions. Specifically, some models will simplify the variance structure by assuming the within-subject variance is constant across subjects.

The specific details of the group models will be given in the methods, but they are broadly discussed here. The simplest group model is OLS, where the within-subject variance is assumed to be constant across subjects, simplifying the mixed effects variance to a single parameter, $\sigma^2$, so Eq. (1) becomes $\hat\beta_i \sim N(W_i\gamma, \sigma^2)$. The estimate for this model is found by least squares: the value of $\gamma$ that minimizes

$$\sum_{i=1}^{N} (\hat\beta_i - W_i\gamma)^2.$$

This estimator is commonly used in the SPM software package (www.fil.ion.ucl.ac.uk/spm/) and in AFNI (afni.nimh.nih.gov/afni/).
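For a single covariate with no intercept, the minimizer has a simple closed form, $\hat\gamma = \sum_i W_i \hat\beta_i / \sum_i W_i^2$, which can be sketched as:

```python
import numpy as np


def ols_slope(W, beta_hat):
    """Least-squares slope for a single covariate with no intercept.

    This is the gamma minimizing sum_i (beta_hat_i - W_i * gamma)^2,
    i.e. gamma_hat = sum(W * beta_hat) / sum(W^2).
    """
    W = np.asarray(W, float)
    beta_hat = np.asarray(beta_hat, float)
    return W @ beta_hat / (W @ W)


print(ols_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # exact fit -> 2.0
```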

In comparison, FSL's (fsl.fmrib.ox.ac.uk/fsl) Flame 1 model (Woolrich et al., 2004) and AFNI's MEMA model (Chen et al., 2012) do not simplify the variance structure of Eq. (1), allowing for heteroscedasticity in the within-subject variance. The impact on Type I error and power of univariate outliers driven by the within-subject variance alone was compared between this style of model and OLS in Beckmann et al. (2003) and Mumford and Nichols (2009). The model of interest was the 1-sample t-test and it was found that the Type I error rate was preserved for both approaches, but power was slightly reduced for OLS in the presence of univariate outliers. This work will instead focus on continuous regressors at the group level, which has not yet been done for this model comparison.

In robust regression, a score function is used to differentially weight subjects according to the size of their residuals, which will account for some forms of heteroscedasticity. Specifically, the ratio of the subject's residual and the overall standard deviation, $\sigma$, is passed into a function $\rho$ and the minimization problem is defined by finding the $\gamma$ that minimizes

$$\sum_{i=1}^{N} \rho\left(\frac{\hat\beta_i - W_i\gamma}{\sigma}\right).$$

Effectively this turns into a weighted linear regression, where outlying subjects, as determined by their residual magnitude, contribute less to the parameter estimate. The weight is a function of the score function, $\rho$. Common settings for $\rho$ include Tukey's Bisquare and Huber's loss function, which are plotted in Fig. 2; one must be chosen when running a robust regression. A second choice involves the estimator of $\sigma$, and options include M, S and MM. Among other properties, these estimators differ in their computational ease and in what is called the breakdown point, which indicates what proportion of the data can contain outliers before the estimator may fail. The M estimator uses a median absolute deviation (MAD) estimator of $\sigma$. Although it is computationally simple, it has a breakdown point of 0 (Huber, 1981). On the other hand, S estimators use a residual scale estimator of $\sigma$ and are also simple to estimate, with a breakdown point of 50%; the downside is that the estimates tend to have low efficiency (i.e., are more variable). The MM estimator combines the S and M estimators, using the S estimate as a starting point for the M estimator, which allows for a high breakdown point (50%) while retaining efficiency. The specifics of these estimators can be found in Croux et al. (2003). Since the parameter $\sigma$ appears within the minimization step, the two parameters, $\gamma$ and $\sigma$, are estimated iteratively using what is known as iteratively reweighted least squares. In this work, this model will be referred to as "singly robust," as only the estimate of $\gamma$ is robust to outliers and not that of $\sigma$.
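The iteratively reweighted least squares scheme can be sketched for the single-covariate case. This is a simplified illustration, not the rlm/lmrob implementation: it uses an M-style MAD estimate of $\sigma$ rather than the S/MM schemes discussed above, and the tuning constant 4.685 is the conventional bisquare default:

```python
import numpy as np


def bisquare_weights(u, c=4.685):
    """Tukey bisquare weights; residuals beyond c * sigma get weight 0."""
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) > c] = 0.0
    return w


def robust_slope(W, beta_hat, n_iter=50):
    """Singly robust slope via iteratively reweighted least squares."""
    W = np.asarray(W, float)
    y = np.asarray(beta_hat, float)
    gamma = W @ y / (W @ W)                   # OLS starting value
    for _ in range(n_iter):
        r = y - W * gamma
        # M-style scale: median absolute deviation, normalized for
        # consistency at the Gaussian (guarded against a zero MAD).
        sigma = np.median(np.abs(r - np.median(r))) / 0.6745
        sigma = max(sigma, 1e-12)
        w = bisquare_weights(r / sigma)
        gamma = (w * W) @ y / ((w * W) @ W)   # weighted least squares step
    return gamma
```

On data lying exactly on a line except for one large outlier, the outlier's standardized residual exceeds the bisquare cutoff, its weight drops to zero, and the fit recovers the slope of the remaining points.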

In Wager et al. (2005), both the Bisquare and Huber loss functions were considered in singly robust models, in comparison with OLS and some other estimation approaches not revisited here (e.g., dropping subjects with high Mahalanobis distance). Outliers were either univariate or multivariate, the regression setting was a single continuous regressor, and the sample sizes considered were between 20 and 40 subjects. The findings indicated that Type I error is controlled for univariate outliers in the explained variable for both robust methods and OLS, but OLS suffered a loss in power. In the multivariate case, no method properly controlled the Type I error rate.

A slightly different approach to robust regression was presented in Fritsch et al. (2015), where a Randomized Parcellation Based Inference (Da Mota et al., 2014) is combined with robust regression. Huber's loss function was used and the sample size was much larger than in Wager et al. (2005), typically 400–1000 subjects. Competing models included OLS, support vector regression (SVR) and least trimmed squares (LTS) and univariate outliers in models with continuous regressors were considered. As in Wager et al. (2005), OLS and robust regression were found to control Type I errors, but with a loss in power for OLS when univariate outliers were present. Interestingly, although Type I errors were found to be controlled, the real data analysis revealed a significant cluster with OLS that was not present with robust regression, in which a single influential outlier caused the false positive activation. This does not contradict the finding that Type I errors are preserved with OLS, but instead reflects that when using a p-value threshold of 0.05 there is still a 5% chance for false positives to occur with either method and the two methods will not necessarily both experience false positives with the same data.

The robust regression estimator considered up until now only provides an estimate of $\gamma$ that is robust to outliers. This is the most commonly found robust regression model across software packages. Within R, p-values are sometimes output as part of the robust regression results, but in many cases, such as with the rlm function, p-values are not supplied. This is because there is debate over the proper way to derive p-values for robust regression (Croux et al., 2003). R's lmrob function does output p-values and, therefore, may be thought of as a better robust model. This function uses an estimation strategy that additionally provides robust standard error estimates that are heteroscedasticity and autocorrelation consistent (HAC), in addition to being robust to outliers. The derivation of this estimator can be found in Croux et al. (2003), where, for large sample sizes (1000 subjects), it was found to have the strongest control of Type I error over singly robust models in the case of heteroscedasticity without outliers. Most notably, the singly robust approach and OLS were not able to control the Type I error rate due to an underestimate of the standard error. The weakness of the HAC estimator arises when the errors are homoscedastic: although the Type I error rate is preserved at the large sample sizes they considered, the estimate of the standard error suffers a loss in precision (Croux et al., 2003). Although this work provides promise for doubly robust regression, the sample sizes and heteroscedasticity settings were not realistic representations of a typical fMRI study, where sample sizes are likely less than 100 and, if heteroscedasticity without outliers is present, it will not be as dramatic as in the simulations used in Croux et al. (2003).
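The sandwich idea behind heteroscedasticity-consistent standard errors can be illustrated in its simplest form, the HC0 (White) estimator for an OLS slope. This is a stand-in for, not a reproduction of, lmrob's HAC estimator, which additionally downweights outliers:

```python
import numpy as np


def ols_se_naive_and_hc0(W, y):
    """Slope SE assuming homoscedasticity vs. the HC0 (White) sandwich.

    Illustration of the sandwich idea only: each subject's squared
    residual stands in for its own error variance in the HC0 form.
    """
    W = np.asarray(W, float)
    y = np.asarray(y, float)
    gamma = W @ y / (W @ W)
    r = y - W * gamma
    n = len(W)
    naive = np.sqrt((r @ r) / (n - 1) / (W @ W))   # single pooled variance
    hc0 = np.sqrt((W**2 @ r**2) / (W @ W) ** 2)    # per-subject variances
    return naive, hc0
```

When the error variance grows with the covariate, the pooled estimate understates the variability of the slope while the sandwich estimate reflects it, which is the mechanism behind the standard-error underestimation described above.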

Lastly, the outlier de-weighting algorithm that is part of FSL (Woolrich, 2008) models the between-subject variance as a mixture of two Gaussian distributions while also allowing for heteroscedastic within-subject variances using Flame 1. Inference is based on a Bayesian framework, using expectation-maximization estimation. This allows subjects to have different between-subject variances and to be weighted differentially in the model estimation. In Woolrich (2008), comparison models included OLS, singly robust regression with Tukey's Bisquare loss function and a permutation-based test that incorporated the within-subject variances. The permutation method used the lower level variances by permuting data in the usual way, but using the Flame 1 T-statistic instead of an OLS statistic. This permutation strategy was not considered here because in the situations where Flame 1 had issues, permutation tests would not be likely to offer any improvement due to exchangeability assumption violations. The simulations focused on univariate outliers in the between-subject variance with a 1-sample t-test as well as regression with a single continuous covariate. The Flame 1 with outlier de-weighting approach was found to generally perform better than OLS and robust regression, with better control of Type I errors and higher power. The permutation test results fell between those of OLS and Flame 1 with outlier de-weighting. Importantly, Woolrich (2008) found that when the regression covariate was skewed, the Flame 1 with outlier de-weighting approach did not perform well. This is to be expected, since it would deviate from the assumption that the distribution of the errors follows a mixture of two Gaussian distributions.

This work combines all of these models to study and compare their performances across a wider set of heteroscedasticity settings than has previously been used, focusing on a wide array of sample sizes, from 30 to 500. Also, instead of only focusing on a single magnitude of outlier, a continuum of outlier values is studied to provide a thorough comparison of the models. The primary question is whether there will be a model that can perform well in all situations and, if not, what can be done to help control the influence of outliers on results, while preserving the Type I error rate.

Section snippets

Models considered

The robust regression routines used were from the R software package (www.r-project.org). For singly robust regression, the rlm function within the MASS library was used and the doubly robust regression, using the HAC estimator for the variance, was implemented using the lmrob function within the robustbase library. In both cases MM estimation using Tukey's bisquare score function was used. For OLS regression, the lm function was used and when using a Cook's D criterion to select and omit

Results

The following will present the Type I error and power results for all simulations. The OLS model using the Cook's D < 4/N criterion never appears in the plots, as this approach did not control the Type I error well in any of the simulations. Also, note that throughout the results the permutation-based results were so similar to OLS that a separate line was not plotted. This may seem unexpected, but it is a result of heteroscedasticity violating the exchangeability assumption of the permutation test. More about the
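The Cook's D screening rule referenced above can be sketched for the no-intercept, single-covariate case, using the leverage $h_i = W_i^2 / \sum_j W_j^2$ and $p = 1$ parameter (function names are illustrative):

```python
import numpy as np


def cooks_distance(W, y):
    """Cook's D for a no-intercept, single-covariate OLS fit.

    D_i = r_i^2 * h_i / (p * s^2 * (1 - h_i)^2), with p = 1 here and
    s^2 the usual residual variance estimate.
    """
    W = np.asarray(W, float)
    y = np.asarray(y, float)
    h = W**2 / (W @ W)                    # leverage of each subject
    gamma = W @ y / (W @ W)
    r = y - W * gamma
    s2 = (r @ r) / (len(W) - 1)           # n - p degrees of freedom, p = 1
    return r**2 * h / (s2 * (1.0 - h) ** 2)


def slope_dropping_influential(W, y):
    """Refit OLS after omitting subjects with D_i > 4/N."""
    W = np.asarray(W, float)
    y = np.asarray(y, float)
    keep = cooks_distance(W, y) <= 4.0 / len(W)
    return W[keep] @ y[keep] / (W[keep] @ W[keep])
```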

Discussion

This work has presented an extended review of models typically used in fMRI analysis, or considered when outliers may be present. Type I error rates and power were compared across a wider set of heteroscedasticity settings, degree of heteroscedasticity and sample sizes than has been done in previous work. The findings support that there is not a single model that can control Type I error properly for all types of heteroscedasticity, although some approaches should be avoided regardless of

Conclusion

There is not a single model that can handle all types of heteroscedasticity for all sample sizes, although for large sample sizes (>500), the doubly robust model is a good alternative. The worst scenarios, in terms of controlling Type I error, involved issues with the explanatory variable, which is simple to visually inspect. If the explanatory variable is skewed, a transformation should be considered. If there are outliers in the explanatory variable, they should be considered for removal,

Acknowledgements

This work was supported by NIH grant P30HD003352.


A publisher's error resulted in this article appearing in the wrong issue. The article is reprinted here for the reader's convenience and for the continuity of the special issue. For citation purposes, please use the original publication details; J.A. Mumford/NeuroImage 147 (2017) 658–668.
