Solutions to the Behrens–Fisher problem

https://doi.org/10.1016/S0169-2607(02)00021-4

Abstract

When testing the equality of the means of two independent, normally distributed populations whose variances are unknown but assumed equal, the classical Student's two-sample t-test is recommended. If the underlying population distributions are normal with unequal and unknown variances, either Welch's t-statistic or Satterthwaite's approximate F test is suggested. However, Welch's procedure is non-robust under most non-normal distributions. There is a variable level of tolerance around the strict assumptions of data independence, homogeneity of variances, and identical, normal distributions. Few textbooks offer alternatives when one or more of the underlying assumptions are not defensible. We have developed an executable FORTRAN program that produces the statistics suggested by Cressie and Whitford, Yuen and Dixon, and Yuen. The executable is available from the author on request (e-mail only).

Introduction

When testing the equality of the means of two independent, normally distributed populations whose variances are unknown but assumed equal, the classical Student's two-sample t-test is recommended. Student's t-test is asymptotically robust, and for finite m and n it possesses type I error robustness if m=n or if the distribution is symmetric; if m≠n and the distribution is skewed, the effects of departure from normality may be considerable [1]. If the underlying population distributions are normal with unequal and unknown variances, either Welch's t-statistic [2] or Satterthwaite's approximate F test [3] is suggested. However, Welch's procedure is non-robust under most non-normal distributions [4], [5]. Actual data are more often non-smooth, multi-modal, highly skewed, and heavy-tailed [6], [7]. This has serious consequences because even slight departures from normality are known to substantially reduce power when testing hypotheses about means.

Research often relies on Student's t-test to judge treatments and recommend new therapies. These tests are often abused, and there is reasonable criticism of their use when the underlying assumptions are not met. Fortunately, the test tolerates some departure from its strict assumptions of data independence, homogeneity of variances, and identical, normal distributions.

Every elementary statistics textbook addresses the problem of testing the equality of the means of two populations. Few, however, offer alternatives when one or more of the underlying assumptions are not defensible. We have developed an executable FORTRAN program that produces the statistics outlined in this paper, as suggested by Cressie and Whitford [8], Yuen and Dixon [9], and Yuen [5]. The executable is available from the author on request.
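To make the trimmed statistic attributed to Yuen [5] more concrete, the following is a minimal Python sketch of the commonly cited form of Yuen's two-sample trimmed t. It is not the author's FORTRAN program, and the paper's own formulation is not visible in this excerpt. The function names, the default 20% trimming proportion, and the sample values in the comment are illustrative assumptions.

```python
import math

def _trim_parts(sample, trim):
    """Return (trimmed mean, winsorized sum of squares, h) for one sample."""
    xs = sorted(sample)
    n = len(xs)
    g = int(math.floor(trim * n))   # observations cut from each tail
    h = n - 2 * g                   # observations left after trimming
    if h < 2:
        raise ValueError("trim proportion too large for this sample size")
    kept = xs[g:n - g]
    trimmed_mean = sum(kept) / h
    # Winsorize: replace each trimmed value by the nearest kept value.
    wins = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(wins) / n
    ssd = sum((w - w_mean) ** 2 for w in wins)
    return trimmed_mean, ssd, h

def yuen_t(x, y, trim=0.2):
    """Yuen-type trimmed t statistic and approximate degrees of freedom.

    Assumed (textbook) form: the difference of trimmed means divided by
    the square root of d_x + d_y, where d = SSD_winsorized / (h * (h - 1)).
    """
    mx, ssd_x, hx = _trim_parts(x, trim)
    my, ssd_y, hy = _trim_parts(y, trim)
    dx = ssd_x / (hx * (hx - 1))
    dy = ssd_y / (hy * (hy - 1))
    t = (mx - my) / math.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx ** 2 / (hx - 1) + dy ** 2 / (hy - 1))
    return t, df

# Made-up values, not data from the paper:
# t, df = yuen_t([1, 2, 3, 4, 100], [2, 3, 4, 5, 6], trim=0.2)
```

With `trim=0.0` nothing is trimmed, and the statistic reduces to the separate-variance statistic with Welch-type degrees of freedom, which is one way to sanity-check the sketch.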

Section snippets

Numerical methods

The two-sample t-test assumes that both samples ($X_i$, $i = 1, 2, \ldots, m$; $Y_j$, $j = 1, 2, \ldots, n$) are random and jointly independent, are identically distributed, come from normal populations ($X_i \sim N(\mu_X, \sigma^2)$, $i = 1, 2, \ldots, m$, and $Y_j \sim N(\mu_Y, \sigma^2)$, $j = 1, 2, \ldots, n$), and have equal variances ($\sigma_X^2 = \sigma_Y^2 = \sigma^2$). When these conditions hold, the test of $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \ne \mu_Y$ or $H_1: \mu_X > \mu_Y$ is based on $T$.

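Let

$$T = \frac{\bar{X} - \bar{Y}}{\left[\left(\frac{1}{m} + \frac{1}{n}\right)\frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}\right]^{1/2}}$$

$$T^* = \frac{\bar{X} - \bar{Y}}{\left(\frac{S_X^2}{m} + \frac{S_Y^2}{n}\right)^{1/2}}$$

where $\bar{X} = \sum_i X_i/m$, and $S_X^2 = \sum_i (X_i - \bar{X})^2/(m-1)$, $\bar{Y} =$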
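As an illustration of the two statistics defined here, the following Python sketch computes T (pooled variance) and T* (separate variances) from two samples. It is not the author's FORTRAN code. The degrees of freedom attached to T* use the standard Welch–Satterthwaite approximation, which is an assumption because the excerpt is cut off before any such formula. The function names and the data in the comment are made up.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Student's T with pooled variance; returns (T, degrees of freedom)."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y)) / math.sqrt((1 / m + 1 / n) * sp2)
    return t, m + n - 2

def separate_variance_t(x, y):
    """T* with separate variances; returns (T*, approximate df).

    The df is the Welch-Satterthwaite approximation (an assumption here,
    not a formula quoted from the paper).
    """
    m, n = len(x), len(y)
    vx, vy = variance(x) / m, variance(y) / n
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, df

# Made-up values, not the mouse data from the paper's example:
# x = [12.1, 14.3, 11.8, 15.2, 13.6]
# y = [10.4, 9.9, 12.7, 8.5, 11.1, 10.8, 9.2]
# print(pooled_t(x, y), separate_variance_t(x, y))
```

`statistics.variance` uses the divisor (m−1), matching the definition of $S_X^2$ above.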

Example

Data for the example come from an experiment reported by Dolkart, Halperin and Perlman [11]. Two groups of mice (normal mice and diabetic mice) were treated with bovine serum albumin (BSA) for 28 days. On the 29th day, the amount of BSA nitrogen bound, in μg/ml of undiluted mouse serum, was measured. The hypothesis of interest was that the average amount of BSA bound in normal mice would be greater than that bound in diabetic mice. The data were as follows: normal mice {155.76, 282.00,

Discussion

Testing the equality of two means from independent samples is a common statistical procedure and is covered in every elementary statistics textbook. When the underlying populations are normally distributed with equal variances, we would use Student's t-test (T). When m=n, a test of equal means based on T possesses levels robust against heterogeneous variances. However, T is sensitive to non-normality.
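The behaviour of T for equal sample sizes can be seen from the definitions of T and T* in the Numerical Methods section. As a short worked step (a sketch in the same symbols, not quoted from the paper), setting m = n in the squared denominator of T gives

$$\left(\frac{1}{n} + \frac{1}{n}\right)\frac{(n-1)S_X^2 + (n-1)S_Y^2}{2n-2} = \frac{2}{n}\cdot\frac{S_X^2 + S_Y^2}{2} = \frac{S_X^2}{n} + \frac{S_Y^2}{n},$$

which is the squared denominator of T*. The two statistics therefore take the same value whenever m = n. The tests can still differ in the degrees of freedom of the reference distribution.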

The first exact solution to the case where the distributions are normal but the

Program

We have written and tested a FORTRAN program that produces the statistics outlined in the Numerical Methods section. Both the program in an executable format and sample data sets are available from the author on request (e-mail only). The input data file (TEST.DTA) is in free format form (treatment group, outcome) where the treatment variable is an integer (either a 1 or 2), and the outcome variable is continuous.
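For readers preparing input in the layout described above, the following Python sketch shows one way to split such records into the two treatment groups. It is not the author's program. It assumes that "free format" means fields separated by blanks or commas, as in FORTRAN list-directed input. The function name and the sample records are hypothetical.

```python
def read_groups(lines):
    """Split free-format (group, outcome) records into two lists.

    Assumes each record holds an integer group code (1 or 2) followed by
    a numeric outcome, separated by blanks and/or a comma.
    """
    groups = {1: [], 2: []}
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        group, outcome = int(fields[0]), float(fields[1])
        if group not in groups:
            raise ValueError(f"unexpected treatment group: {group}")
        groups[group].append(outcome)
    return groups[1], groups[2]

# Made-up records; a real file could be read with
#     with open("TEST.DTA") as f:
#         x, y = read_groups(f)
sample = ["1 10.5", "2, 7.25", "", "1 11"]
x, y = read_groups(sample)
```

The function accepts any iterable of lines, so it works on an open file or on a list of strings.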

Acknowledgements

Anonymous referees serve as a valuable resource to this journal. I am very grateful for the thorough review, comments and suggestions that have improved the presentation of this paper. Thanks.

References (14)

  • M.L. Tiku et al., Robust Inference (1986)
  • B.L. Welch, The significance of the difference between two means when the population variances are unequal, Biometrika (1937)
  • F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biomet. Bull. (1946)
  • N.A. Cressie et al., How to use the two sample t-test, Biomet. J. (1986)
  • K.K. Yuen, The two sample trimmed t for unequal population variances, Biometrika (1974)
  • M.A. Hill et al., Robustness in real life: a study of clinical laboratory data, Biometrics (1982)
  • R.R. Wilcox, Some results on the Tukey–McLaughlin and Yuen methods for trimmed means when distributions are skewed, Biomet. J. (1990)
There are more references available in the full text version of this article.

Cited by (6)

  • Statistical inference on difference or ratio of means from heteroscedastic normal populations

    2010, Journal of Statistical Planning and Inference
    Citation Excerpt :

    Zumbo and Coulombe (1997) discuss the robust rank-order test for non-normal populations with unequal variances. Reed (2003) discusses the solution of BF problem. Lix et al. (2005) discuss robust tests for the multivariate BF problem.

  • Robust tests for the multivariate Behrens-Fisher problem

    2005, Computer Methods and Programs in Biomedicine
  • Effect sizes for research: A broad practical approach

    2014, Effect Sizes for Research: A Broad Practical Approach
  • Effect sizes for research: Univariate and multivariate applications, second edition

    2012, Effect Sizes for Research: Univariate and Multivariate Applications, Second Edition