On estimation and influence diagnostics for the Grubbs’ model under heavy-tailed distributions

doi:10.1016/j.csda.2008.10.034

Computational Statistics & Data Analysis

Volume 53, Issue 4, 15 February 2009, Pages 1249-1263

https://doi.org/10.1016/j.csda.2008.10.034

Abstract

Grubbs’ measurement model is frequently used to compare several measuring devices. The random terms are commonly assumed to be normally distributed. However, this assumption makes the inference vulnerable to outlying observations, whereas scale mixtures of normal distributions offer an attractive alternative that produces robust estimates while keeping the elegance and simplicity of maximum likelihood theory. The aim of this paper is to develop an EM-type algorithm for parameter estimation and to use the local influence method to assess the robustness of these parameter estimates under some usual perturbation schemes. To identify outliers and to assess the fitted model, we apply the local influence procedure in a study comparing the precision of several thermocouples.

Introduction

The problem of comparing the precision and accuracy of different measuring instruments arises in various scientific applications, such as engineering (Grubbs, 1948, 1973) and medicine (Barnett, 1969). The usual way to compare the instruments is to take measurements of the same unknown characteristic $x$ on different individuals or experimental units. The instruments may differ in aspects such as cost, speed and convenience. The relative quality of the measurements is evaluated in terms of the precision and bias of the different instruments.

The assessment of the robustness of parameter estimates in statistical models has been an important concern of many researchers in recent decades. The deletion methodology, which consists of studying the impact on the parameter estimates of dropping individual observations, is probably the most widely used technique for detecting influential observations (see, for example, Cook and Weisberg (1982) and Chatterjee and Hadi (1988)). Nevertheless, the local influence procedure (Cook, 1986), which investigates the influence of small perturbations of the model/data on the parameter estimates, has received increasing attention in the last 20 years, mainly because of its flexibility in constructing different kinds of graphics and its applicability to various statistical models (see the discussion in Cook (1997)). In particular, Galea et al. (2002) and Lachos et al. (2007) applied the methodology to normal comparative calibration and Grubbs’ models, noting, under some usual perturbation schemes, the well-known lack of robustness of the least-squares estimates against outlying observations.

Several methodologies have been proposed to attenuate the influence of outlying observations on the parameter estimates under normality, such as modifications of the least-squares methodology (see, for instance, Huber (1981)). Other approaches assume heavy-tailed error distributions, for which the maximum likelihood estimates appear to be robust against extreme observations (see, for example, Galea et al. (2005)). In this paper, we assume scale mixtures of normal distributions (Andrews and Mallows, 1974) to accommodate extreme and outlying observations in the Grubbs’ model, and we adopt the hierarchical representation proposed by Pinheiro et al. (2001). Properties of distributions in this class, such as the Student-t, power exponential and contaminated normal, may be found in Andrews and Mallows (1974) and Lange and Sinsheimer (1993). Our aim is to apply the local influence method to the Grubbs’ model under heavy-tailed distributions in order to assess the influence of minor perturbations of the model/data; our results generalize those obtained by Lachos et al. (2007).

The rest of the paper is organized as follows. In Section 2 some inferential aspects are discussed and an EM-type algorithm is developed for the parameter estimation. Section 3 introduces the local influence methodology (Cook, 1986, Zhu and Lee, 2001). The normal curvature for some usual perturbation schemes is derived in Section 4. The methodology is illustrated in Section 5, in which Grubbs’ models under normal and scale mixtures of normal distributions are compared according to the robustness of the maximum likelihood estimates. Finally, some concluding remarks are given in Section 6.

Model description

Grubbs (1948, 1973, 1983) proposes a linear model for comparing $p$ different instruments, in which the characteristic $x_{i}$ of the $i$th experimental unit is measured once by each of the $p$ instruments. The model assumes the form

$Y_{i j} = α_{j} + x_{i} + ϵ_{i j}, \quad i = 1, \dots, M \text{ and } j = 1, \dots, p,$ where $Y_{i j}$ denotes the measurement of the $j$th instrument on the $i$th experimental unit and $α_{j}$ is called the additive bias. The measurement errors $ϵ_{i j}$ are assumed to be independent of the random variables $x_{1}, \dots, x_{M}$. In addition, one has that

Local influence

The aim of local influence (Cook, 1986) is to investigate the behavior of some influence measure $T (ω)$ when small perturbations are made to the model/data, where $ω$ is a $q$-dimensional vector of perturbations restricted to some open subset $Ω \subset \mathbb{R}^{q}$. In this work we assess local influence by using an appropriate measure that is based on the complete log-likelihood function and is particularly recommended for incomplete data.

Let $L (θ, ω | Y)$ and $L (θ, ω | Y_{c})$ be the perturbed log-likelihood functions for observed and

Curvature derivation

In this section we will derive the normal curvature for the Grubbs’ model by considering the hierarchical formulation given in (5). We will compute $\ddot{Q} (θ) = \partial^{2} Q (θ | \hat{θ}) / \partial θ \partial θ^{T}$ and $Δ = \partial^{2} Q (θ, ω | \hat{θ}) / \partial θ \partial ω^{T}$ by using results of matrix differentiation described in Magnus and Neudecker (1988). Details on the differential calculations for the matrices $\ddot{Q}$ and $Δ$ under different perturbation schemes are given in Appendix B.
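As a hedged illustration of how the two matrices above are typically combined: in the approach of Zhu and Lee (2001), the normal curvature in a unit direction $d$ takes the form $2 d^{T} Δ^{T} {(- \ddot{Q})}^{-1} Δ d$, and the eigenvector associated with the largest eigenvalue of $Δ^{T} {(- \ddot{Q})}^{-1} Δ$ indicates the most influential perturbation direction. The sketch below uses small made-up matrices, not the expressions derived in Appendix B, and the function names are hypothetical.

```python
import numpy as np


def influence_matrix(Q_ddot, Delta):
    """Return B = Delta^T (-Q_ddot)^{-1} Delta, symmetrized for numerical safety.

    Q_ddot is assumed negative definite (p x p); Delta is p x q.
    """
    neg_Q = -np.asarray(Q_ddot, dtype=float)
    Delta = np.asarray(Delta, dtype=float)
    B = Delta.T @ np.linalg.solve(neg_Q, Delta)
    return (B + B.T) / 2.0


def normal_curvature(B, d):
    """Normal curvature 2 d^T B d in the (normalized) direction d."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    return 2.0 * d @ B @ d


def max_curvature_direction(B):
    """Unit eigenvector of B associated with its largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(B)  # ascending order
    return eigenvectors[:, -1]


# Toy inputs (illustrative only): p = 2 parameters, q = 3 perturbations.
Q_ddot = -np.diag([2.0, 4.0])
Delta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])

B = influence_matrix(Q_ddot, Delta)
d_max = max_curvature_direction(B)
C_max = normal_curvature(B, d_max)
```

In practice the components of `d_max` are plotted against the observation index, and observations with large absolute components are flagged as potentially influential.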

Application

We consider the Grubbs’ model given in (2) with the following hierarchical formulation: $Y_{i} \mid z_{i} \overset{ind}{\sim} \mathrm{SMN}_{5} (μ + \mathbf{1} z_{i}, D (ϕ); H)$ and $z_{i} \overset{ind}{\sim} \mathrm{SMN} (0, ϕ_{x}; H)$, $i = 1, \dots, 64$, where $Y_{i} = {(Y_{i 1}, \dots, Y_{i 5})}^{T}$, $D (ϕ) = \operatorname{diag} (ϕ_{1}, \dots, ϕ_{5})$ and $H$ denotes the distribution function of the mixture variable $V_{i}$, $i = 1, \dots, 64$.
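The hierarchical formulation above can be simulated directly, which may help in checking an implementation. The following is a minimal sketch under stated assumptions: a single mixture variable $V_{i}$ scales both levels through the factor $1 / V_{i}$ (a common convention for scale mixtures of normal distributions, not confirmed by the text shown here), and the four mixing laws are the standard choices that yield the normal, Student-t, slash and contaminated normal distributions. All numerical values and function names are hypothetical.

```python
import numpy as np


def sample_mixing(kind, size, rng, nu=4.0, eps=0.1, gamma=0.2):
    """Draw mixture variables V_i; nu, eps and gamma are illustrative values."""
    if kind == "normal":            # point mass at 1
        return np.ones(size)
    if kind == "student_t":         # V ~ Gamma(shape=nu/2, rate=nu/2)
        return rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    if kind == "slash":             # V ~ Beta(nu, 1)
        return rng.beta(nu, 1.0, size=size)
    if kind == "contaminated":      # V = gamma with probability eps, else 1
        return np.where(rng.random(size) < eps, gamma, 1.0)
    raise ValueError(f"unknown mixing distribution: {kind}")


def simulate_grubbs(mu, phi, phi_x, n, kind, rng):
    """Simulate Y_i = mu + 1 z_i + e_i with a common scale 1/V_i (assumed)."""
    mu = np.asarray(mu, dtype=float)
    phi = np.asarray(phi, dtype=float)
    v = sample_mixing(kind, n, rng)
    # z_i | V_i ~ N(0, phi_x / V_i)
    z = rng.standard_normal(n) * np.sqrt(phi_x / v)
    # e_i | V_i ~ N_p(0, D(phi) / V_i), independent across instruments
    e = rng.standard_normal((n, mu.size)) * np.sqrt(phi / v[:, None])
    return mu + z[:, None] + e, z, v


rng = np.random.default_rng(2009)
mu = np.zeros(5)                    # hypothetical location vector
phi = np.full(5, 0.5)               # hypothetical error variances
y, z, v = simulate_grubbs(mu, phi, phi_x=2.0, n=64, kind="student_t", rng=rng)
```

Small values of `v` inflate the variance of the whole vector `y[i]`, which is how the heavy-tailed members of the class accommodate outlying experimental units.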

In our analysis we suppose that the mixture variable $V_{i}$ follows a Gamma, Beta, discrete or point-mass distribution, that is, the marginal response $Y_{i}$ follows, respectively, a Student-t, slash, contaminated normal or normal distribution,

Concluding remarks

In this work, we have discussed parameter estimation in the Grubbs’ model under a class of distributions with heavier tails than the normal distribution. Through a local influence study, some aspects of the robustness of the maximum likelihood estimators under scale mixtures of normal distributions were noted. Explicit expressions are obtained for the matrix $Δ$ under the different perturbation schemes considered. We note, however, that other perturbation schemes can be treated in an analogous way.

Acknowledgements

This work was partially supported by FONDECYT grants 11075071 and 1070919 (Chile) and by CNPq and FAPESP (Brazil). The authors thank the Associate Editor and a referee for valuable suggestions.

References (40)

S.Y. Lee et al. Influence analysis of nonlinear mixed-effects models. Computational Statistics & Data Analysis (2004)

F. Osorio et al. Assessment of local influence in elliptical linear models with longitudinal structure. Computational Statistics & Data Analysis (2007)

J.Y. Shyr et al. Inference about comparative precision in linear structural relationships. Journal of Statistical Planning and Inference (1986)

D.F. Andrews et al. Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B (1974)

V.D. Barnett. Simultaneous pairwise linear structural relationships. Biometrics (1969)

E.J. Bedrick. An efficient scores test for comparing several measuring devices. Journal of Quality Technology (2001)

H. Bolfarine et al. Structural comparative calibration using the EM algorithm. Journal of Applied Statistics (1995)

H. Bolfarine et al. One structural comparative calibration under a t-model. Computational Statistics (1996)

S. Chatterjee et al. Sensitivity Analysis in Linear Regression (1988)

R. Christensen et al. Test for precision and accuracy of multiple measuring devices. Technometrics (1993)

R.D. Cook. Assessment of local influence (with discussion). Journal of the Royal Statistical Society, Series B (1986)

R.D. Cook

R.D. Cook et al. Residuals and Influence in Regression (1982)

A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B (1977)

K.T. Fang et al. Symmetric Multivariate and Related Distributions (1990)

C. Fernández et al. Multivariate Student-t regression models: Pitfalls and inference. Biometrika (1999)

Galea, M., 1995. Calibração Comparativa Estrutural e Funcional. Unpublished Ph.D. Dissertation. (Dept. of Statistics,...

M. Galea et al. Local influence in comparative calibration models. Biometrical Journal (2002)

M. Galea et al. Local influence in comparative calibration models under elliptical t-distributions. Biometrical Journal (2005)

F.E. Grubbs. On estimating precision of measuring instruments and product variability. Journal of the American Statistical Association (1948)
