The iteratively reweighted estimating equation in minimum distance problems

https://doi.org/10.1016/S0167-9473(02)00326-2

Abstract

The class of density based minimum distance estimators provides attractive alternatives to the maximum likelihood estimator because several members of this class have good robustness properties while being first-order efficient under the assumed model. A helpful computational technique, similar to the iteratively reweighted least squares used in robust regression, is introduced which makes these estimators computationally much more feasible. This technique is much simpler to implement than the Newton–Raphson (NR) method. The loss suffered in the rate of convergence compared to the NR method can be made to vanish in some exponential family situations by a small modification of the weight function, in which case the performance is comparable to that of the NR method. For a large number of parameters the performance of this modified version is actually expected to be better than that of the NR method. In view of the widespread interest in density based robust procedures, this modification appears to be of great practical value.

Introduction

Minimum distance estimation forms an important subclass of statistical methodology. Originally, minimum distance functions were developed for goodness-of-fit purposes. The popular distances in the early literature were the Kolmogorov–Smirnov distance, the Cramér–von Mises distance, as well as weighted versions and other variants of these. The basic ingredient in this approach is the measurement of a distance between the data, summarized by the empirical distribution function, and the hypothesized probability distribution. During the last few decades statisticians have become increasingly aware of the potential of this approach in robust estimation. Much of the work in the minimum distance area was pioneered by Wolfowitz (1953, 1954, 1957) in the 1950s. There was a revival of this line of research in the early 1980s, as evidenced by the works of Parr and Schucany (1980), Boos (1981), Parr and DeWet (1981), Heathcote and Silvapulle (1981), and others. Parr (1981) also provides an extensive bibliography of minimum distance estimation up to that point in time. Works of Wiens (1987), Hettmansperger et al. (1994), Özturk (1994), Özturk and Hettmansperger (1997), and Özturk et al. (1999) have further extended this line of research.

Many of the estimators proposed in the above papers have strong robustness properties under model misspecification. However, their robustness is usually achieved at the cost of first-order efficiency at the model. On the other hand, a second and relatively more modern branch of minimum distance estimation, based on density based distances (or divergences in general), has been shown to produce a large class of estimators which combine attractive robustness properties with full asymptotic efficiency. Beran (1977) appears to be the first to use a density based distance for the purpose of robust inference. He demonstrated that the minimum Hellinger distance estimators are simultaneously robust and first-order efficient. Other authors, such as Stather (1981), Tamura and Boos (1986), Simpson (1987, 1989), Donoho and Liu (1988a, 1988b), Eslinger and Woodward (1991), Basu and Harris (1994), Cao et al. (1995), Markatou (1996), Basu et al. (1997b), and Basu and Basu (1998), have further investigated related estimators. Lindsay (1994) introduced a class of minimum disparity estimators and illustrated the geometry behind their robustness. These ideas were extended to continuous models by Basu and Lindsay (1994). See also Basu et al. (1997a) for a comprehensive review of minimum disparity inference.

The density based minimum distance estimators (minimum disparity estimators in particular) provide attractive alternatives to the maximum likelihood estimator. However, the defining equations of the minimum disparity estimators are usually nonlinear, and numerical methods have to be applied to solve them. The numerical difficulty increases greatly with the number of parameters. For example, to estimate (μ,Σ) in a d-dimensional normal model there are p=d+d(d+1)/2=d(d+3)/2 unknown parameters. Each step of the Newton–Raphson (NR) method then requires (p+1)(p+2)/2 numerical integrations and the inversion of a p-dimensional Hessian matrix.

In this paper we consider a method closely related to iteratively reweighted least squares, with the aim of reducing the numerical difficulty described in the previous paragraph. Our initial motivation came from the fact that the new method is vastly simpler to program and, in the d-dimensional normal model, requires (p+2) numerical integrations and no matrix inversion per step. Even for the case d=3, the number of parameters is 9, so each NR step requires 55 numerical integrations and the inversion of a 9×9 matrix, while the new method requires only 11 numerical integrations per step. While the price one might expect to pay for this is a decrease from quadratic to linear convergence, our most striking finding was that by a careful (but very simple) selection of weights, we could make the method competitive in speed with NR even in the univariate model, where p=2. (A simple scalar adjustment makes the method quadratically convergent when the data exactly fit the model.) Our theoretical calculations are substantiated by several numerical investigations. We believe this paper demonstrates generally applicable techniques for applying iterative reweighting algorithms in new problems and for making them more efficient. In particular, we expect the algorithm described here to be of great practical use in view of the widespread interest in the minimum Hellinger distance and related methods.
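As a quick check of the counts quoted in the preceding two paragraphs, the short Python sketch below tabulates the per-step integration costs of the two methods. The formulas are taken directly from the text; the script itself is ours.

```python
# Per-step cost comparison for the d-dimensional normal model, using the
# counts quoted above: p = d(d+3)/2 parameters, (p+1)(p+2)/2 numerical
# integrations per Newton-Raphson (NR) step, and p+2 per step of the
# iteratively reweighted method.
for d in (1, 2, 3, 5, 10):
    p = d * (d + 3) // 2
    nr = (p + 1) * (p + 2) // 2
    iree = p + 2
    print(f"d={d:2d}  p={p:3d}  NR: {nr:4d} integrations + {p}x{p} inversion  "
          f"IREE: {iree:3d} integrations")
# d=3 reproduces the figures in the text: p=9, 55 integrations for NR, 11 for IREE.
```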

The rest of the paper is organized as follows. In Section 2 we provide a brief review of minimum disparity estimation. The main contributions of the paper are presented in Section 3, where we first develop the iteratively reweighted estimating equation (IREE) algorithm in the spirit of the iteratively reweighted least squares (IRLS) used in robust regression, and then demonstrate that a simple refinement can make the method comparable in performance to the NR algorithm, while keeping the implementation substantially simpler. Some further issues, including a second-order analysis, some discussion of the range of applicability of the method in small samples, and a weighted likelihood modification resulting from the IREE idea, are discussed in Section 4. A short appendix presents a step-by-step implementation of the algorithm.


Minimum disparity estimation

Let us briefly review minimum disparity estimation, leading up to the estimating equation that we will be concerned with. We start with the discrete model. Let mβ(x) represent the model density function indexed by an unknown β∈Ω; without loss of generality let the sample space be X={0,1,2,…}. Let d(x) represent the proportion of observations in a sample of size n that have the value x. Define δ(x)=d(x)/mβ(x)−1 to be the Pearson residual at x. Then a general disparity measure ρ can be expressed as ρG(d,mβ)=∑x G(δ(x))mβ(x), where G is a smooth, strictly convex function on [−1,∞) with G(0)=0.
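As a concrete illustration of these definitions, the following Python fragment (our own sketch, not code from the paper; the Poisson model and helper names are illustrative choices) computes the Pearson residuals and evaluates the disparity for the choice G(δ)=2(√(δ+1)−1)², which turns ρG into twice the squared Hellinger distance.

```python
import numpy as np
from scipy.stats import poisson

def disparity(sample, beta, G, support_max=200):
    """rho_G(d, m_beta) = sum_x G(delta(x)) * m_beta(x), Poisson(beta) model,
    evaluated over a truncated support {0, ..., support_max}."""
    x = np.arange(support_max + 1)
    m = poisson.pmf(x, beta)                  # model density m_beta(x)
    counts = np.bincount(sample, minlength=x.size)[:x.size]
    d = counts / len(sample)                  # empirical proportions d(x)
    keep = m > 0                              # guard against tail underflow
    delta = d[keep] / m[keep] - 1.0           # Pearson residuals
    return float(np.sum(G(delta) * m[keep]))

def G_hellinger(delta):
    # G(delta) = 2*(sqrt(delta + 1) - 1)**2 gives twice the squared
    # Hellinger distance between d and m_beta; note G(0) = 0.
    return 2.0 * (np.sqrt(delta + 1.0) - 1.0) ** 2

rng = np.random.default_rng(0)
sample = rng.poisson(lam=3.0, size=200)
print(disparity(sample, beta=3.0, G=G_hellinger))   # small when the model fits
```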

The iteratively reweighted least squares (IRLS)

We first discuss the IRLS and then develop the IREE along those lines. The IRLS is an algorithm often used in determining the parameter estimates in robust regression. It is generally attributed to Beaton and Tukey (1974), and is far simpler to apply than the NR method. Holland and Welsch (1977), McCullagh and Nelder (1989) and Green (1984) are good general references. Byrd and Pyne (1979) and Birch (1980) discuss convergence results and Del Pino (1989) provides an extensive bibliography.
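For concreteness, here is a minimal Python sketch of classical IRLS for robust linear regression. The Huber weight function and MAD scale estimate are standard textbook choices of ours for illustration; they are not the IREE weights developed in this paper.

```python
import numpy as np

def irls_huber(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Robust regression via IRLS: repeatedly solve a weighted least squares
    problem whose weights are recomputed from the current residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary LS start
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12      # robust (MAD) scale
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights psi(u)/u
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y) # weighted LS step
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if step < tol:
            break
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)
y[:5] += 15.0                                          # a few gross outliers
print(irls_huber(X, y))                                # close to [1, 2]
```

Each iteration downweights observations with large residuals; the IREE developed in Section 3 proceeds in the same spirit, with an estimating equation in place of the normal equations.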

Second order analysis for the optimally weighted IREE

Let A2=A″(0) represent the second derivative of the residual adjustment function of the disparity evaluated at zero. Lindsay (1994) and Basu and Lindsay (1994) have shown that this plays an important role in determining the theoretical properties of the estimator. In this section we will show that the right-hand side of Eq. (3.12) can be expressed as a function of A2 when the residuals are small.
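Since the residual adjustment function is standardized in Lindsay (1994) so that A(0)=0 and A′(0)=1, a second-order Taylor expansion for small residuals gives (a standard expansion, recorded here for convenience)

A(δ) ≈ δ + (A2/2)δ², with A2 = A″(0).

Maximum likelihood corresponds to A(δ)=δ, that is A2=0, so A2 quantifies the second-order departure from likelihood behavior near the model.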

Direct differentiation of w(x) gives

w′(x) = −[A′(δ(x))d(x) − m(x)(A(δ(x)) − λ)]u(x),

where u(x) = ∇m(x)/m(x) is the likelihood score function.

References

  • A. Basu et al., Minimum negative exponential disparity estimation in parametric models, J. Statist. Plann. Inference (1997)
  • R. Cao et al., Minimum distance density-based estimation, Comput. Statist. Data Anal. (1995)
  • T.P. Hettmansperger et al., Minimum distance estimators, J. Statist. Plann. Inference (1994)
  • J.S. Marron, Comments on a data based bandwidth selector, Comput. Statist. Data Anal. (1989)
  • A. Basu, Minimum disparity estimation in the continuous case: efficiency, distributions, robustness and... (1991)
  • A. Basu et al., Penalized minimum disparity methods for multinomial models, Statist. Sinica (1998)
  • A. Basu et al., Robust predictive distributions for exponential families, Biometrika (1994)
  • A. Basu et al., Minimum disparity estimation for continuous models: efficiency, distributions and robustness, Ann. Inst. Stat. Math. (1994)
  • A. Basu, I.R. Harris, S. Basu, Minimum distance estimation: the approach using density based distances. In:... (1997a)
  • A.E. Beaton et al., The fitting of power series, meaning polynomials, illustrated on band spectroscopic data, Technometrics (1974)
  • R.J. Beran, Minimum Hellinger distance estimates for parametric models, Ann. Statist. (1977)
  • J.B. Birch, Some convergence properties of iterated least squares in the location model, Comm. Statist. B (1980)
  • D.D. Boos, Minimum distance estimators for location and goodness-of-fit, J. Amer. Statist. Assoc. (1981)
  • R.H. Byrd, D.A. Pyne, Some results on the convergence of the iteratively reweighted least squares. ASA Proc.... (1979)
  • R. Cao et al., The consistency of a smoothed minimum distance estimate, Scand. J. Statist. (1996)
  • G.E. Del Pino, The unifying role of the iterative generalized least squares in statistical algorithms (with discussion), Statist. Sci. (1989)
  • L. Devroye, A Course in Density Estimation (1987)
  • L. Devroye, L. Gyorfi, Nonparametric Density Estimation: The L1 View. John Wiley, New... (1985)
  • D.L. Donoho et al., The automatic robustness of minimum distance functionals, Ann. Statist. (1988)
  • D.L. Donoho et al., Pathologies of some minimum distance estimators, Ann. Statist. (1988)
  • P.W. Eslinger et al., Minimum Hellinger distance estimation for normal models, J. Statist. Comput. Simulation (1991)
  • P.J. Green, Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with discussion), J. Roy. Statist. Soc. B (1984)
  • P. Hall et al., Lower bounds for bandwidth selection in density estimation, Probab. Theory Related Fields (1991)
  • W. Härdle et al., How far are automatically chosen regression smoothing parameters from their optimum?, J. Amer. Statist. Assoc. (1988)
¹ Lindsay was partially supported by the National Science Foundation under grant DMS 0104443.
