The iteratively reweighted estimating equation in minimum distance problems

https://doi.org/10.1016/S0167-9473(02)00326-2

Abstract

The class of density based minimum distance estimators provides attractive alternatives to the maximum likelihood estimator because several members of this class have good robustness properties while being first-order efficient under the assumed model. A helpful computational technique, similar to the iteratively reweighted least squares used in robust regression, is introduced which makes these estimators computationally much more feasible. This technique is much simpler to implement than the Newton–Raphson (NR) method. The loss suffered in the rate of convergence compared to the NR method can be made to vanish in some exponential family situations by a small modification of the weight function, in which case the performance is comparable to that of the NR method. For a large number of parameters the performance of this modified version is actually expected to be better than that of the NR method. In view of the widespread interest in density based robust procedures, this modification appears to be of great practical value.

Introduction

Minimum distance estimation forms an important subclass of statistical methodology. Originally, minimum distance functions were developed for goodness-of-fit purposes. The popular distances in the early literature were the Kolmogorov–Smirnov distance, the Cramér–von Mises distance, as well as weighted versions and other variants of these. The basic ingredient in this approach is the measurement of a distance between the data, summarized by the empirical distribution function, and the hypothesized probability distribution. During the last few decades statisticians have become increasingly aware of the potential of this approach in robust estimation. Much of the work in the minimum distance area was pioneered by Wolfowitz (1953, 1954, 1957) in the 1950s. There was a revival of this line of research in the early 1980s, as evidenced by the works of Parr and Schucany (1980), Boos (1981), Parr and DeWet (1981), Heathcote and Silvapulle (1981), and others. Parr (1981) also provides an extensive bibliography of minimum distance estimation up to that point in time. Works of Wiens (1987), Hettmansperger et al. (1994), Özturk (1994), Özturk and Hettmansperger (1997), and Özturk et al. (1999) have further extended this line of research.

Many of the estimators proposed in the above papers have strong robustness properties under model misspecification. However, their robustness is usually achieved at the cost of first-order efficiency at the model. On the other hand, a second and relatively more modern branch of minimum distance estimation, based on density based distances (or divergences in general), has been shown to produce a large class of estimators which combine attractive robustness properties with full asymptotic efficiency. Beran (1977) appears to be the first to use a density based distance for the purpose of robust inference. He demonstrated that the minimum Hellinger distance estimators are simultaneously robust and first-order efficient. Other authors, such as Stather (1981), Tamura and Boos (1986), Simpson (1987, 1989), Donoho and Liu (1988a, 1988b), Eslinger and Woodward (1991), Basu and Harris (1994), Cao et al. (1995), Markatou (1996), Basu et al. (1997b), and Basu and Basu (1998), have further investigated related estimators. Lindsay (1994) introduced a class of minimum disparity estimators and illustrated the geometry behind their robustness. These ideas were extended to continuous models by Basu and Lindsay (1994). See also Basu et al. (1997a) for a comprehensive review of minimum disparity inference.

The density based minimum distance estimators (minimum disparity estimators in particular) provide attractive alternatives to the maximum likelihood estimator. However, the defining equations of the minimum disparity estimators are usually nonlinear, and numerical methods have to be applied to solve them. The numerical difficulty increases greatly with the number of parameters. For example, to estimate (μ,Σ) in a d-dimensional normal model there are p=d+d(d+1)/2=d(d+3)/2 unknown parameters. Each step of the Newton–Raphson (NR) method then requires (p+1)(p+2)/2 numerical integrations and the inversion of a p-dimensional Hessian matrix.

In this paper we consider a method closely related to iteratively reweighted least squares, with the aim of reducing the numerical difficulty described in the previous paragraph. Our initial motivation came from the fact that the new method is vastly simpler to program and, in the d-dimensional normal model, requires (p+2) numerical integrations and no matrix inversion per step. Even for the case d=3, the number of parameters is 9, so each NR step requires 55 numerical integrations and the inversion of a 9×9 matrix, while the new method requires only 11 numerical integrations per step. While the price one might expect to pay for this is a decrease from quadratic to linear convergence, our most striking finding was that by a careful (but very simple) selection of weights, we could make the method competitive in speed with NR even in the univariate model, where p=2. (A simple scalar adjustment makes the method quadratically convergent when the data exactly fit the model.) Our theoretical calculations are substantiated by several numerical investigations. We believe this paper demonstrates generally applicable techniques for applying iterative reweighting algorithms in new problems and for making them more efficient. In particular, we expect the algorithm described here to be of great practical use in view of the widespread interest in the minimum Hellinger distance and related methods.
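As a quick check of the counts quoted in the preceding two paragraphs, the short Python sketch below tabulates the per-step integration costs of the two methods. The formulas are taken directly from the text; the script itself is ours.

```python
# Per-step cost comparison for the d-dimensional normal model, using the
# counts quoted above: p = d(d+3)/2 parameters, (p+1)(p+2)/2 numerical
# integrations per Newton-Raphson (NR) step, and p+2 per step of the
# iteratively reweighted method.
for d in (1, 2, 3, 5, 10):
    p = d * (d + 3) // 2
    nr = (p + 1) * (p + 2) // 2
    iree = p + 2
    print(f"d={d:2d}  p={p:3d}  NR: {nr:4d} integrations + {p}x{p} inversion  "
          f"IREE: {iree:3d} integrations")
# d=3 reproduces the figures in the text: p=9, 55 integrations for NR, 11 for IREE.
```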

The rest of the paper is organized as follows. In Section 2 we provide a brief review of minimum disparity estimation. The main contributions of the paper are presented in Section 3, where we first develop the iteratively reweighted estimating equation (IREE) algorithm in the spirit of the iteratively reweighted least squares (IRLS) used in robust regression, and then demonstrate that a simple refinement can make the method comparable in performance to the NR algorithm, while keeping the implementation substantially simpler. Some further issues, including a second-order analysis, some discussion of the range of applicability of the method in small samples, and a weighted likelihood modification resulting from the IREE idea, are discussed in Section 4. A short appendix presents a step-by-step implementation of the algorithm.


Minimum disparity estimation

Let us briefly review minimum disparity estimation, leading up to the estimating equation that we will be concerned with. We start with the discrete model. Let mβ(x) represent the model density function indexed by an unknown β∈Ω; without loss of generality let the sample space be X={0,1,2,…}. Let d(x) represent the proportion of observations in a sample of size n that have the value x. Define δ(x)=d(x)/mβ(x)−1 to be the Pearson residual at x. Then a general disparity measure ρ can be expressed as ρG(d,mβ)=∑x G(δ(x))mβ(x), where G is a smooth, strictly convex function on [−1,∞) with G(0)=0.
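As a concrete illustration of these definitions, the following Python fragment (our own sketch, not code from the paper; the Poisson model and helper names are illustrative choices) computes the Pearson residuals and evaluates the disparity for the choice G(δ)=2(√(δ+1)−1)², which turns ρG into twice the squared Hellinger distance.

```python
import numpy as np
from scipy.stats import poisson

def disparity(sample, beta, G, support_max=200):
    """rho_G(d, m_beta) = sum_x G(delta(x)) * m_beta(x), Poisson(beta) model,
    evaluated over a truncated support {0, ..., support_max}."""
    x = np.arange(support_max + 1)
    m = poisson.pmf(x, beta)                  # model density m_beta(x)
    counts = np.bincount(sample, minlength=x.size)[:x.size]
    d = counts / len(sample)                  # empirical proportions d(x)
    keep = m > 0                              # guard against tail underflow
    delta = d[keep] / m[keep] - 1.0           # Pearson residuals
    return float(np.sum(G(delta) * m[keep]))

def G_hellinger(delta):
    # G(delta) = 2*(sqrt(delta + 1) - 1)**2 gives twice the squared
    # Hellinger distance between d and m_beta; note G(0) = 0.
    return 2.0 * (np.sqrt(delta + 1.0) - 1.0) ** 2

rng = np.random.default_rng(0)
sample = rng.poisson(lam=3.0, size=200)
print(disparity(sample, beta=3.0, G=G_hellinger))   # small when the model fits
```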

The iteratively reweighted least squares (IRLS)

We first discuss the IRLS and then develop the IREE along those lines. The IRLS is an algorithm often used in determining the parameter estimates in robust regression. It is generally attributed to Beaton and Tukey (1974), and is far simpler to apply than the NR method. Holland and Welsch (1977), McCullagh and Nelder (1989) and Green (1984) are good general references. Byrd and Pyne (1979) and Birch (1980) discuss convergence results and Del Pino (1989) provides an extensive bibliography.
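For concreteness, here is a minimal Python sketch of classical IRLS for robust linear regression. The Huber weight function and MAD scale estimate are standard textbook choices of ours for illustration; they are not the IREE weights developed in this paper.

```python
import numpy as np

def irls_huber(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Robust regression via IRLS: repeatedly solve a weighted least squares
    problem whose weights are recomputed from the current residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary LS start
    for _ in range(max_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12      # robust (MAD) scale
        u = np.abs(r) / s
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights psi(u)/u
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y) # weighted LS step
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if step < tol:
            break
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)
y[:5] += 15.0                                          # a few gross outliers
print(irls_huber(X, y))                                # close to [1, 2]
```

Each iteration downweights observations with large residuals; the IREE developed in Section 3 proceeds in the same spirit, with an estimating equation in place of the normal equations.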

Second order analysis for the optimally weighted IREE

Let A2=A″(0) represent the second derivative of the residual adjustment function of the disparity evaluated at zero. Lindsay (1994) and Basu and Lindsay (1994) have shown that this plays an important role in determining the theoretical properties of the estimator. In this section we will show that the right-hand side of Eq. (3.12) can be expressed as a function of A2 when the residuals are small.
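Since the residual adjustment function is standardized in Lindsay (1994) so that A(0)=0 and A′(0)=1, a second-order Taylor expansion for small residuals gives (a standard expansion, recorded here for convenience)

A(δ) ≈ δ + (A2/2)δ², with A2 = A″(0).

Maximum likelihood corresponds to A(δ)=δ, that is A2=0, so A2 quantifies the second-order departure from likelihood behavior near the model.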

Direct differentiation of w(x) gives

w′(x) = −[A′(δ(x))d(x) − m(x)(A(δ(x)) − λ)]u(x),

where u(x) = ∇m(x)/m(x) is the likelihood score function.

References

  • A. Basu et al., Minimum negative exponential disparity estimation in parametric models, J. Statist. Plann. Inference (1997)
  • R. Cao et al., Minimum distance density-based estimation, Comput. Statist. Data Anal. (1995)
  • T.P. Hettmansperger et al., Minimum distance estimators, J. Statist. Plann. Inference (1994)
  • J.S. Marron, Comments on a data based bandwidth selector, Comput. Statist. Data Anal. (1989)
  • A. Basu, Minimum disparity estimation in the continuous case: efficiency, distributions, robustness and... (1991)
  • A. Basu et al., Penalized minimum disparity methods for multinomial models, Statist. Sinica (1998)
  • A. Basu et al., Robust predictive distributions for exponential families, Biometrika (1994)
  • A. Basu et al., Minimum disparity estimation for continuous models: efficiency, distributions and robustness, Ann. Inst. Stat. Math. (1994)
  • A. Basu, I.R. Harris, S. Basu, Minimum distance estimation: the approach using density based distances. In:... (1997a)
  • A.E. Beaton et al., The fitting of power series, meaning polynomials, illustrated on band spectroscopic data, Technometrics (1974)
  • R.J. Beran, Minimum Hellinger distance estimates for parametric models, Ann. Statist. (1977)
  • J.B. Birch, Some convergence properties of iterated least squares in the location model, Comm. Statist. B (1980)
  • D.D. Boos, Minimum distance estimators for location and goodness-of-fit, J. Amer. Statist. Assoc. (1981)
  • R.H. Byrd, D.A. Pyne, Some results on the convergence of the iteratively reweighted least squares. ASA Proc.... (1979)
  • R. Cao et al., The consistency of a smoothed minimum distance estimate, Scand. J. Statist. (1996)
  • G.E. Del Pino, The unifying role of the iterative generalized least squares in statistical algorithms (with discussion), Statist. Sci. (1989)
  • L. Devroye, A Course in Density Estimation (1987)
  • L. Devroye, L. Gyorfi, Nonparametric Density Estimation: The L1 View. John Wiley, New... (1985)
  • D.L. Donoho et al., The automatic robustness of minimum distance functionals, Ann. Statist. (1988)
  • D.L. Donoho et al., Pathologies of some minimum distance estimators, Ann. Statist. (1988)
  • P.W. Eslinger et al., Minimum Hellinger distance estimation for normal models, J. Statist. Comput. Simulation (1991)
  • P.J. Green, Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with discussion), J. Roy. Statist. Soc. B (1984)
  • P. Hall et al., Lower bounds for bandwidth selection in density estimation, Probab. Theory Related Fields (1991)
  • W. Härdle et al., How far are automatically chosen regression smoothing parameters from their optimum?, J. Amer. Statist. Assoc. (1988)
¹ Lindsay was partially supported by the National Science Foundation under grant DMS 0104443.
