On distance-type Gaussian estimation

https://doi.org/10.1016/j.jmva.2021.104831

Abstract

In this paper, we develop a new procedure for estimating the parameters of a model by combining Zhang’s (2019) recent Gaussian estimator and the minimum density power divergence estimators of Basu et al. (1998). The proposed estimator is called the Minimum Density Power Divergence Gaussian Estimator (MDPDGE). The consistency and asymptotic normality of the MDPDGE are proved. The MDPDGE is applied to some classical univariate distributions and it is also investigated for the family of elliptically contoured distributions. A numerical study illustrates the robustness of the proposed estimator.

Introduction

The recent paper by Zhang [34] proposes a method to estimate the parameters of a model when the implementation of maximum likelihood estimation is difficult or impossible. The proposed estimators enjoy nice properties, such as consistency and asymptotic normality. To formulate the problem, let $Y_1,\ldots,Y_n$ be independent and identically distributed replications of the $m$-variate random vector $Y$, governed by the probability density or mass function $f_\theta(y)$, where $\theta\in\Theta$ is a parameter vector, $\Theta$ is a measurable subset of $\mathbb{R}^d$, and $d$ is the dimension of $\theta$. In this setting, the maximum likelihood estimator (MLE) of the unknown parameter $\theta$ is defined as
$$\hat{\theta}_{MLE}=\arg\max_{\theta\in\Theta}\ell(\theta),\tag{1}$$
where $\ell(\theta)=\sum_{i=1}^{n}\ln f_\theta(y_i)$ is the log-likelihood of $\theta$, defined on the basis of a set of observations $y_1,\ldots,y_n$, with $y_i$ a realization of $Y_i$, $i\in\{1,\ldots,n\}$. The MLE $\hat{\theta}_{MLE}$ obeys nice properties, like consistency and asymptotic normality, subject to some mild conditions; in fact, it is well known that the MLE is a BAN (best asymptotically normal) estimator. Although the maximum likelihood method is simple and leads to efficient estimators in the large sample case, it cannot be applied so easily in practice when the initial model $f_\theta(y)$ describing the data is not tractable or contains intractable normalizing constants that involve the unknown parameters. Moreover, maximum likelihood estimation is highly affected by the presence of outliers in the data. To overcome these shortcomings, several methods have been proposed in the literature; we mention the composite likelihood method and distance-based estimation methods, among many others.
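For concreteness, the numerical maximization in (1) can be sketched as follows. This is a minimal illustration of our own; the gamma model, the simulated data, and the starting values are assumptions for the sketch, not taken from the paper:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    y = rng.gamma(shape=2.0, scale=1.5, size=200)   # simulated sample

    # Negative log-likelihood of a Gamma(shape, scale) model.
    def neg_loglik(theta):
        shape, scale = theta
        return -np.sum(gamma.logpdf(y, a=shape, scale=scale))

    # MLE = argmax of the log-likelihood, obtained numerically.
    res = minimize(neg_loglik, x0=[1.0, 1.0],
                   bounds=[(1e-6, None), (1e-6, None)])
    print("MLE (shape, scale):", res.x)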

The main task in the approach proposed by Zhang [34] is to construct a likelihood function, like $\ell(\theta)=\sum_{i=1}^{n}\ln f_\theta(y_i)$ in (1), which can be easily managed. In this direction, such an estimation function may be that of the multivariate normal distribution. This is the main idea in the paper by Zhang [34]: if the computation of $f_\theta(y)$ is difficult but the computation of $\mu_\theta=E_\theta(Y)$ and $\Sigma_\theta=\mathrm{Cov}_\theta(Y)$ is not so arduous, then we can put them into the log-likelihood function of the multivariate normal distribution, which provides, in this way, an estimation function. This motivates Zhang’s Gaussian estimation approach.

To formulate this approach suppose, as above, that $y_1,\ldots,y_n$ are realizations of the independent and identically distributed replications $Y_1,\ldots,Y_n$ of the random vector $Y$, described by a probability density function or a probability mass function $f_\theta(y)$, where $\theta\in\Theta\subseteq\mathbb{R}^d$, $d\ge 1$. Moreover, suppose that the mean vector $\mu_\theta=E_\theta(Y)$ and the variance–covariance matrix $\Sigma_\theta=\mathrm{Cov}_\theta(Y)$ of $Y=(Y_1,\ldots,Y_m)^T$ are known or can be easily obtained in explicit form. In this frame, Zhang [34] proposes, instead of using the true but intractable log-likelihood function $\ell(\theta)=\sum_{i=1}^{n}\ln f_\theta(y_i)$ in the estimation approach described in (1), to use the Gaussian estimation function of $\theta$ under $y_1,\ldots,y_n$, which is defined as
$$\ell_G(\theta)=-\frac{nm}{2}\ln(2\pi)-\frac{n}{2}\ln|\Sigma_\theta|-\frac{1}{2}\sum_{i=1}^{n}(y_i-\mu_\theta)^T\Sigma_\theta^{-1}(y_i-\mu_\theta).\tag{2}$$
In this setting, the Gaussian estimator of $\theta$ is then defined by
$$\hat{\theta}_G=\arg\max_{\theta\in\Theta}\ell_G(\theta).\tag{3}$$
If $\hat{\theta}_G$ is an interior point of $\Theta$, then it also satisfies
$$\left.\frac{\partial\ell_G(\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_G}=0,\tag{4}$$
where for each $j\in\{1,\ldots,d\}$, the $j$th component of $\partial\ell_G(\theta)/\partial\theta$ is
$$\frac{\partial\ell_G(\theta)}{\partial\theta_j}=-\frac{n}{2}\,\mathrm{tr}\!\left(\Sigma_\theta^{-1}\frac{\partial\Sigma_\theta}{\partial\theta_j}\right)+\sum_{i=1}^{n}\frac{\partial\mu_\theta^T}{\partial\theta_j}\Sigma_\theta^{-1}(y_i-\mu_\theta)+\frac{1}{2}\sum_{i=1}^{n}(y_i-\mu_\theta)^T\Sigma_\theta^{-1}\frac{\partial\Sigma_\theta}{\partial\theta_j}\Sigma_\theta^{-1}(y_i-\mu_\theta).$$
Based on the above exposition and on Definition 1 in Zhang ([34], p. 236), we formulate the following, quite similar definition:

Definition 1 Zhang, [34]

For any random vector $Y$ with mean vector $\mu_\theta$ and variance–covariance matrix $\Sigma_\theta$, $\ell_G(\theta)$ given by (2) is called the Gaussian estimation function, $\hat{\theta}_G$ given by (3) is called the Gaussian estimator, and (4) is called the Gaussian estimation equation of $\theta$ under $Y$. The entire approach is called the general Gaussian estimation approach, or general Gaussian estimation for short.
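As a concrete illustration of Definition 1, the Gaussian estimation function (2) can be coded directly from $\mu_\theta$ and $\Sigma_\theta$. The following is a minimal sketch under our own assumptions; the helper names mu_fn and Sigma_fn and the exponential example model are illustrative, not from the paper:

    import numpy as np
    from scipy.optimize import minimize

    def gaussian_estimation_fn(theta, Y, mu_fn, Sigma_fn):
        """Gaussian estimation function l_G(theta) of Eq. (2) for an
        (n, m) data array Y, given the model's mean vector and
        variance-covariance matrix as functions of theta."""
        n, m = Y.shape
        mu, Sigma = mu_fn(theta), Sigma_fn(theta)
        _, logdet = np.linalg.slogdet(Sigma)
        resid = Y - mu
        quad = np.einsum('ij,jk,ik->', resid, np.linalg.inv(Sigma), resid)
        return -0.5 * n * m * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

    # Illustrative model: two independent Exp(theta) coordinates, so
    # mu_theta = (1/theta, 1/theta) and Sigma_theta = (1/theta^2) I.
    rng = np.random.default_rng(1)
    Y = rng.exponential(scale=2.0, size=(300, 2))    # true theta = 0.5
    mu_fn = lambda t: np.full(2, 1.0 / t[0])
    Sigma_fn = lambda t: np.eye(2) / t[0] ** 2

    res = minimize(lambda t: -gaussian_estimation_fn(t, Y, mu_fn, Sigma_fn),
                   x0=[1.0], bounds=[(1e-6, None)])
    print("Gaussian estimator of theta:", res.x)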

At first glance, the replacement of the initial and valid, even if intractable, model $f_\theta(y)$ by the normal model in Zhang’s [34] method may look somewhat arbitrary. However, this apparent arbitrariness is removed when taking into account the maximum entropy principle introduced by Jaynes [23] (cf. also Zografos [35], [36] and the references therein, among many others). Indeed, Zhang’s [34] method rests on two assumptions: first, that the model which describes the existing data is difficult to manage, and second, that the mean and variance of the model are available in explicit form. On the other hand, it is well known, by means of the maximum entropy principle, that the normal model is the most suitable model to describe and fit the available data among all candidate distributions with given mean and variance (cf., for example, Kagan et al. [25], Theorem 13.2.2, p. 410, or Cover and Thomas [14], p. 413). Hence, Zhang’s [34] use of the normal model and the subsequent Gaussian estimation function (2) is not as arbitrary as it seems, and it can be considered a direct consequence and a nice application of the maximum entropy principle to the area of maximum likelihood point estimation.
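For completeness, the maximum entropy result invoked here can be stated as follows (cf. Cover and Thomas [14]; the differential entropy notation $h(f)$ is ours): among all $m$-variate densities $f$ with mean vector $\mu$ and variance–covariance matrix $\Sigma$,
$$h(f)=-\int f(y)\ln f(y)\,dy\;\le\;\frac{1}{2}\ln\left[(2\pi e)^m|\Sigma|\right],$$
with equality if and only if $f$ is the $N(\mu,\Sigma)$ density.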

On the other hand, minimum distance estimation methods occupy a significant part of the existing literature in estimation theory, as they are characterized by good efficiency and robustness properties. These methods play an important and decisive role in the area of multivariate analysis because parameter estimation procedures are directly connected with classic topics such as plug-in classification rules in discriminant analysis. Minimum distance estimation techniques are based on a distance measure between the empirical model and the unknown, in practice, parametric model that ideally describes the available data. In this setting, minimum distance estimators result from minimizing, with respect to the unknown parameters, the distance measure between the empirical model and the theoretical model adopted to fit the existing data. The minimum density power divergence (DPD) estimation method, introduced in the pioneering paper by Basu et al. [5] and exhaustively studied in the monograph by Basu et al. [7], occupies a prominent position among the existing minimum distance estimation methods due to its balance between efficiency and robustness. Basu et al. [6] presented a class of generalized Wald-type test statistics based on the DPD in the case of independently and identically distributed observations, while Basu et al. [4] extended these tests to independent but non-identically distributed data. To date, several studies have confirmed the effectiveness of the DPD and DPD-based Wald-type tests in different areas, such as regression models (Ghosh and Basu [17]; Castilla et al. [9], [10]), survival analysis (Ghosh et al. [18]), or even meteorology (Hazra and Ghosh [22]). In recent years, the DPD has also been applied in the prominent area of high-dimensional data; see Ghosh and Majumdar [20] or Ghosh et al. [19].
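To fix ideas (our summary of Basu et al. [5], stated here for the reader’s convenience): for a tuning parameter $a>0$, the density power divergence between a density $g$ and a model density $f_\theta$ is
$$d_a(g,f_\theta)=\int\left\{f_\theta(y)^{1+a}-\left(1+\frac{1}{a}\right)f_\theta(y)^a\,g(y)+\frac{1}{a}\,g(y)^{1+a}\right\}dy,$$
and the minimum DPD estimator (MDPDE) minimizes an empirical version of $d_a(g,f_\theta)$ over $\theta$. As $a\to 0$, $d_a$ tends to the Kullback–Leibler divergence and the MDPDE tends to the MLE; larger $a$ trades efficiency for robustness.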

This paper provides elaborated new estimators developed on the common ground of Zhang’s [34] recent general Gaussian estimation method and the minimum DPD estimation method of Basu et al. [5]. In particular, in Section 2 both approaches are combined to introduce the minimum DPD Gaussian estimator (MDPDGE). The consistency and asymptotic normality of the resulting estimators are studied in Section 3. In Section 4, the behaviour of the MDPDGE in the univariate case is illustrated with some elementary examples. Section 5 concentrates on the elliptic family of multivariate distributions. It initially reviews estimation of the parameters of the model by the classic maximum likelihood method. In the sequel, Zhang’s [34] type Gaussian estimators of the parameters of the elliptic family are obtained in explicit form and applied to particular members of the elliptic family, along with the respective MDPDGE. Finally, in Section 6 some numerical studies are developed to illustrate the behaviour of the proposed estimators in different models and to investigate the robustness of the MDPDGE. Some concluding remarks and possible future research lines are discussed in Section 7. The paper is completed with some technical details, which are presented in the final section.

Section snippets

Combining Zhang’s approach with minimum divergence estimation

This section focuses on placing Zhang’s [34] general Gaussian estimation method in the context of minimum divergence methods of estimation. In particular, it introduces the MDPDGE as the result of combining the estimation methods of Basu et al. [5] and Zhang [34].
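The full section is abbreviated in this snippet. Combining the DPD objective above with the normal surrogate density $f_N(y;\mu_\theta,\Sigma_\theta)$, the MDPDGE should take the form below; this is our reconstruction from the univariate objective of Section 4 and the Appendix, not a quotation of the paper’s own display:
$$\hat{\theta}_G^a=\arg\min_{\theta\in\Theta}\left\{\frac{(1+a)^{-m/2}}{(2\pi)^{ma/2}|\Sigma_\theta|^{a/2}}-\left(1+\frac{1}{a}\right)\frac{1}{n}\sum_{i=1}^{n}f_N(y_i;\mu_\theta,\Sigma_\theta)^a\right\},$$
where the first term equals $\int f_N(y;\mu_\theta,\Sigma_\theta)^{1+a}\,dy$ and the $g^{1+a}$ term of the divergence has been dropped, as it does not depend on $\theta$.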

Consistency and asymptotic normality of MDPDGE

The leading paper by Basu et al. [5] and the subsequent monograph by Basu et al. [7] concentrate on the consistency and asymptotic normality of the MDPDE, via the formulation of some regularity conditions which ensure these properties. Juárez and Schucany [24] provide an alternative and more general list of conditions to prove the consistency and asymptotic normality of the MDPDE. In light of these conditions, we study the consistency and asymptotic normality of the MDPDGE.

MDPDGE in the univariate distributions

For the sake of illustration, let us now concentrate on the univariate case. That is, let $y_1,\ldots,y_n$ be realizations of $Y_1,\ldots,Y_n$, independent and identically distributed replications of the random variable $Y$. Let us also consider a univariate parameter $\theta$; that is, $m=d=1$. In this simplified context, for a fixed value of $a$, the MDPDGE of $\theta$ is defined by
$$\hat{\theta}_G^a=\arg\max_{\theta\in\Theta}\frac{a+1}{a}\,(2\pi)^{-a/2}\,(\sigma_\theta^2)^{-a/2}\left[\frac{1}{n}\sum_{i=1}^{n}e^{-\frac{a}{2\sigma_\theta^2}(y_i-\mu_\theta)^2}-\frac{a}{(1+a)^{3/2}}\right].$$
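A minimal numerical sketch of this estimator, under our own illustrative assumptions (an $\mathrm{Exp}(\theta)$ model, for which $\mu_\theta=1/\theta$ and $\sigma_\theta^2=1/\theta^2$, and an arbitrary tuning value $a=0.5$):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def mdpdge_objective(theta, y, a):
        """Univariate MDPDGE objective for an Exp(theta) model."""
        mu, sigma2 = 1.0 / theta, 1.0 / theta ** 2
        avg = np.mean(np.exp(-a * (y - mu) ** 2 / (2.0 * sigma2)))
        return ((a + 1) / a) * (2 * np.pi) ** (-a / 2) * sigma2 ** (-a / 2) * (
            avg - a / (1 + a) ** 1.5)

    rng = np.random.default_rng(0)
    y = rng.exponential(scale=2.0, size=500)     # true theta = 0.5

    # Maximize the objective (minimize its negative) over theta > 0.
    res = minimize_scalar(lambda t: -mdpdge_objective(t, y, a=0.5),
                          bounds=(1e-6, 10.0), method="bounded")
    print("MDPDGE estimate of theta:", res.x)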

We try to obtain the explicit form of the estimating equation in the

MDPDGE in elliptically contoured distributions

Suppose that the $m$-dimensional random vector $Y=(Y_1,\ldots,Y_m)^T$ is distributed according to an elliptically contoured distribution with location parameter $\mu\in\mathbb{R}^m$ and scale parameter $V$, an $m\times m$ positive definite matrix. The characteristic function of $Y$ has the form
$$\psi(t)=e^{it^T\mu}\,\phi(t^T V t),$$
for some scalar function $\phi$. If the distribution has a density, it can be written in the form
$$|V|^{-1/2}\,g_m\!\left[(y-\mu)^T V^{-1}(y-\mu)\right].$$
In this case we will use the notation $Y\sim EC_m(\mu,V,g_m)$. Following Fang et al. ([15], p. 46–47), any function $g$
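It is worth recalling the standard moment facts that make Gaussian estimation convenient in this family (Fang et al. [15]; our summary): if $Y\sim EC_m(\mu,V,g_m)$ has finite second moments, then
$$E(Y)=\mu,\qquad \mathrm{Cov}(Y)=-2\phi'(0)\,V,$$
so $\mu_\theta$ and $\Sigma_\theta$ are available in closed form whenever $\phi'(0)$ is known; for the multivariate normal, $\phi(u)=e^{-u/2}$ and $\mathrm{Cov}(Y)=V$.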

Numerical results

In this section we numerically illustrate the proposed methods. In particular, we evaluate the robustness of the proposed MDPDGE relative to Zhang’s Gaussian estimator, the MLE, and the aforementioned MDPDE.
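The study itself is abbreviated in this snippet; the following minimal sketch, under our own illustrative choices (a $N(\theta,1)$ location model with 10% of the sample replaced by outliers), shows the kind of contamination experiment meant:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(2)
    n = 200
    y = rng.normal(loc=0.0, scale=1.0, size=n)
    y[: n // 10] = 10.0                  # contaminate 10% with outliers

    def mdpdge_location(y, a):
        """MDPDGE of theta in a N(theta, 1) model (sigma known = 1)."""
        def neg_obj(theta):
            avg = np.mean(np.exp(-a * (y - theta) ** 2 / 2.0))
            return -((a + 1) / a) * (2 * np.pi) ** (-a / 2) * (
                avg - a / (1 + a) ** 1.5)
        return minimize_scalar(neg_obj, bounds=(-5, 5), method="bounded").x

    print("MLE (sample mean):", y.mean())                  # dragged by outliers
    print("MDPDGE, a = 0.5: ", mdpdge_location(y, a=0.5))  # stays near 0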

Conclusions and future lines of research

The aim of this section is to summarize the contribution of the paper and to underline possible topics for future work in light of the present results. Maximum likelihood estimation is the cornerstone of statistical inference. The resulting estimators of the unknown parameter(s) of the model that drives the data are easily obtained in the case of a tractable model. Moreover, they are efficient, but at the same time they are highly non-robust in the presence of outliers in the data.

Derivation of the variance–covariance matrix for the general case

Here we develop the computations necessary for Proposition 3. Note that we follow ideas similar to those of Castilla et al. ([11], Appendix A.5) in the context of composite likelihood.

Let $f_N(y;\mu_\theta,\Sigma_\theta)$ denote the multivariate normal density with mean $\mu_\theta$ and variance–covariance matrix $\Sigma_\theta$. That is,
$$f_N(y;\mu_\theta,\Sigma_\theta)=\frac{1}{(2\pi)^{m/2}|\Sigma_\theta|^{1/2}}\exp\left(-\frac{1}{2}(y-\mu_\theta)^T\Sigma_\theta^{-1}(y-\mu_\theta)\right).$$
We have that
$$\begin{aligned}
f_N(y;\mu_\theta,\Sigma_\theta)^{a+1}
&=\frac{1}{\left[(2\pi)^{m/2}|\Sigma_\theta|^{1/2}\right]^{1+a}}\exp\left(-\frac{a+1}{2}(y-\mu_\theta)^T\Sigma_\theta^{-1}(y-\mu_\theta)\right)\\
&=\frac{1}{\left[(2\pi)^{m/2}|\Sigma_\theta|^{1/2}\right]^{a}}\frac{1}{(1+a)^{m/2}}\cdot\frac{(1+a)^{m/2}}{(2\pi)^{m/2}|\Sigma_\theta|^{1/2}}\exp\left(-\frac{a+1}{2}(y-\mu_\theta)^T\Sigma_\theta^{-1}(y-\mu_\theta)\right)\\
&=\frac{1}{\left[(2\pi)^{m/2}|\Sigma_\theta|^{1/2}\right]^{a}(1+a)^{m/2}}\,f_N\!\left(y;\mu_\theta,\tfrac{1}{1+a}\Sigma_\theta\right).
\end{aligned}$$
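Since the last factor above is the $N(\mu_\theta,\Sigma_\theta/(1+a))$ density and integrates to one, the factorization yields (our explicit statement of the step the truncated snippet is heading towards)
$$\int_{\mathbb{R}^m}f_N(y;\mu_\theta,\Sigma_\theta)^{1+a}\,dy=\frac{1}{\left[(2\pi)^{m/2}|\Sigma_\theta|^{1/2}\right]^{a}(1+a)^{m/2}},$$
which is the constant term entering the MDPDGE objective.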

CRediT authorship contribution statement

Elena Castilla: Methodology, Software, Formal analysis, Validation, Visualization, Writing - original draft, Writing - review & editing. Konstantinos Zografos: Conceptualization, Methodology, Formal analysis, Validation, Visualization, Writing - original draft, Writing - review & editing.

Acknowledgements

It is a great pleasure to contribute to this Special Issue, devoted to the celebrations of the 50 year anniversary of the Journal of Multivariate Analysis. We congratulate the Editors for this initiative and we thank them for the invitation to submit a paper for consideration in this Special Issue. We take this occasion to wish the journal to continue providing an excellent, prestigious and prominent medium for those who are interested in the field of multivariate statistical analysis and

References (36)

  • Basu, A., et al., Statistical Inference: The Minimum Distance Approach (2011).
  • Breusch, T.S., et al., The emperor’s new clothes: a critique of the multivariate t regression model, Stat. Neerl. (1997).
  • Castilla, E., et al., New robust statistical procedures for the polytomous logistic regression models, Biometrics (2018).
  • Castilla, E., et al., Robust semiparametric inference for polytomous logistic regression with complex survey design, Adv. Data Anal. Classif. (2020).
  • Castilla, E., et al., Composite likelihood methods based on minimum density power divergence estimator, Entropy (2018).
  • Castilla, E., et al., Model selection in a composite likelihood framework based on density power divergence, Entropy (2020).
  • Castilla, E., et al., Composite likelihood methods: Rao-type tests based on composite minimum density power divergence estimator, Statist. Pap. (2021).
  • Cover, T.M., et al., Elements of Information Theory (2006).