On distance-type Gaussian estimation
Introduction
The recent paper by Zhang [34] proposes a method to estimate the parameters of a model when the implementation of maximum likelihood estimation is difficult or impossible. The proposed estimators enjoy nice properties, such as consistency and asymptotic normality. To formulate the problem, let $X_1,\dots,X_n$ be independent and identically distributed replications of the $m$-variate random vector $X$, which is governed by the probability density or mass function $f(x;\theta)$, where $\theta\in\Theta$ is a parameter vector, $\Theta$ is a measurable subset of $\mathbb{R}^p$, and $p$ is the dimension of $\theta$. In this setting, the maximum likelihood estimator (MLE) of the unknown parameter $\theta$ is defined as
$$\widehat{\theta}_{n}=\arg\max_{\theta\in\Theta}\,\ell(\theta), \tag{1}$$
where $\ell(\theta)=\sum_{i=1}^{n}\log f(x_i;\theta)$ is the log-likelihood of $\theta$, defined on the basis of a set of observations $x_1,\dots,x_n$, with $x_i$ a realization of $X_i$, $i=1,\dots,n$. The MLE $\widehat{\theta}_{n}$ obeys nice properties, like consistency and asymptotic normality, subject to some mild conditions. In fact, it is well known that the MLE is a BAN (best asymptotically normal) estimator. Although the maximum likelihood method is simple and leads to efficient estimators of the unknown parameters in the large-sample case, it cannot be applied so easily in practice when the initial model $f(x;\theta)$, which describes the data, is not tractable or contains intractable normalizing constants which involve the unknown parameters. Moreover, maximum likelihood estimation is highly affected by the presence of outliers in the data. To overcome these shortcomings, several methods have been proposed in the literature; we mention the composite likelihood method and the distance-based estimation methods, among many others.
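As a point of reference, the MLE in (1) is straightforward whenever the model is tractable. The following minimal Python sketch, which is our own illustration and not part of the original exposition, computes the closed-form MLE of a univariate normal model and checks that it maximizes the log-likelihood:

```python
import math

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of N(mu, sigma2) for an i.i.d. sample xs."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma2))

def normal_mle(xs):
    """Closed-form MLE: sample mean and (biased) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2
```

For the sample $(1, 2, 3, 6)$ this gives $\widehat{\mu}=3$ and $\widehat{\sigma}^2=3.5$; note how the single large observation 6 already pulls the estimates, anticipating the lack of robustness against outliers mentioned above.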
The main task in the approach proposed by Zhang [34] is to construct a likelihood function, like that in (1), which can be easily managed. In this direction, such an estimation function may be that of the multivariate normal distribution. This is the main idea in the paper by Zhang [34]: if the computation of $f(x;\theta)$ is difficult but the computation of the mean vector $\mu(\theta)$ and the variance–covariance matrix $\Sigma(\theta)$ is not so arduous, then we can plug them into the log-likelihood function of the multivariate normal distribution, which provides, in this way, an estimation function. This motivates Zhang’s Gaussian estimation approach.
To formulate this approach suppose, as above, that $x_1,\dots,x_n$ are realizations of the independent and identically distributed replications $X_1,\dots,X_n$ of the random vector $X$, which is described by a probability density function or a probability mass function $f(x;\theta)$, $\theta\in\Theta\subseteq\mathbb{R}^p$. Moreover, suppose that the mean vector $\mu(\theta)$ and variance–covariance matrix $\Sigma(\theta)$ of $X$ are known or can be easily obtained in an explicit form. In this frame, Zhang [34] proposes, instead of using the true but intractable log-likelihood function $\ell(\theta)$ in the estimation approach described in (1), to use the Gaussian estimation function of $\theta$ under $X$, which is defined as
$$G_n(\theta)=-\frac{nm}{2}\log(2\pi)-\frac{n}{2}\log|\Sigma(\theta)|-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu(\theta))^{\top}\Sigma(\theta)^{-1}(x_i-\mu(\theta)). \tag{2}$$
In this setting, the Gaussian estimator of $\theta$ is then defined by
$$\widehat{\theta}_{n}^{G}=\arg\max_{\theta\in\Theta}\,G_n(\theta). \tag{3}$$
If $\widehat{\theta}_{n}^{G}$ is an interior point of $\Theta$, then it also satisfies
$$\frac{\partial G_n(\theta)}{\partial\theta}=0_p, \tag{4}$$
where, for each $j=1,\dots,p$, the $j$th component of $\partial G_n(\theta)/\partial\theta$ is
$$\frac{\partial G_n(\theta)}{\partial\theta_j}=-\frac{n}{2}\operatorname{tr}\!\Big(\Sigma(\theta)^{-1}\tfrac{\partial\Sigma(\theta)}{\partial\theta_j}\Big)+\sum_{i=1}^{n}\tfrac{\partial\mu(\theta)^{\top}}{\partial\theta_j}\Sigma(\theta)^{-1}(x_i-\mu(\theta))+\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu(\theta))^{\top}\Sigma(\theta)^{-1}\tfrac{\partial\Sigma(\theta)}{\partial\theta_j}\Sigma(\theta)^{-1}(x_i-\mu(\theta)).$$
Based on the above exposition and on Definition 1 in Zhang ([34], p. 236), we formulate the following, quite similar definition:
Definition 1 (Zhang, [34]). For any random vector $X$ with mean vector $\mu(\theta)$ and variance–covariance matrix $\Sigma(\theta)$, $G_n(\theta)$ given by (2) is called the Gaussian estimation function, $\widehat{\theta}_{n}^{G}$ given by (3) is called the Gaussian estimator, and (4) is called the Gaussian estimation equation, of $\theta$ under $X$, respectively. The entire approach is called the general Gaussian estimation approach, or general Gaussian estimation for short.
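To make the recipe concrete, the following Python sketch implements the univariate version of the Gaussian estimation function (2) and maximises it for an exponential model with rate $\theta$, whose mean $1/\theta$ and variance $1/\theta^2$ are available in closed form. The function names and the crude grid-search maximiser are our own illustrative assumptions, not part of Zhang's [34] implementation:

```python
import math

def gaussian_estimation_fn(xs, mu, sigma2):
    """Univariate Gaussian estimation function: the N(mu, sigma2)
    log-likelihood evaluated at the model-implied mean and variance."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma2))

def gaussian_estimator(xs, mean_fn, var_fn, grid):
    """Crude grid-search maximiser over candidate parameter values;
    in practice one would solve the Gaussian estimation equation (4)."""
    return max(grid, key=lambda th: gaussian_estimation_fn(xs, mean_fn(th), var_fn(th)))

# Exponential model with rate theta: mean 1/theta, variance 1/theta^2.
grid = [0.01 * i for i in range(1, 301)]
theta_hat = gaussian_estimator([1.0, 2.0, 3.0],
                               lambda t: 1.0 / t, lambda t: 1.0 / t ** 2, grid)
```

For this sample the Gaussian estimation equation reduces to $S\theta^2 - T\theta - n = 0$, with $T=\sum_i x_i$ and $S=\sum_i x_i^2$, whose positive root is $\approx 0.724$; the grid search recovers it to grid precision.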
At first glance, the replacement of the initial and valid, even if intractable, model by the normal model in Zhang’s [34] method looks a little arbitrary. However, this apparent arbitrariness is removed when taking into account the maximum entropy principle, introduced by Jaynes [23] (cf. also Zografos [35], [36] and the references therein, among many others). Indeed, Zhang’s [34] method is developed on the basis of two assumptions: first, that the model which describes the existing data is difficult to manage and, second, that the mean and variance of the model are available in an explicit form. On the other hand, it is well known, by means of the maximum entropy principle, that the normal model is the most suitable model to describe and fit the available data, among all the other candidate distributions with given mean and variance (cf., for example, Kagan et al. [25], Theorem 13.2.2, p. 410, or Cover and Thomas [14], p. 413). Hence, the use by Zhang [34] of the normal model and the subsequent Gaussian estimation function (2) is not as arbitrary as it seems, and it can be considered a direct consequence and a nice application of the maximum entropy principle to the area of maximum likelihood point estimation.
On the other hand, minimum distance estimation methods occupy a significant part of the existing literature in estimation theory, as they are characterized by good efficiency and robustness properties. These methods play an important and decisive role in the area of multivariate analysis because parameter estimation procedures are directly connected with classic topics like plug-in classification rules in discriminant analysis. Minimum distance estimation techniques are based on a distance measure between the empirical model and the, in practice unknown, parametric model that ideally describes the available data. In this setting, minimum distance estimators are the result of a minimization, with respect to the unknown parameters, of the distance measure between the empirical model and the theoretical model that is adopted to fit the existing data. The minimum density power divergence (DPD) estimation method, introduced in the pioneering paper by Basu et al. [5] and exhaustively studied in the monograph by Basu et al. [7], occupies a prominent position among the existing minimum distance estimation methods due to its balance between efficiency and robustness. Basu et al. [6] presented a class of generalized Wald-type test statistics based on the DPD in the case of independent and identically distributed observations, while Basu et al. [4] extended these tests to independent but not identically distributed data. To date, several studies have confirmed the effectiveness of DPD and DPD-based Wald-type tests in different areas, such as regression models (Ghosh and Basu [17]; Castilla et al. [9], [10]), survival analysis (Ghosh et al. [18]) or even meteorology (Hazra and Ghosh [22]). In recent years, the DPD has also been applied in the prominent area of high-dimensional data; see Ghosh and Majumdar [20] or Ghosh et al. [19].
This paper provides new estimators developed on the common ground of Zhang’s [34] recent general Gaussian estimation method and the minimum DPD estimation method of Basu et al. [5]. In particular, in Section 2 both approaches are combined to introduce minimum DPD Gaussian estimation (MDPDGE). The consistency and asymptotic normality of the resulting estimators are studied in Section 3. In Section 4, the behaviour of the MDPDGE in the univariate case is illustrated with some elementary examples. Section 5 concentrates on the elliptic family of multivariate distributions. It initially reviews estimation of the parameters of the model by the classic maximum likelihood method. In the sequel, Zhang’s [34] type Gaussian estimators of the parameters of the elliptic family are obtained in an explicit form and they are applied to particular members of the elliptic family, along with the respective MDPDGE. Finally, in Section 6 some numerical studies are developed to illustrate the behaviour of the proposed estimators in different models and to investigate the robustness of the MDPDGE. Some concluding remarks and possible future research lines are discussed in Section 7. The paper is complemented by some technical details, presented in the final section.
Section snippets
Combining Zhang’s approach with minimum divergence estimation
This section focuses on the placement of Zhang’s [34] general Gaussian estimation method in the context of the minimum divergence methods of estimation. In particular, it introduces the MDPDGE as a result of the combination of the Basu et al. [5] and Zhang [34] methods of estimation.
Consistency and asymptotic normality of MDPDGE
The leading paper by Basu et al. [5] and the subsequent monograph by Basu et al. [7] concentrate on the consistency and asymptotic normality of the MDPDE, through the formulation of some regularity conditions which ensure these properties. Juárez and Schucany [24] provide an alternative and more general list of conditions to prove the consistency and the asymptotic normality of the MDPDE. In light of these conditions, we study the consistency and asymptotic normality of the
MDPDGE in the univariate distributions
For the sake of illustration, let us now concentrate on the univariate case. That is, let $x_1,\dots,x_n$ be realizations of $X_1,\dots,X_n$, which are independent and identically distributed replications of the random variable $X$. Let us also consider a univariate parameter $\theta$. That is, $m=p=1$. In this simplified context, for a fixed value of $\alpha$, the MDPDGE of $\theta$ is defined by
We try to obtain the explicit form of the estimating equation in the
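Combining the two previous ingredients, a univariate MDPDGE can be sketched as follows: plug the model-implied mean $\mu(\theta)$ and variance $\sigma^2(\theta)$ into the normal DPD objective and minimise over $\theta$. Again, the helper names and the grid-search minimiser below are our own illustrative assumptions:

```python
import math

def mdpdge_objective(xs, theta, mean_fn, var_fn, alpha):
    """DPD objective of Basu et al. [5] applied to the normal surrogate
    model N(mean_fn(theta), var_fn(theta)), as in the MDPDGE construction."""
    n = len(xs)
    mu, s2 = mean_fn(theta), var_fn(theta)
    int_term = (2 * math.pi * s2) ** (-alpha / 2) / math.sqrt(1 + alpha)
    def dens(x):
        return math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    emp_term = sum(dens(x) ** alpha for x in xs) / n
    return int_term - (1 + 1 / alpha) * emp_term

def mdpdge(xs, mean_fn, var_fn, alpha, grid):
    """Grid-search minimiser of the MDPDGE objective over candidate thetas."""
    return min(grid, key=lambda th: mdpdge_objective(xs, th, mean_fn, var_fn, alpha))
```

For a pure location model, `mean_fn = lambda th: th` with unit variance, this reduces to the normal MDPDE of the previous sketch; for the exponential model one would take `mean_fn = lambda th: 1.0 / th` and `var_fn = lambda th: 1.0 / th ** 2`.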
MDPDGE in elliptically contoured distributions
Suppose that the $m$-dimensional random vector $X$ is distributed according to an elliptically contoured distribution with location parameter $\mu$ and scale parameter $\Sigma$, a positive definite matrix. The characteristic function of $X$ has the form $\varphi_X(t)=e^{it^{\top}\mu}\psi(t^{\top}\Sigma t)$ for some scalar function $\psi$. If the distribution has a density, it can be written in the form $f(x;\mu,\Sigma)=c_m|\Sigma|^{-1/2}g\big((x-\mu)^{\top}\Sigma^{-1}(x-\mu)\big)$. In this case we will use the notation $X\sim E_m(\mu,\Sigma,g)$. Following Fang et al. ([15], pp. 46–47), any function
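Since elliptical families have tractable first and second moments, the multivariate Gaussian estimation function is easy to evaluate numerically. The numpy-based sketch below is our own illustration with hypothetical helper names; for, e.g., a multivariate $t_\nu$ member with $\nu>2$ known, one would plug in the mean $\mu$ and the covariance $\nu\Sigma/(\nu-2)$:

```python
import numpy as np

def gaussian_estimation_fn(X, mu, Sigma):
    """Multivariate Gaussian estimation function: the N_m(mu, Sigma)
    log-likelihood of the n x m data matrix X, evaluated at the
    model-implied mean vector and variance-covariance matrix."""
    n, m = X.shape
    _, logdet = np.linalg.slogdet(Sigma)        # stable log-determinant
    R = X - mu                                  # centred data, row-wise
    # Sum of quadratic forms (x_i - mu)' Sigma^{-1} (x_i - mu).
    quad = np.einsum('ij,jk,ik->', R, np.linalg.inv(Sigma), R)
    return -0.5 * n * m * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad
```

Over an unrestricted pair $(\mu,\Sigma)$ this function is maximised at the sample mean and the (biased) sample covariance matrix, which is the classical normal MLE fact underlying the approach.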
Numerical results
In this section we numerically illustrate the proposed methods. In particular, we want to evaluate the robustness of the proposed MDPDGE in comparison with Zhang’s Gaussian estimator, the MLE and the aforementioned MDPDE.
Conclusions and future lines of research
The aim of this section is to summarize the contribution of the paper and also to underline possible topics for future work in light of the present results. Maximum likelihood estimation is the cornerstone of statistical inference. The resulting estimators of the unknown parameter(s) of the model that drives the data are easily obtained in the case of a tractable model. Moreover, they are efficient but, at the same time, highly non-robust in the presence of outliers in the data.
Derivation of the variance–covariance matrix for the general case
Here we develop the computations needed for Proposition 3. Note that we follow ideas similar to those of Castilla et al. ([11], Appendix A.5) in the context of composite likelihood.
Let $\phi_m(\cdot\,;\mu,\Sigma)$ denote the density of the multivariate normal distribution with mean $\mu$ and variance–covariance matrix $\Sigma$. That is,
$$\phi_m(x;\mu,\Sigma)=(2\pi)^{-m/2}|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right\}.$$
We have that
CRediT authorship contribution statement
Elena Castilla: Methodology, Software, Formal analysis, Validation, Visualization, Writing - original draft, Writing - review & editing. Konstantinos Zografos: Conceptualization, Methodology, Formal analysis, Validation, Visualization, Writing - original draft, Writing - review & editing.
Acknowledgements
It is a great pleasure to contribute to this Special Issue, devoted to the celebrations of the 50 year anniversary of the Journal of Multivariate Analysis. We congratulate the Editors for this initiative and we thank them for the invitation to submit a paper for consideration in this Special Issue. We take this occasion to wish the journal to continue providing an excellent, prestigious and prominent medium for those who are interested in the field of multivariate statistical analysis and
References (36)
- et al., Independent or uncorrelated disturbances in linear regression: An illustration of the difference, Econom. Lett. (1985)
- et al., Some extremal type elliptical distributions, Statist. Probab. Lett. (2001)
- General Gaussian estimation, J. Multivariate Anal. (2019)
- On maximum entropy characterization of Pearson's type II and VII multivariate distributions, J. Multivariate Anal. (1999)
- An Introduction to Multivariate Statistical Analysis (2003)
- et al., Maximum-likelihood estimates and likelihood-ratio criteria for multivariate elliptically contoured distributions, Canad. J. Statist. (1986)
- et al., On the optimal density power divergence tuning parameter, J. Appl. Stat. (2020)
- et al., Robust Wald-type tests for non-homogeneous observations based on the minimum density power divergence estimator, Metrika (2018)
- et al., Robust and efficient estimation by minimizing a density power divergence, Biometrika (1998)
- et al., Generalized Wald-type tests based on minimum density power divergence estimators, Statistics (2016)