Robust estimation in multivariate heteroscedastic regression models with autoregressive covariance structures using EM algorithm
Introduction
Repeated measures are prevalent in fields ranging from biology to economics, and the correlation between such measures is typically a function of the time difference or spatial distance. Most experimental research includes study designs in which the same variable is repeatedly measured or observed over time for each subject, in a balanced or unbalanced structure. The key feature of such data (longitudinal data, or more generally repeated measures) is that responses within a subject are very likely to be correlated, even though responses between subjects may be independent. Since misspecification of the covariance structure may lead to a significant loss of efficiency of the mean parameter estimators [45], taking into account the within-subject correlation is essential for modeling such data sets. Moreover, if these data sets contain missing values or outliers, or are not normally distributed, the mean parameter estimators may be biased because of the misspecification of the covariance structure [7]. When the within-subject correlation depends on time or location, classical time-series models (e.g., AR, MA, or exchangeable structures) may fail to flexibly represent the dynamic dependence structure [47]. Furthermore, since modeling covariance structures is of direct interest in areas such as the social sciences, finance, economics, and geology [8], joint mean and covariance models are commonly used for dynamic covariance modeling (see [6], [32], [37], [38], [39], [40], [49], [50], [51]). In multivariate data analysis, joint mean–covariance models can be regarded as multivariate regression models in which the covariance matrix has a special structure. In particular, the mean vector is parameterized as a linear model, and the covariance matrix is structured to represent the heteroscedastic and autoregressive behavior of the measurements within the same subject.
It should be noted that this model type can also be used for time- and location-dependent data sets.
Positive-definiteness is a challenging constraint to satisfy when modeling the covariance matrix. To handle this difficulty, Pourahmadi [37], [38] proposed a data-driven method based on the modified Cholesky decomposition that guarantees the positive-definiteness of the estimated covariance at no additional computational cost. The advantages of this method include the detailed statistical interpretation of the parameters and the assured positive-definiteness of the covariance structure. Pourahmadi [37], [38] considered GLMs for the components of the modified Cholesky decomposition of the covariance matrix, and several authors have since studied joint mean and covariance models based on this decomposition or its extensions [18], [32], [33]. These approaches rest on the normality assumption, which is often unrealistic in practice; they are therefore sensitive to outliers, contamination, and heavy-tailed distributions. Since an outlier at the subject level may produce a whole set of outlying observations in the sample, extra effort is needed to combat outliers. Robust approaches should therefore be considered for data sets that may contain outliers or are otherwise unsuitable for modeling under the normality assumption.
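To make the decomposition concrete, the following sketch (in Python with NumPy; the function name and the AR(1) test matrix are illustrative, not taken from the paper) computes the unit lower-triangular matrix T and diagonal D satisfying T Σ Tᵀ = D, and shows that any such pair maps back to a positive-definite covariance.

```python
import numpy as np

# Modified Cholesky decomposition T @ Sigma @ T.T = D, where T is unit
# lower triangular (its below-diagonal entries are the negatives of the
# autoregressive coefficients) and D is diagonal (innovation variances).
def modified_cholesky(sigma):
    m = sigma.shape[0]
    T = np.eye(m)
    D = np.zeros(m)
    D[0] = sigma[0, 0]
    for j in range(1, m):
        # Regress measurement j on its predecessors:
        # phi solves Sigma[:j, :j] @ phi = Sigma[:j, j]
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi
        D[j] = sigma[j, j] - sigma[:j, j] @ phi  # innovation variance
    return T, D

# Quick check on an AR(1)-type covariance matrix
rho = 0.6
S = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
T, D = modified_cholesky(S)
assert np.allclose(T @ S @ T.T, np.diag(D))

# Any (phi, D) pair maps back to a positive-definite covariance:
Tinv = np.linalg.inv(T)
S_back = Tinv @ np.diag(D) @ Tinv.T
assert np.allclose(S_back, S)
```

Because the autoregressive coefficients in T and the log innovation variances in D are unconstrained, any parameter values reconstruct a valid covariance, which is precisely why the decomposition sidesteps the positive-definiteness constraint.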
Several approaches address robustness in this context. Fan et al. [11] introduced a generalized estimating equation method based on the bounded scoring of the Huber function. Croux et al. [5] developed a procedure for robustifying the mean and covariance by setting up estimating equations for both the mean and the dispersion parameter. Zheng et al. [52], [53] proposed robust generalized estimating equations for the joint mean–covariance model, employing Huber’s function and leverage-based weights for the mean and covariance to achieve robustness against outliers. Wang et al. [46] proposed a robust variable selection approach adopting the exponential squared loss function with a tuning parameter; this approach not only achieves good robustness with respect to outliers but is also asymptotically as efficient as least-squares estimation under normal errors without outliers, given a suitably chosen tuning parameter.
Maadooliat et al. [29] proposed a double-robust procedure for modeling the correlation matrix of a longitudinal dataset. Qin and Zhu [41] further developed a robust method for the simultaneous estimation of mean and covariance under the assumptions that the random errors follow an elliptical distribution and that all subjects share the same within-subject covariance matrix, which does not depend on covariates. Lv et al. [28] extended the bounded exponential score function to longitudinal data analysis and introduced efficient and robust generalized estimating equations, and Lv et al. [27] proposed a new adaptive robust procedure for bivariate longitudinal data utilizing Huber’s score function. Recently, Xu et al. [47] proposed robust estimation of mean–covariance models using the maximum Lq-likelihood estimation method: they retained the mean–covariance modeling of the normal distribution but, instead of ordinary maximum likelihood, used maximum Lq-likelihood estimation to obtain robust estimators of the parameters of interest. Most studies in this context are concerned with robust parameter estimation methods, with the exception of Lin and Wang [25], who proposed a multivariate t-regression model, useful when the data have heavier tails than the normal distribution, with its mean and scale covariance modeled jointly for longitudinal data analysis. However, this procedure is computationally intensive, since the degrees-of-freedom parameter must be estimated along with the other parameters in the t-distribution case.
Similar to the t-distribution, the multivariate Laplace distribution is a heavy-tailed alternative to the multivariate normal distribution. However, the Laplace distribution has an advantage over the multivariate t-distribution with respect to the number of parameters to be estimated: as a heavy-tailed distribution, it has the same number of parameters as the multivariate normal distribution. This feature makes the Laplace distribution very convenient computationally. Moreover, since the multivariate Laplace distribution is heavy-tailed, unlike the multivariate normal distribution, it yields estimators that are robust against outliers and the heavy-tailedness of data sets. In recent years this property has attracted attention, and statistical modeling based on the Laplace distribution and its extensions has developed rapidly in both theory and applications [2], [9], [14], [19], [21], [22], [23], [26], [42], [48]. See Kotz et al. [21] for details of the multivariate Laplace distribution’s properties, generalizations, and applications.
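The computational convenience alluded to above stems from a standard representation (see Kotz et al. [21]): the symmetric multivariate Laplace law arises as a Gaussian scale mixture with an Exp(1) mixing variable. The sketch below (illustrative names and values, not the paper's code) uses this representation to draw samples and checks the first two moments empirically.

```python
import numpy as np

# With W ~ Exp(1) and Z ~ N(0, Sigma) independent, Y = mu + sqrt(W) * Z
# follows a symmetric multivariate Laplace distribution. Since E[W] = 1,
# we get E[Y] = mu and Cov(Y) = Sigma, i.e. the same parameters as the
# multivariate normal case.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)

n = 200_000
W = rng.exponential(1.0, size=n)            # mixing variable
Z = rng.standard_normal((n, 2)) @ L.T       # N(0, Sigma) draws
Y = mu + np.sqrt(W)[:, None] * Z

print(Y.mean(axis=0))           # close to mu
print(np.cov(Y, rowvar=False))  # close to Sigma
```

The exponential mixing also explains the robustness: large mixing draws inflate the tails relative to the normal, so extreme observations are plausible under the model rather than overly influential.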
In this paper, inspired by the robustness property of the Laplace distribution, we propose a robust approach to the analysis of repeated measurements by extending Pourahmadi’s joint mean and covariance parameterization to general linear models with a Laplace-distributed error term, which we call the Laplace joint modeling model (LJMM). Our method’s advantage over other robust alternatives (such as [25]) is that the Laplace distribution is a heavy-tailed alternative to the multivariate normal distribution with fewer parameters to estimate than the t-distribution: besides being robust, it requires estimating the same number of parameters as the normal case.
The remainder of the present paper is structured as follows. In Section 2, we define the LJMM. We present three distinct representations of the log-likelihood function of LJMM and the maximum likelihood (ML) estimates of the parameters in Section 3. A Fisher scoring algorithm and Expectation–Maximization (EM) algorithm for the implementation of ML estimation are described in Section 4. Prediction of future values is discussed in Section 5. Some simulation studies are reported in Section 6. Section 7 includes a data example. Finally, some concluding remarks are briefly summarized in Section 8.
Multivariate Laplace distribution
There are several multivariate extensions of the univariate Laplace distribution, collectively called multivariate Laplace distributions. Ulrich and Chen [44] introduced the bivariate case, and various higher-dimensional forms have been discussed in the literature. Anderson [1] introduced a multivariate Laplace distribution as a special case of the multivariate Linnik distribution, while Ernst [10] and Fernandez et al. [13] presented the multivariate Laplace distribution as a particular multivariate
ML estimation for the parameters
Assume that we have independent observations $y_1, \ldots, y_n$ on $n$ subjects from model (7). Then the likelihood and log-likelihood functions are $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta)$ and $\ell(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$, where $\theta$ collects all unknown parameters. For the sake of simplicity, let $\varepsilon_i = y_i - X_i\beta$ be the vector of residuals, $X_i\beta$ the predictor of $y_i$, and $N = \sum_{i=1}^{n} m_i$ the total number of observations.
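As a numerical companion to the log-likelihood, the sketch below evaluates one subject's contribution under an elliptical Laplace error law of the form $f(y) \propto |\Sigma|^{-1/2} \exp(-\sqrt{\delta})$ with $\delta = \varepsilon^{\top}\Sigma^{-1}\varepsilon$; whether this matches the paper's exact density (and the function name) is an assumption of the sketch, and constants are dropped.

```python
import numpy as np

# Per-subject negative log-likelihood contribution, up to an additive
# constant, under the elliptical Laplace form described above:
#   -l_i = 0.5 * log|Sigma_i| + sqrt((y_i - X_i b)' Sigma_i^{-1} (y_i - X_i b))
# In the paper, Sigma_i would be built from the modified Cholesky
# parameterization rather than passed in directly.
def neg_loglik_subject(y, X, beta, Sigma):
    r = y - X @ beta                         # residual vector eps_i
    delta = r @ np.linalg.solve(Sigma, r)    # squared Mahalanobis distance
    sign, logdet = np.linalg.slogdet(Sigma)  # stable log-determinant
    return 0.5 * logdet + np.sqrt(delta)

# Toy evaluation: identity Sigma, residual (3, 4) -> 0.5*0 + 5 = 5.0
val = neg_loglik_subject(np.array([3.0, 4.0]), np.eye(2),
                         np.zeros(2), np.eye(2))
```

Note that the residual enters through $\sqrt{\delta}$ rather than $\delta/2$ as in the normal case, which is the source of the bounded influence of large residuals.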
Computational aspects of parameter estimation
As mentioned in Section 3, the likelihood equations cannot be solved explicitly, so numerical algorithms are needed to compute the ML estimates. One option is the Newton–Raphson (or Fisher scoring) algorithm used by [25], which we can also adapt to the Laplace case, as defined in this section. However, it is well known that the Newton–Raphson algorithm may be computationally intensive because of the computation of the second-derivative (Hessian) matrix at each iteration.
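A Hessian-free alternative, in the spirit of the iterative reweighting algorithms studied for multivariate M-estimates of location and scatter (cited in the references), is the fixed-point scheme below. It is a minimal sketch, assuming the elliptical Laplace density $f(y) \propto |\Sigma|^{-1/2}\exp(-\sqrt{\delta})$, i.i.d. observations, and no mean–covariance regression structure; the paper's actual EM algorithm updates the full parameterization, which this sketch does not attempt.

```python
import numpy as np

def laplace_mle(Y, tol=1e-8, max_iter=500):
    """Fixed-point iteration for location mu and scatter Sigma under the
    elliptical Laplace score equations: each step is a weighted mean and
    a weighted covariance with weights w_i = 1/sqrt(delta_i), so large
    Mahalanobis distances are automatically downweighted."""
    n, p = Y.shape
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False)
    for _ in range(max_iter):
        R = Y - mu
        delta = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Sigma), R)
        w = 1.0 / np.sqrt(np.maximum(delta, 1e-12))
        mu_new = (w[:, None] * Y).sum(axis=0) / w.sum()
        Sigma_new = (w[:, None, None]
                     * np.einsum('ij,ik->ijk', R, R)).sum(axis=0) / n
        converged = np.abs(mu_new - mu).max() < tol
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    return mu, Sigma

# Contaminated toy data: 450 clean N(0, I) points plus 50 gross outliers
rng = np.random.default_rng(0)
Y = np.vstack([rng.standard_normal((450, 2)), np.full((50, 2), 20.0)])
mu_hat, Sigma_hat = laplace_mle(Y)
# mu_hat stays near the origin; the raw mean is dragged toward (2, 2)
```

Each iteration costs only solves and weighted sums, avoiding the Hessian assembly that makes Newton–Raphson expensive.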
Prediction
In this section, following [25], we consider prediction of future responses under this model. Let $y_i^{*}$ be a $q$-vector of future observations for subject $i$, let $X_i^{*}$ be the corresponding matrix of prediction regressors, and define the covariate vectors associated with the autoregressive and innovation parameters of the future measurements analogously. Moreover, we assume that the joint distribution of the observed and future responses follows an $(m_i + q)$-variate Laplace distribution.
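For elliptical families such as the multivariate Laplace, the point predictor of the future block given the observed block takes the familiar partitioned conditional-mean form. The sketch below (partition names and the toy numbers are illustrative, not the paper's notation) shows the computation; it assumes the relevant mean pieces and covariance blocks have already been estimated.

```python
import numpy as np

# y*_hat = mu_fut + Sigma_21 @ Sigma_11^{-1} @ (y - mu_obs), where
# Sigma_11 is the covariance of the observed block and Sigma_21 the
# cross-covariance between future and observed measurements.
def predict_future(y, mu_obs, mu_fut, S11, S21):
    return mu_fut + S21 @ np.linalg.solve(S11, y - mu_obs)

# Two observed measurements, one future measurement (made-up numbers):
y = np.array([2.0, 4.0])
pred = predict_future(y, np.zeros(2), np.array([1.0]),
                      np.eye(2), np.array([[0.5, 0.25]]))
# pred = 1 + 0.5*2 + 0.25*4 = [3.0]
```

Using `np.linalg.solve` rather than forming the inverse keeps the computation stable when the observed-block covariance is ill-conditioned.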
Simulation study
In the simulation study, our objective is to compare the proposed model (LJMM) with the NJMM and TJMM in terms of modeling capacity, and to assess the predictive accuracy of LJMM over NJMM. Consistent with the literature, we develop the theoretical part of the paper allowing a different number of measurements $m_i$ for each subject, but we observe that estimation and computation simplify when the $m_i$ are equal. Therefore, we consider only balanced models in the simulations and the real-data example. We use the modified EM algorithm
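A minimal data-generation sketch for one balanced subject is given below. It assumes, for illustration only, that the autoregressive coefficients and log innovation variances are linear in the time lag and time point respectively (in the spirit of Pourahmadi's GLM parameterization), and combines the resulting covariance with the Exp(1) scale-mixture construction of the Laplace error; the parameter values are made up, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 6
t = np.arange(1, m + 1, dtype=float)
gamma = np.array([0.8, -0.2])   # autoregressive parameters (assumed)
lam = np.array([-0.5, 0.1])     # innovation-variance parameters (assumed)

# Build T (unit lower triangular) and D from the illustrative linear models
# phi_{jk} = gamma0 + gamma1 * (t_j - t_k),  log d_j = lam0 + lam1 * t_j
T = np.eye(m)
for j in range(1, m):
    for k in range(j):
        T[j, k] = -(gamma[0] + gamma[1] * (t[j] - t[k]))
D = np.exp(lam[0] + lam[1] * t)

# Sigma = T^{-1} D T^{-T} is positive definite for ANY parameter values
Tinv = np.linalg.inv(T)
Sigma = Tinv @ np.diag(D) @ Tinv.T

# One Laplace error vector via the Gaussian scale mixture W ~ Exp(1)
W = rng.exponential(1.0)
eps = np.sqrt(W) * (Tinv @ (np.sqrt(D) * rng.standard_normal(m)))
```

Adding the mean model $X_i\beta$ to `eps` would yield a simulated response for one subject; repeating the draw over subjects gives a balanced sample.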
Data example
This section uses the orthodontic growth curve data originally analyzed by [36]. It is an open data set available in the nlme package in R [34]. The data record the distances (mm) from the pituitary to the pterygomaxillary fissure, measured repeatedly at ages 8, 10, 12, and 14 years for 11 females and 16 males. We compare the performance of the NJMM and our proposed LJMM on this data set. See the lattice graph showing the growth curves of each subject in Fig. 1. The difference in
Conclusion
We have proposed a robust approach to joint modeling of mean and scale covariance using multivariate Laplace distribution, a very flexible distributional assumption, and adopted the modified Cholesky decomposition to factorize the dependence structure in terms of unconstrained autoregressive and scale innovation parameters. Our approach can be regarded as a robust extension of the normal joint mean and covariance model in which the modified Cholesky decomposition proposed by Pourahmadi [37] is
CRediT authorship contribution statement
Yesim Guney: Conceptualization, Methodology, Software, Writing – original draft. Olcay Arslan: Supervision, Validation, Writing – original draft, Editing. Fulya Gokalp Yavuz: Data curation, Software, Investigation, Writing – original draft, Editing.
Acknowledgments
We thank the Editor-in-Chief, and the two anonymous referees for their thorough reading of the former version of the paper. Their helpful comments and numerous suggestions led to a considerable improvement of the manuscript. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References (53)
- A multivariate Linnik distribution, Statist. Probab. Lett. (1992)
- Convergence behavior of an iterative reweighting algorithm to compute multivariate M-estimates for location and scatter, J. Statist. Plann. Inference (2004)
- Modeling covariance matrices via partial autocorrelations, J. Multivariate Anal. (2009)
- Variable selection in robust regression models for longitudinal data, J. Multivariate Anal. (2012)
- Goodness-of-fit tests for multivariate Laplace distributions, Math. Comput. Model. (2011)
- A moment method for the multivariate asymmetric Laplace distribution, Statist. Probab. Lett. (2013)
- Asymmetric Laplace laws and modeling financial data, Math. Comput. Model. (2001)
- A robust approach to joint modeling of mean and scale covariance for longitudinal data, J. Statist. Plann. Inference (2009)
- Multivariate distributions with correlation matrices for nonlinear repeated measurements, Comput. Statist. Data Anal. (2006)
- An efficient and robust variable selection method for longitudinal generalized linear models, Comput. Statist. Data Anal. (2015)