Abstract
We propose a heteroscedastic replicated measurement error model based on the class of scale mixtures of skew-normal distributions, which allows the variances of the measurement errors to vary across subjects. We develop EM algorithms to compute maximum likelihood estimates for the model with or without equation error. An empirical Bayes approach is used to estimate the true covariate and predict the response. Simulation studies show that the proposed models provide reliable results and that inference is not unduly affected by outliers or distributional misspecification. The method is also applied to a real data set on plant root decomposition.
Acknowledgements
This research was supported by the National Science Foundation of China (Grant No. 11301278), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant No. 13YJC910001), and Academic Degree Postgraduate innovation projects of Jiangsu province Ordinary University (Grant No. KYLX15-0883). We are grateful to the associate editor and the referees for their helpful and constructive comments. We would also like to thank Dr. Zhang for providing us with the root decomposition data.
Appendix
1.1 Appendix A: Some heavy-tailed SMSN distributions
The pdf and the conditional moments of some heavy-tailed SMSN distributions are listed here:
1. The multivariate skew-t distribution \(\text {ST}_m(\varvec{\mu }, \varvec{\Sigma }, \varvec{\lambda };\nu )\):
\(\kappa (u)=1/u\), \(U\sim \text {Gamma}(\nu /2,\nu /2)\) with \(\nu > 0\). The pdf of \(\varvec{Y}\) is given by
where \(d=(\varvec{y}-\varvec{\mu })^{\top }\varvec{\Sigma }^{-1}(\varvec{y}-\varvec{\mu })\), and \(t_m(\cdot |\varvec{\mu }, \varvec{\Sigma };\nu )\) and \(T(\cdot ;\nu )\) denote the pdf of the m-variate Student-t distribution and the cdf of the standard univariate t distribution, respectively. When \(\nu \rightarrow +\infty \), one obtains the SN distribution.
The conditional moments are given by
where , i.e. the pdf of the class of SMN distribution when \(\varvec{\lambda }=\varvec{0}\).
2. The multivariate skew-slash distribution \(\text {SS}_m(\varvec{\mu }, \varvec{\Sigma }, \varvec{\lambda };\nu )\):
\(\kappa (u)=1/u\), \(U\sim \text {Beta}(\nu , 1)\) with \(0<u<1\) and \(\nu > 0\). The pdf of \(\varvec{Y}\) is given by
When \(\nu \rightarrow +\infty \), it reduces to the SN one. The conditional moments are
where \(S\sim \text {Gamma}((2\nu +m+2r)/2,d/2)\text {I}_{(0,1)}\) and \(P_x(a,b)\) denotes the cdf of the \(\text {Gamma}(a,b)\) distribution evaluated at x.
3. The multivariate skew-contaminated normal distribution \(\text {SCN}_m(\varvec{\mu }, \varvec{\Sigma }, \varvec{\lambda };\nu ,\gamma )\):
When \(\kappa (u)=1/u\) and U follows the discrete probability function \(h(u;\nu ,\gamma )=\nu \text {I}_{(u=\gamma )}+(1-\nu ) \text {I}_{(u=1)}\) with parameters \(0<\nu<1, 0<\gamma \leqslant 1\), one gets the multivariate skew-contaminated normal distribution, whose pdf is
It reduces to the SN distribution when \(\gamma =1\). In this case, we have
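The three mixing distributions above differ only in how \(U\) is drawn, with \(\kappa (u)=1/u\) in every case. A minimal Python sketch of sampling \(U\) is given below. It assumes that \(\text {Gamma}(\nu /2,\nu /2)\) is in shape/rate form (so that \(\text {E}[U]=1\)), and the function names `sample_mixing_variable` and `kappa` are illustrative, not part of the original paper.

```python
import numpy as np


def sample_mixing_variable(dist, size, nu, gamma=None, rng=None):
    """Draw the mixing variable U for one of the three SMSN members.

    dist : "st" (skew-t), "ss" (skew-slash) or "scn" (skew-contaminated normal)
    nu   : degrees of freedom (st, ss) or contamination proportion (scn)
    gamma: scale-inflation value of the SCN case, 0 < gamma <= 1
    """
    rng = np.random.default_rng() if rng is None else rng
    if dist == "st":
        # Assumption: Gamma(nu/2, nu/2) is shape/rate; NumPy takes scale = 1/rate.
        return rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    if dist == "ss":
        # U ~ Beta(nu, 1) on (0, 1).
        return rng.beta(nu, 1.0, size=size)
    if dist == "scn":
        if gamma is None:
            raise ValueError("gamma is required for the SCN case")
        # U = gamma with probability nu, and U = 1 with probability 1 - nu.
        return np.where(rng.random(size) < nu, gamma, 1.0)
    raise ValueError(f"unknown distribution: {dist!r}")


def kappa(u):
    """Scale function kappa(u) = 1/u shared by all three cases."""
    return 1.0 / np.asarray(u)
```

Small values of \(U\) give large \(\kappa (U)\), which is how these distributions generate heavier tails than the SN case.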
1.2 Appendix B: EM algorithm for SMSN-HRME model without equation error
Denoting the complete data set of model (4) without equation error by \(\varvec{Z}_c=\{\varvec{Z}_{ct}=(\varvec{Z}_t^{\top }, x_t, u_t, v_t)^{\top }|t=1,\ldots ,n\}\), an equivalent form of structure (5) is given by
The complete log-likelihood function based on \(\varvec{Z}_c\), omitting terms that do not depend on \(\varvec{\theta }\), can be written as
where,
The EM algorithm is listed as follows.
E-step: Let \(\widehat{\varvec{\theta }}^{(k)}\) be the estimate of \(\varvec{\theta }\) at the k-th iteration. By calculating the conditional expectation \(\text {E}\big [l(\varvec{\theta }|\varvec{Z}_c)\big |\widehat{\varvec{\theta }}^{(k)}, \varvec{Z}\big ]\), we get the Q-function
with
and
where \(\widehat{ux}_t^{(k)}=\text {E}[\kappa ^{-1}(U_t)x_t|\widehat{\varvec{\theta }}^{(k)},\varvec{Z}_t]\), \(\widehat{ux^2}_t^{(k)}=\text {E}[\kappa ^{-1}(U_t)x_t^2|\widehat{\varvec{\theta }}^{(k)},\varvec{Z}_t]\), \(\widehat{uxv}_t^{(k)}=\text {E}[\kappa ^{-1}(U_t)x_t V_t|\widehat{\varvec{\theta }}^{(k)},\varvec{Z}_t]\), and their computational expressions are as follows
with \(\widehat{r}_t=\widehat{\mu }_x+\widehat{\gamma }_x\widehat{a}_t/\widehat{c}_{1t}\), \(\widehat{s}_t=\widehat{\tau }_x/\widehat{c}_{1t}\). Here \(a_t=(\varvec{Z}_t-\varvec{\mu }_t)^{\top }\mathbf D ^{-1}(\varvec{\phi }_{t})\varvec{b}_t\), \(c_t=1+\phi _x\varvec{b}_t^{\top }\mathbf D ^{-1}(\varvec{\phi }_{t})\varvec{b}_t\), \(c_{1t}=1+\gamma _x\varvec{b}_t^{\top }\mathbf D ^{-1}(\varvec{\phi }_{t})\varvec{b}_t\).
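The auxiliary quantities \(a_t\), \(c_t\), \(c_{1t}\), \(r_t\) and \(s_t\) defined above can be computed for one subject as in the following Python sketch. It is only an illustration of those definitions: the function name `e_step_aux` and its argument names are hypothetical, \(\mathbf D (\varvec{\phi }_{t})\) is passed in as an already-built square matrix, and the parameter arguments are assumed to hold the current estimates.

```python
import numpy as np


def e_step_aux(z_t, mu_t, D_t, b_t, mu_x, phi_x, gamma_x, tau_x):
    """Auxiliary E-step quantities for a single subject t.

    z_t, mu_t, b_t : 1-D arrays of equal length
    D_t            : square matrix D(phi_t), supplied by the caller
    """
    z_t, mu_t, b_t = (np.asarray(v, dtype=float) for v in (z_t, mu_t, b_t))
    D_t = np.asarray(D_t, dtype=float)

    # D^{-1} b_t via a linear solve instead of an explicit inverse.
    D_inv_b = np.linalg.solve(D_t, b_t)

    a_t = (z_t - mu_t) @ D_inv_b          # (Z_t - mu_t)^T D^{-1} b_t
    quad = b_t @ D_inv_b                  # b_t^T D^{-1} b_t
    c_t = 1.0 + phi_x * quad
    c_1t = 1.0 + gamma_x * quad
    r_t = mu_x + gamma_x * a_t / c_1t
    s_t = tau_x / c_1t
    return {"a": a_t, "c": c_t, "c1": c_1t, "r": r_t, "s": s_t}
```

For example, with \(\mathbf D\) equal to the 2×2 identity, \(\varvec{b}_t=(1,1)^{\top }\), \(\varvec{Z}_t-\varvec{\mu }_t=(1,2)^{\top }\), \(\phi _x=2\), \(\gamma _x=1\), \(\mu _x=0\) and \(\tau _x=1\), the definitions give \(a_t=3\), \(c_t=5\), \(c_{1t}=3\), \(r_t=1\) and \(s_t=1/3\).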
M-step: Maximizing \(Q(\varvec{\theta }|\widehat{\varvec{\theta }}^{(k)})\) with respect to \(\varvec{\theta }\), we obtain the updated estimates \(\widehat{\varvec{\theta }}^{(k+1)}\) from the following iterative equations:
where \(\bar{X}_t=\frac{1}{p_t}\sum _{i=1}^{p_t} X_t^{(i)}\), \(\bar{Y}_t=\frac{1}{q_t}\sum _{j=1}^{q_t} Y_t^{(j)}\). Note that the estimators \(\widehat{\lambda }_x\) and \(\widehat{\phi }_x\) can be inferred from the one-to-one transformation \(\widehat{\lambda }_x=\widehat{\tau }_x/\sqrt{\widehat{\gamma }_x}\) and \(\widehat{\phi }_x=\widehat{\gamma }_x+\widehat{\tau }_x^2\).
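The one-to-one transformation between \((\gamma _x,\tau _x)\) and \((\lambda _x,\phi _x)\) can be sketched in Python as follows. The forward map is the one stated above. The inverse is obtained by solving those two equations, which gives \(\gamma _x=\phi _x/(1+\lambda _x^2)\) and \(\tau _x=\lambda _x\sqrt{\gamma _x}\). The function names are illustrative only.

```python
import math


def to_lambda_phi(gamma_x, tau_x):
    """Map (gamma_x, tau_x) to (lambda_x, phi_x)."""
    if gamma_x <= 0:
        raise ValueError("gamma_x must be positive")
    lambda_x = tau_x / math.sqrt(gamma_x)
    phi_x = gamma_x + tau_x ** 2
    return lambda_x, phi_x


def to_gamma_tau(lambda_x, phi_x):
    """Inverse map, derived by solving the two equations above."""
    if phi_x <= 0:
        raise ValueError("phi_x must be positive")
    gamma_x = phi_x / (1.0 + lambda_x ** 2)
    tau_x = lambda_x * math.sqrt(gamma_x)
    return gamma_x, tau_x
```

Working with \((\gamma _x,\tau _x)\) inside the M-step and converting only at the end keeps the updates in the parametrization in which they are stated, while results can still be reported in terms of \(\lambda _x\) and \(\phi _x\).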
1.3 Appendix C: The related derivatives
By direct calculation, the first derivatives of \(d_t\) and \(A_t\) with respect to \(\varvec{\theta }\) are as follows:
where \(\psi _t=\frac{\lambda _x\phi _x}{\sqrt{\phi _x+\lambda _x^2\Lambda _{xt}}}\).
In addition, we also need to calculate the following derivatives:
Cite this article
Cao, C., Chen, M., Wang, Y. et al. Heteroscedastic replicated measurement error models under asymmetric heavy-tailed distributions. Comput Stat 33, 319–338 (2018). https://doi.org/10.1007/s00180-017-0720-8
DOI: https://doi.org/10.1007/s00180-017-0720-8