
A mixed model approach to measurement error in semiparametric regression

Published in: Statistics and Computing

Abstract

An essential assumption in traditional regression techniques is that predictors are measured without error. Failing to take measurement error in predictors into account may result in severely biased inferences. Correcting measurement-error bias is an extremely difficult problem when estimating a regression function nonparametrically. We propose an approach to deal with measurement errors in predictors when modelling flexible regression functions. This approach depends on directly modelling the mean and the variance of the response variable after integrating out the true unobserved predictors in a penalized splines model. We demonstrate through simulation studies that our approach provides satisfactory prediction accuracy, largely outperforming previously suggested local polynomial estimators even when the model is incorrectly specified, and is competitive with the Bayesian estimator.





Author information


Correspondence to Mohammad W. Hattab.


Appendices

Appendix A: Proof of Theorem 2

We demonstrate (10) only, since (7), (8), and (9) follow similarly. First, since \(b > a\),

$$\begin{aligned}&\int \limits _{-\infty }^{+\infty } (x-a)_+(x-b)_+ f(x) \, \mathrm {d} x = \int \limits _{b}^{+\infty } (x-a)(x-b) f(x) \, \mathrm {d} x \nonumber \\&\quad = \int \limits _{b}^{+\infty } (x-\mu +\mu -a)(x-\mu + \mu -b) f(x) \, \mathrm {d} x \nonumber \\&\quad = \int \limits _{b}^{+\infty } (x-\mu )^2 f(x) \, \mathrm {d} x + (2\mu -a-b)\int \limits _{b}^{+\infty } (x-\mu ) f(x) \, \mathrm {d} x \nonumber \\&\qquad + (\mu -a) (\mu -b) (1-F(b)). \end{aligned}$$
(20)

The first term in (20) is

$$\begin{aligned}&\int \limits _{b}^{+\infty } (x-\mu )^2 f(x) \, \mathrm {d} x= \int \limits _{b}^{+\infty } (2\pi s^2)^{-1/2} (x-\mu )^2 \exp (-(x-\mu )^2/(2s^2)) \, \mathrm {d} x \nonumber \\&\quad = \int \limits _{b-\mu }^{+\infty } (2\pi s^2)^{-1/2} t^2 \exp (-t^2/(2s^2)) \, \mathrm {d} t \nonumber \\&\quad = -(2\pi s^2)^{-1/2} s^2 t \exp (-t^2/(2s^2)) \Big |_{b-\mu }^{+\infty }\nonumber \\&\quad + s^2 \int \limits _{b-\mu }^{+\infty } (2\pi s^2)^{-1/2} \exp (-t^2/(2s^2)) \, \mathrm {d} t \nonumber \\&\quad = (2\pi s^2)^{-1/2} s^2 (b-\mu ) \exp (-(\mu -b)^2/(2s^2)) +s^2 (1-F(b)) \nonumber \\&\quad = s^2 (b-\mu ) f(b) +s^2 (1-F(b)). \end{aligned}$$
(21)

Now, the second term in (20) is

$$\begin{aligned}&(2\mu -a-b)\int \limits _{b}^{+\infty } (x-\mu ) f(x) \, \mathrm {d}x\nonumber \\&\quad = (2\mu -a-b) \int \limits _{b}^{+\infty } (2\pi s^2)^{-1/2} (x-\mu ) \exp (-(x-\mu )^2/(2s^2)) \, \mathrm {d} x \nonumber \\&\quad = (2\mu -a-b) \int \limits _{(b-\mu )^2/2}^{+\infty } (2\pi s^2)^{-1/2} \exp (-t/s^2) \, \mathrm {d} t \nonumber \\&\quad = (2\mu -a-b)( -s^2 (2\pi s^2)^{-1/2} \exp (-t/s^2) )\Big |_{(b-\mu )^2/2}^{+\infty } \nonumber \\&\quad = (2\mu -a-b)s^2f(b). \end{aligned}$$
(22)

Substituting (21) and (22) in (20) gives (10).
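As a numerical sanity check, not part of the original proof, the closed form assembled from (20), (21) and (22) can be compared against direct quadrature; the parameter values below are arbitrary, and `plus_product_mean` is a helper name introduced here.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def plus_product_mean(mu, s, a, b):
    """E[(X-a)_+ (X-b)_+] for X ~ N(mu, s^2) and a < b, assembled
    from (20) using the pieces (21) and (22)."""
    f_b = norm.pdf(b, mu, s)           # f(b)
    tail = 1.0 - norm.cdf(b, mu, s)    # 1 - F(b)
    first = s**2 * (b - mu) * f_b + s**2 * tail   # (21)
    second = (2 * mu - a - b) * s**2 * f_b        # (22)
    third = (mu - a) * (mu - b) * tail            # last term of (20)
    return first + second + third

mu, s, a, b = 0.3, 1.2, -0.5, 0.8
# the integrand vanishes below b, so integrate (x-a)(x-b)f(x) over (b, inf)
numeric, _ = integrate.quad(
    lambda x: (x - a) * (x - b) * norm.pdf(x, mu, s), b, np.inf)
print(plus_product_mean(mu, s, a, b), numeric)
```

The two values agree to quadrature accuracy, confirming the algebra in (20)-(22) for this parameter choice.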

Appendix B: \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\)

Using Result 1, the \((i,j)\) entry of \(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is

$$\begin{aligned}&\mathrm{Cov}( \beta _0 + \beta _1 x_i, \beta _0 + \beta _1 x_j|\varvec{w}) = \beta _1^2 \mathrm{Cov}(x_i,x_j|\varvec{w}) \nonumber \\&\quad = \left\{ \begin{array}{ll} 0 &{} i\ne j \\ \beta _1^2 \frac{\sigma _x^2\sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2} &{} i= j \end{array} \right. . \end{aligned}$$
(23)

\(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is a diagonal matrix since the \(x_i\)’s are independent given \(\varvec{w}\). Similarly, the \(i\)th element of the vector \(\varvec{Z} \varvec{u} \) is a function of \(x_i\) only. Therefore, \(\mathrm{Cov}(\varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})\) is a diagonal matrix as well. The \((i,i)\) entry of this matrix is

$$\begin{aligned} \mathrm{Cov}(\varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})_{ii} = \mathrm{Var}\left( \sum ^{k}_{j=1} u_j (x_i - k_j)_+|w_i,\varvec{u}\right) . \end{aligned}$$
(24)

More compactly using vectors, (24) can be re-expressed as

$$\begin{aligned} \mathrm{Cov}(\varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})_{ii} = \varvec{u}^\top \mathrm{Cov}(\varvec{z}_i|w_i) \varvec{u}. \end{aligned}$$
(25)

Here \(\varvec{z}_i\) is the \(i\)th row vector of \(\varvec{Z}\); specifically, \(\varvec{z}_i = \bigl [(x_i - k_1)_+, \ \ldots , (x_i-k_k)_+ \bigr ]^\top \). The \((l,r)\) entry of \(\mathrm{Cov}(\varvec{z}_i|w_i)\) is

$$\begin{aligned}&\mathrm{Cov}\bigl ((x_i-k_l)_+, (x_i-k_r)_+|w_i\bigr ) = \mathrm{E}\bigl ( (x_i-k_l)_+ (x_i-k_r)_+ |w_i\bigr )\nonumber \\&\quad - \mathrm{E}\bigl ( (x_i-k_l)_+|w_i\bigr ) \mathrm{E}\bigl ( (x_i-k_r)_+ |w_i\bigr ). \end{aligned}$$
(26)

The last term can be found similarly to (12), whereas the first term, by (10), is

$$\begin{aligned}&\mathrm{E}\bigl ( (x_i-k_l)_+ (x_i-k_r)_+ |w_i\bigr ) = \frac{\sigma _x^2\sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2} f_i(k_r) \left( \frac{\sigma _x^2 w_i+ \mu _x \sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2} -k_l\right) \nonumber \\&\quad +\left( \frac{\sigma _x^2\sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2} + \left( \frac{\sigma _x^2 w_i + \mu _x \sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2}-k_l\right) \times \left( \frac{\sigma _x^2 w_i + \mu _x \sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2}-k_r\right) \right) \nonumber \\&\quad \times \bigl (1-F_i(k_r)\bigr ), \end{aligned}$$
(27)

assuming \(l\le r\).

Finally, as above, \(\mathrm{Cov}(\varvec{X} \varvec{\beta }, \varvec{Z} \varvec{u}|\varvec{w}, \varvec{u}) \) is a diagonal matrix since the \(i\)th elements of both vectors \(\varvec{X} \varvec{\beta }\) and \(\varvec{Z} \varvec{u}\) depend on \(x_i\) alone. The \((i,i)\) entry of this matrix is

$$\begin{aligned} \mathrm{Cov}(\varvec{X} \varvec{\beta }, \varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})_{ii}= & {} \mathrm{Cov}(\beta _0 + \beta _1 x_i,\varvec{z}_i^\top \varvec{u}|w_i,\varvec{u})\nonumber \\= & {} \beta _1 \mathrm{Cov}(x_i,\varvec{z}_i|w_i) \varvec{u}\nonumber \\= & {} \beta _1\sum _{j=1}^{k} u_j \mathrm{Cov}\bigl (x_i,(x_i-k_j)_+|w_i\bigr ) \end{aligned}$$
(28)

which can be directly found using (9).
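Equation (9) is not reproduced in this appendix; the covariance it supplies is consistent with the standard truncated-normal identity \(\mathrm{Cov}\bigl (X,(X-k)_+\bigr ) = s^2\bigl (1-F(k)\bigr )\) for \(X \sim N(\mu , s^2)\), which follows from the same integration-by-parts technique as (21). A quick numerical check of that identity (names and parameter values here are illustrative, not from the paper):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def cov_x_plus(mu, s, k):
    """Cov(X, (X-k)_+) for X ~ N(mu, s^2): equals s^2 * (1 - F(k))."""
    return s**2 * (1 - norm.cdf(k, mu, s))

mu, s, k = 0.2, 1.5, 0.7
# E[(X-k)_+] = s^2 f(k) + (mu-k)(1-F(k)), the truncated-normal mean
e_plus = s**2 * norm.pdf(k, mu, s) + (mu - k) * (1 - norm.cdf(k, mu, s))
# E[X (X-k)_+] computed by quadrature over (k, inf)
e_x_plus, _ = integrate.quad(
    lambda x: x * (x - k) * norm.pdf(x, mu, s), k, np.inf)
print(e_x_plus - mu * e_plus, cov_x_plus(mu, s, k))
```

The quadrature value of \(\mathrm{E}[X(X-k)_+] - \mu \mathrm{E}[(X-k)_+]\) matches the closed form to numerical precision.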

Let \(v_i\) be the \((i,i)\) entry of \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\). Putting (5), (23), (27) and (28) together shows that \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\) is a diagonal matrix with \(v_i\) given by

$$\begin{aligned} v_i = \sigma _e^2 + \beta _1^2 \frac{\sigma _x^2\sigma _{w|x}^2}{\sigma _x^2 + \sigma _{w|x}^2} +2\beta _1 \mathrm{Cov}(x_i,\varvec{z}_i|w_i) \varvec{u} +\varvec{u}^\top \mathrm{Cov}(\varvec{z}_i|w_i) \varvec{u}.\nonumber \\ \end{aligned}$$
(29)
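The diagonal entry (29) can be verified by Monte Carlo. The sketch below assumes, as used throughout the appendix, that \(x_i | w_i \sim N(m_i, v)\) with \(m_i = (\sigma _x^2 w_i + \mu _x \sigma _{w|x}^2)/(\sigma _x^2 + \sigma _{w|x}^2)\) and \(v = \sigma _x^2\sigma _{w|x}^2/(\sigma _x^2 + \sigma _{w|x}^2)\); the helper `plus_mean` stands in for (12), the identity \(\mathrm{Cov}(X,(X-k)_+) = v(1-F(k))\) stands in for (9), and `plus_cross` implements the first term of (26) via (27). All function names and numeric values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def plus_mean(m, v, k):
    # E[(X-k)_+] for X ~ N(m, v):  v*f(k) + (m-k)*(1-F(k))
    s = np.sqrt(v)
    return v * norm.pdf(k, m, s) + (m - k) * (1 - norm.cdf(k, m, s))

def plus_cross(m, v, kl, kr):
    # E[(X-kl)_+ (X-kr)_+] for kl <= kr, i.e. (27) with posterior
    # mean m and variance v in place of the ratio expressions
    s = np.sqrt(v)
    tail = 1 - norm.cdf(kr, m, s)
    return v * norm.pdf(kr, m, s) * (m - kl) + (v + (m - kl) * (m - kr)) * tail

def v_i(beta0, beta1, u, knots, m, v, sigma_e2):
    # closed-form diagonal entry (29); Cov(x, z_i|w) uses the identity
    # Cov(X, (X-k)_+) = v*(1 - F(k)), an assumption standing in for (9)
    k = np.asarray(knots, dtype=float)
    s = np.sqrt(v)
    cov_xz = v * (1 - norm.cdf(k, m, s))
    mz = plus_mean(m, v, k)
    K = len(k)
    cov_zz = np.empty((K, K))
    for l in range(K):
        for r in range(K):
            lo, hi = sorted((k[l], k[r]))
            cov_zz[l, r] = plus_cross(m, v, lo, hi) - mz[l] * mz[r]
    return sigma_e2 + beta1**2 * v + 2 * beta1 * cov_xz @ u + u @ cov_zz @ u

# Monte Carlo check: draw x_i | w_i, form y, and compare Var(y) with (29)
rng = np.random.default_rng(1)
beta0, beta1 = 0.5, 1.3
u = np.array([0.8, -0.4, 0.6])
knots = [-1.0, 0.0, 1.0]
m, v, sigma_e2 = 0.2, 0.5, 0.3
x = rng.normal(m, np.sqrt(v), 1_000_000)
y = beta0 + beta1 * x + np.clip(x[:, None] - knots, 0.0, None) @ u \
    + rng.normal(0.0, np.sqrt(sigma_e2), x.size)
print(v_i(beta0, beta1, u, knots, m, v, sigma_e2), y.var())
```

The simulated variance of \(y\) agrees with the closed form to Monte Carlo accuracy, exercising (23), (27), (28) and (29) together.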

Appendix C: Varying the number of knots

Figure 7 shows the MSE performance of the conditional approach for Case 1 while varying the number of knots \(k\). It repeats the analysis in panel (a) of Fig. 2 for the conditional approach with \(k=10, 20, 30, 40\) and 50. Except when \(n\) is small, the performance of the estimator is only slightly affected by the choice of \(k\).

Fig. 7

MSE for Case 1 computed for the conditional approach for different number of knots; \(k=10\) (triangle), \(k=20\) (plus), \(k=30\) (cross), \(k=40\) (diamond) and \(k=50\) (inverted triangle)

Fig. 8

Pointwise MSE for Case 1 computed at six points for the Bayesian approach (solid triangles) and the conditional approach (solid circles). The boundary points \(\{-2,2\}\) are shown in panels (a) and (b), the critical points \(\{-1,0,0.43\}\) in panels (c), (d) and (e), and the inflection point 0.81 in panel (f)

Appendix D: MSE pointwise assessment

Figure 8 provides pointwise MSE at six points for the conditional and the Bayesian approaches for Case 1: two points on the boundary of the grid, \(\{-2,2\}\); three critical points, \(\{-1,0,0.43\}\); and one inflection point at 0.81.


Cite this article

Hattab, M.W., Ruppert, D. A mixed model approach to measurement error in semiparametric regression. Stat Comput 31, 31 (2021). https://doi.org/10.1007/s11222-021-10005-x
