Semiparametric approaches for matched case–control studies with error-in-covariates

  • Original paper
  • Published in: Computational Statistics

Abstract

The matched case–control study is a popular design in public health, biomedical, and epidemiological research on human, animal, and other subjects with clustered binary outcomes. Covariates in such studies are often measured with error, and failing to account for this error can lead to incorrect inference for all covariates in the model. Methods for assessing and characterizing error-in-covariates in matched case–control studies are quite limited. In this article we propose several approaches for handling error-in-covariates that detect both parametric and nonparametric relationships between the covariates and the binary outcome. We propose a Bayesian approach and two approximate-Bayesian approaches for addressing additive, Gaussian error-in-covariates, where the variable measured with error has an unknown, nonlinear relationship with the response. The Bayesian approaches use an approximate latent variable probit model. All methods are developed using the nonparametric method of low-rank thin-plate splines. We assess the performance of each method in terms of mean squared error and mean bias in both simulations and a perturbed example of a 1–4 matched case-crossover study.

Acknowledgements

We would like to thank Pang Du, Leanna House, Scotland Leman, George Terrell, and Matt Williams for their advice and assistance. We would also like to thank Ho Kim for supplying the aseptic meningitis data.

Author information

Corresponding author

Correspondence to Inyoung Kim.

Appendix

Markov chain Monte Carlo details for implementation

The full posterior conditional distributions are as follows:

  • Full conditional for \(x_{ij}\) is:

    $$\begin{aligned}{}[x_{ij}|-] \propto&L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},\beta ,q(S),\sigma ^2_u)\times N(x_{ij};\mu _x,\sigma ^2_x), \end{aligned}$$
  • Full conditional for \(\sigma ^2_u\) is:

    $$\begin{aligned}{}[\sigma ^2_u|-] \sim&IG\left[ \sigma ^2_u; (1/2)\sum _{i=1}^N\sum _{j=1}^{M+1} K_{ij}+A_u\right. , \\&\left. \sum _{i=1}^N\sum _{j=1}^{M+1}(W_{ij}-x_{ij})^T(W_{ij}-x_{ij})/2+B_u\right] , \end{aligned}$$
  • Full conditional for \(\mu _x\) is:

    $$\begin{aligned}{}[\mu _x |-] \sim&N\left\{ \mu _x ; \left[ t^2_\mu \sum _{i=1}^N \sum _{j=1}^{M+1} x_{ij} + g_\mu \sigma ^2_x\right] \bigg /[ N(M+1)t^2_\mu +\sigma ^2_x],\right. \\&\left. t^2_\mu \sigma ^2_x/[ N(M+1)t^2_\mu + \sigma ^2_x] \right\} , \end{aligned}$$
  • Full conditional for \(\sigma ^2_x\) is:

    $$\begin{aligned}{}[\sigma ^2_x|-] \sim&IG[\sigma ^2_x; N(M+1)/2 + A_x, (x-\mu _x)^T(x-\mu _x)/2 + B_x] , \end{aligned}$$
  • Full conditional for \((\beta _x,\beta _z)\) is:

    $$\begin{aligned}{}[\beta _x,\beta _z|-] \sim&MN\{ \beta _x,\beta _z ; \\&[(X,Z)^T(X,Z)+I/t^2_\beta ]^{-1}(X,Z)^T[l-L_p(x)\beta _L-Jq(S)], \\&[(X,Z)^T(X,Z)+I/t^2_\beta ]^{-1} \} , \end{aligned}$$
  • Full conditional for \(\beta _L\) is:

    $$\begin{aligned}{}[\beta _L|-] \sim&MN\{ \beta _L ; \\&[L_p(x)^T L_p(x)+I/\sigma ^2_\beta ]^{-1}L_p(x)^T[l-X\beta _x-Z\beta _z-Jq(S)], \\&[L_p(x)^T L_p(x)+I/\sigma ^2_\beta ]^{-1} \} , \end{aligned}$$
  • Full conditional for \(\sigma ^2_\beta \) is:

    $$\begin{aligned}{}[\sigma ^2_\beta |-] \sim&IG(\sigma ^2_\beta ; \kappa /2 + A_\beta , \beta _L^T\beta _L/2 + B_\beta ) , \end{aligned}$$
  • Full conditional for q(S) is:

    $$\begin{aligned}{}[q(S)|-] \sim&MN[ q(S) ; \\&(J^T J +I/\sigma ^2_q)^{-1}\{\beta _0/\sigma ^2_q+J^T[l-X\beta _x-Z\beta _z-L_p(x)\beta _L]\}, \\&(J^T J+I/\sigma ^2_q)^{-1} ] , \end{aligned}$$
  • Full conditional for \(\sigma ^2_q\) is:

    $$\begin{aligned}{}[\sigma ^2_q|-] \sim&IG\{\sigma ^2_q; N/2 + A_q, [q(S)-\beta _0]^T[q(S)-\beta _0]/2 + B_q\} , \end{aligned}$$
  • Full conditional for \(\beta _0\) is:

    $$\begin{aligned}{}[\beta _0|-] \sim&N\left\{ \beta _0 ; \bigg [t^2_0\sum _{i=1}^N q(S_i)+g_0\sigma ^2_q\bigg ]/(N t^2_0+\sigma ^2_q), t^2_0 \sigma ^2_q/(N t^2_0 + \sigma ^2_q) \right\} , \end{aligned}$$

where J is an \(N(M+1)\times N\) matrix defined by the Kronecker product \(I_{N\times N}\otimes 1_{(M+1)\times 1}\).
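As an illustration of how a few of these conjugate updates might be coded, the following is a minimal sketch in standard-library Python. It covers only the draws for \(\sigma ^2_u\), \(\mu _x\), and \(\sigma ^2_x\) and the construction of J. All function names are hypothetical, and it assumes that IG(a, b) denotes the inverse-gamma distribution with shape a and scale b, which is our reading of the notation rather than something stated above.

```python
import math
import random


def build_J(N, M):
    """J = I_N (x) 1_{(M+1) x 1} as a nested list: N*(M+1) rows, N columns."""
    return [[1.0 if r // (M + 1) == c else 0.0 for c in range(N)]
            for r in range(N * (M + 1))]


def draw_inv_gamma(rng, shape, scale):
    # Assumed parameterisation: if G ~ Gamma(shape, 1), then scale / G ~ IG(shape, scale).
    return scale / rng.gammavariate(shape, 1.0)


def draw_sigma2_u(rng, W, x, A_u, B_u):
    """W[m] holds the K replicate measurements of the m-th (i, j) pair; x[m] is its latent value."""
    n_rep = sum(len(w) for w in W)  # sum of K_ij over all (i, j)
    ss = sum((w_k - x_m) ** 2 for w, x_m in zip(W, x) for w_k in w)
    return draw_inv_gamma(rng, 0.5 * n_rep + A_u, 0.5 * ss + B_u)


def mu_x_conditional(x, sigma2_x, g_mu, t2_mu):
    """Mean and variance of the normal full conditional for mu_x."""
    denom = len(x) * t2_mu + sigma2_x  # len(x) plays the role of N(M+1)
    mean = (t2_mu * sum(x) + g_mu * sigma2_x) / denom
    var = t2_mu * sigma2_x / denom
    return mean, var


def draw_mu_x(rng, x, sigma2_x, g_mu, t2_mu):
    mean, var = mu_x_conditional(x, sigma2_x, g_mu, t2_mu)
    return rng.gauss(mean, math.sqrt(var))


def draw_sigma2_x(rng, x, mu_x, A_x, B_x):
    ss = sum((x_m - mu_x) ** 2 for x_m in x)
    return draw_inv_gamma(rng, 0.5 * len(x) + A_x, 0.5 * ss + B_x)


# Illustrative call with made-up values (not data from the article).
rng = random.Random(0)
x_demo = [1.0, 2.0, 3.0]
W_demo = [[0.9, 1.1], [2.2], [2.8, 3.1, 3.0]]
sigma2_u_draw = draw_sigma2_u(rng, W_demo, x_demo, A_u=1.0, B_u=1.0)
mu_x_draw = draw_mu_x(rng, x_demo, sigma2_x=1.0, g_mu=0.0, t2_mu=100.0)
```

The remaining multivariate normal updates follow the same pattern but need a linear-algebra routine for the matrix inverses, so they are left out of this sketch.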

When choosing a proposal distribution for \(x_{ij}\), we followed Berry et al. (2002) and used \(x_{ij}^{(t)} \sim N(x_{ij}^{(t-1)},2^2\sigma ^{2^{(t-1)}}_u/K_{ij})\), where \(2\sigma _u/\sqrt{K_{ij}}\) is chosen as the proposal standard deviation because it covers about 95% of the sampling distribution for \(\bar{w}_{ij\cdot } = K_{ij}^{-1}\sum _{k=1}^{K_{ij}}w_{ijk}\). Alternatively, an automatically tuned proposal distribution (Shaby and Wells 2010) could be used to ensure optimal acceptance rates.
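A single random-walk Metropolis update with this proposal could look like the sketch below. It assumes the outcome contribution \(L(l_{ij}|Y_{ij},x_{ij},Z_{ij},S_i,\beta )\) is supplied on the log scale by a user-written function, here the hypothetical argument `log_lik_outcome`; the other names are likewise illustrative and not taken from the article.

```python
import math
import random


def log_target_x(x, w, sigma2_u, mu_x, sigma2_x, log_lik_outcome):
    """Unnormalised log full conditional of one latent x_ij."""
    meas = -sum((w_k - x) ** 2 for w_k in w) / (2.0 * sigma2_u)  # N(W_ij; x_ij, sigma2_u)
    prior = -(x - mu_x) ** 2 / (2.0 * sigma2_x)                  # N(x_ij; mu_x, sigma2_x)
    return log_lik_outcome(x) + meas + prior


def mh_step_x(rng, x_old, w, sigma2_u, mu_x, sigma2_x, log_lik_outcome):
    """One Metropolis update with proposal sd 2 * sigma_u / sqrt(K_ij)."""
    K = len(w)
    prop_sd = 2.0 * math.sqrt(sigma2_u / K)
    x_new = rng.gauss(x_old, prop_sd)
    # The proposal is symmetric, so the acceptance ratio is the target ratio.
    log_ratio = (log_target_x(x_new, w, sigma2_u, mu_x, sigma2_x, log_lik_outcome)
                 - log_target_x(x_old, w, sigma2_u, mu_x, sigma2_x, log_lik_outcome))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return x_new, True
    return x_old, False


# Illustrative call with made-up values and a flat outcome term.
rng = random.Random(0)
x_next, was_accepted = mh_step_x(rng, 0.5, [0.8, 1.2, 1.0, 1.0], 1.0, 0.0, 1.0,
                                 lambda x: 0.0)
```

In a full sampler the current draw of \(\sigma ^2_u\) would be passed in at each iteration, matching the \((t-1)\) superscript in the proposal variance.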

Derivation of Laplace approximations

1.1 First order Laplace approximation for approximate-Bayesian methods

The goal of this section is to show, using a first-order Laplace approximation, that

$$\begin{aligned} \int \! L(l,W|Y,x,Z,S,\beta ,\sigma ^2_u) \, \mathrm {d}x \approx L(l|Y,x=\bar{w},Z,S,\beta ). \end{aligned}$$

Note that:

$$\begin{aligned} L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},S,\beta ,\sigma ^2_u)&= L(l_{ij}|Y_{ij},x_{ij},Z_{ij},S_i,\beta )\times N(W_{ij};x_{ij},\sigma ^2_u) \\&= L(l_{ij}|Y_{ij},x_{ij},Z_{ij},S_i,\beta )(2\pi \sigma _u^2)^{-K_{ij}/2} \\&\quad \times \exp [-(W_{ij}-x_{ij})^T(W_{ij}-x_{ij})/(2\sigma ^2_u)]. \end{aligned}$$

We will write \(A(x_{ij})=L(l_{ij}|Y_{ij},x_{ij},Z_{ij},S_i,\beta )(2\pi \sigma _u^2)^{-K_{ij}/2}\) and \(h(x_{ij}) = (W_{ij}-x_{ij})^T(W_{ij}-x_{ij})/(2\sigma ^2_u)\). Since \(h(\cdot )\) is a convex quadratic in \(x_{ij}\), it has a unique minimum at \(\bar{w}_{ij\cdot }\), so that \(\exp [-h(\cdot )]\) has its unique maximum there, and its second derivative is \(h^{\prime \prime }(x_{ij}) = K_{ij}/\sigma ^2_u\), both for all ij. Tierney and Kadane (1986) show that we can approximate \(\int \! A(x_{ij})\exp [-h(x_{ij})] \, \mathrm {d}x_{ij}\) by \(A(\tilde{x})\exp [-h(\tilde{x})]\sqrt{\frac{2\pi }{h^{\prime \prime }(\tilde{x})}}\), where \(\tilde{x}\) is the value that minimizes \(h(\cdot )\). We then get:

$$\begin{aligned} \int \! A(x_{ij})\exp [-h(x_{ij})] \, \mathrm {d}x_{ij}&\approx L(l_{ij}|Y_{ij},\bar{w}_{ij},Z_{ij},S_i,\beta ) \left[ K_{ij}(2 \pi \sigma ^2_u)^{K_{ij}-1} \right] ^{-1/2} \\&\quad ~ \times \exp [-(W_{ij}-\bar{w}_{ij})^T(W_{ij}-\bar{w}_{ij})/(2\sigma ^2_u)] \\&\propto L(l_{ij}|Y_{ij},\bar{w}_{ij},Z_{ij},S_i,\beta ). \end{aligned}$$
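The algebra behind this display can be sketched as follows. This is our own expansion, using only the definitions of \(A(\cdot )\) and \(h(\cdot )\) given above and writing \(\bar{w}_{ij\cdot }\) for the replicate mean:

$$\begin{aligned} (W_{ij}-x_{ij})^T(W_{ij}-x_{ij})&= \sum _{k=1}^{K_{ij}}(w_{ijk}-\bar{w}_{ij\cdot })^2 + K_{ij}(x_{ij}-\bar{w}_{ij\cdot })^2, \\ A(\bar{w}_{ij\cdot })\exp [-h(\bar{w}_{ij\cdot })]\sqrt{\frac{2\pi \sigma ^2_u}{K_{ij}}}&= L(l_{ij}|Y_{ij},\bar{w}_{ij\cdot },Z_{ij},S_i,\beta )\,(2\pi \sigma ^2_u)^{-(K_{ij}-1)/2}K_{ij}^{-1/2} \\&\quad \times \exp \left[ -\sum _{k=1}^{K_{ij}}(w_{ijk}-\bar{w}_{ij\cdot })^2/(2\sigma ^2_u)\right] . \end{aligned}$$

The first identity shows that the only part of the exponent depending on \(x_{ij}\) is \(K_{ij}(x_{ij}-\bar{w}_{ij\cdot })^2/(2\sigma ^2_u)\), which is smallest at \(x_{ij}=\bar{w}_{ij\cdot }\). In the second identity, every factor after the likelihood term is free of \(\beta \), which is why those factors are absorbed into the proportionality constant.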

It is then clear that:

$$\begin{aligned}&\int \! L(l,W|Y,x,Z,S,\beta ,\sigma ^2_u) \, \mathrm {d}x\\&\quad = \prod _{i=1}^N\prod _{j=1}^{M+1} \int \! L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},S_i,\beta ,\sigma ^2_u) \, \mathrm {d} x_{ij} \\&\quad \approx \prod _{i=1}^N\prod _{j=1}^{M+1} \bigg \{ L(l_{ij}|Y_{ij},\bar{w}_{ij},Z_{ij},S_i,\beta )\left[ K_{ij}(2 \pi \sigma ^2_u)^{K_{ij}-1} \right] ^{-1/2} \\&\qquad \times \exp [-(W_{ij}-\bar{w}_{ij})^T(W_{ij}-\bar{w}_{ij})/(2\sigma ^2_u)] \bigg \} \\&\quad \propto \prod _{i=1}^N\prod _{j=1}^{M+1} L(l_{ij}|Y_{ij},\bar{w}_{ij},Z_{ij},S_i,\beta ) \\&\quad = L(l|Y,\bar{w},Z,S,\beta ). \end{aligned}$$

It follows from a similar argument that,

$$\begin{aligned}&\int \! L(l,W|Y,x,Z,S,\beta ,\sigma ^2_u)N(x|\mu _x,\sigma ^2_x) \, \mathrm {d}x\\&\quad \approx L\left( l|Y,x=\frac{K\bar{w}\sigma ^2_x+\mu _x\sigma ^2_u}{K\sigma ^2_x+\sigma ^2_u},Z,S,\beta \right) . \end{aligned}$$

1.2 First order Laplace for E2 approach to E-step

Consider now the case where the goal is to use a first-order Laplace approximation to find:

$$\begin{aligned} E\{\log [L(Y|x,Z,S,\beta )]\}&= \int \! \log [L(Y|x,Z,S,\beta )]N(W;x,\sigma ^2_u)N(x;\mu _x,\sigma ^2_x) \, \mathrm {d}x, \\&\approx \log [L(Y|\tilde{x},Z,S,\beta )], \\ \tilde{x}_{ij}&= \frac{K_{ij}\bar{w}_{ij\cdot }\sigma ^2_x+\mu _x\sigma ^2_u}{K_{ij}\sigma ^2_x+\sigma ^2_u}. \end{aligned}$$

We can rewrite the integration as follows:

$$\begin{aligned}&\int \! \log [L(Y_{ij}|x_{ij},Z_{ij},S_i,\beta )]N(W_{ij};x_{ij},\sigma ^2_u)N(x_{ij};\mu _x,\sigma ^2_x) \, \mathrm {d}x_{ij} \\&\quad = \int \! \log [L(Y_{ij}|x_{ij},Z_{ij},S_i,\beta )] (2\pi \sigma ^2_u)^{-K_{ij}/2} \\&\qquad \times \exp \left[ -0.5\sum _{k=1}^{K_{ij}}(w_{ijk}-x_{ij})^2/\sigma ^2_u\right] \\&\qquad \times (2\pi \sigma ^2_x)^{-1/2}\exp \left[ -0.5(x_{ij}-\mu _x)^2/\sigma ^2_x\right] \, \mathrm {d}x_{ij} \\&\quad = \int \! A(x_{ij})\exp \left[ -h(x_{ij})\right] \, \mathrm {d} x_{ij}, \end{aligned}$$

where:

$$\begin{aligned} A(x_{ij})&= \log [L(Y_{ij}|x_{ij},Z_{ij},S_i,\beta )] (2\pi \sigma ^2_u)^{-K_{ij}/2}(2\pi \sigma ^2_x)^{-1/2}, \quad \text {and} \\ h(x_{ij})&= \frac{\sum _{k=1}^{K_{ij}}(w_{ijk}-x_{ij})^2}{2\sigma ^2_u}+\frac{(x_{ij}-\mu _x)^2}{2\sigma ^2_x}. \end{aligned}$$

It should be clear, since \(h(x_{ij})\) is the sum of two convex quadratic functions of \(x_{ij}\), that \(h(\cdot )\) has a unique minimum (equivalently, \(\exp [-h(\cdot )]\) has a unique maximum) at the Bayes estimator \(\tilde{x}_{ij} = \frac{K_{ij}\bar{w}_{ij\cdot }\sigma ^2_x+\mu _x\sigma ^2_u}{K_{ij}\sigma ^2_x+\sigma ^2_u}\). Also, the second derivative is \(h^{\prime \prime }(x_{ij}) = \frac{K_{ij}}{\sigma ^2_u}+\frac{1}{\sigma ^2_x} = \frac{K_{ij}\sigma ^2_x+\sigma ^2_u}{\sigma ^2_x\sigma ^2_u}\). It follows then that:

$$\begin{aligned} \int \! A(x_{ij})\exp \left[ -h(x_{ij})\right] \, \mathrm {d} x_{ij}&\approx A(\tilde{x}_{ij})\exp \left[ -h(\tilde{x}_{ij})\right] \sqrt{\frac{2\pi \sigma ^2_u\sigma ^2_x}{K_{ij}\sigma ^2_x+\sigma ^2_u}}, \\&\propto \log [L(Y_{ij}|\tilde{x}_{ij},Z_{ij},S_i,\beta )]. \end{aligned}$$
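The plug-in step can be illustrated with a short standard-library Python sketch. It computes \(\tilde{x}_{ij}\) from the replicate measurements and evaluates a user-supplied log-likelihood there. The function names and the callable `log_lik` are hypothetical, and `neg_log_kernel` takes the exponent to be the negative log of the two normal kernels in the rewritten integral, with constants dropped, which is our reading of \(h(\cdot )\).

```python
def shrinkage_estimate(w, mu_x, sigma2_u, sigma2_x):
    """x~ = (K * w_bar * sigma2_x + mu_x * sigma2_u) / (K * sigma2_x + sigma2_u)."""
    K = len(w)
    w_bar = sum(w) / K
    return (K * w_bar * sigma2_x + mu_x * sigma2_u) / (K * sigma2_x + sigma2_u)


def neg_log_kernel(x, w, mu_x, sigma2_u, sigma2_x):
    """Negative exponent of N(W; x, sigma2_u) * N(x; mu_x, sigma2_x), constants dropped."""
    meas = sum((w_k - x) ** 2 for w_k in w) / (2.0 * sigma2_u)
    prior = (x - mu_x) ** 2 / (2.0 * sigma2_x)
    return meas + prior


def approx_expected_loglik(log_lik, w, mu_x, sigma2_u, sigma2_x):
    """First-order plug-in: E{log L(Y | x, ...)} is replaced by log L(Y | x~, ...)."""
    return log_lik(shrinkage_estimate(w, mu_x, sigma2_u, sigma2_x))


# Illustrative values (not data from the article).
w_demo = [0.8, 1.2, 1.0, 1.0]
x_tilde = shrinkage_estimate(w_demo, mu_x=0.0, sigma2_u=1.0, sigma2_x=1.0)
# x_tilde should sit at the bottom of the exponent: nearby points give larger values.
is_minimum = all(
    neg_log_kernel(x_tilde + d, w_demo, 0.0, 1.0, 1.0)
    > neg_log_kernel(x_tilde, w_demo, 0.0, 1.0, 1.0)
    for d in (-0.1, -0.001, 0.001, 0.1)
)
```

As \(\sigma ^2_u\) shrinks the estimate moves toward the replicate mean \(\bar{w}_{ij\cdot }\), and as \(\sigma ^2_x\) shrinks it moves toward \(\mu _x\), which is the usual behaviour of a normal-normal shrinkage estimator.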

It follows from an argument similar to that in Section 1.1 that:

$$\begin{aligned} E\{\log [L(Y|x,Z,S,\beta )]\} \approx \log [L(Y|\tilde{x},Z,S,\beta )]. \end{aligned}$$

About this article

Cite this article

Johnson, N.G., Kim, I. Semiparametric approaches for matched case–control studies with error-in-covariates. Comput Stat 34, 1675–1692 (2019). https://doi.org/10.1007/s00180-019-00888-w

  • DOI: https://doi.org/10.1007/s00180-019-00888-w
