Abstract
The matched case–control study is a popular design in public health, biomedical, and epidemiological research involving human, animal, and other subjects with clustered binary outcomes. Covariates in such studies are often measured with error, and not accounting for this error can lead to incorrect inference for all covariates in the model. Methods for assessing and characterizing error-in-covariates in matched case–control studies are quite limited. In this article we propose several approaches for handling error-in-covariates that detect both parametric and nonparametric relationships between the covariates and the binary outcome. We propose a Bayesian approach and two approximate-Bayesian approaches for addressing error-in-covariates that is additive and Gaussian, where the variable measured with error has an unknown, nonlinear relationship with the response. The Bayesian approaches use an approximate latent variable probit model. All methods are developed using the nonparametric method of low-rank thin-plate splines. We assess the performance of each method in terms of mean squared error and mean bias in both simulations and a perturbed example of a 1–4 matched case-crossover study.
References
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley series in probability and statistics. Wiley, Hoboken
Albert J, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679. https://doi.org/10.2307/2290350
Bartlett J, Keogh R (2018) Bayesian correction for covariate measurement error: a frequentist evaluation and comparison with regression calibration. Stat Methods Med Res 27:1695–1708
Berry SM, Carroll RJ, Ruppert D (2002) Bayesian smoothing and regression splines for measurement error problems. J Am Stat Assoc 97(457):160–169. https://doi.org/10.1198/016214502753479301
Buzas JS, Stefanski LA (1996) A note on corrected-score estimation. Stat Probab Lett 28(1):1–8. https://doi.org/10.1016/0167-7152(95)00074-7
Camilli G (1994) Origin of the scaling constant \(d=1.7\) in item response theory. J Educ Behav Stat 19(3):293–295
Carroll R, Roeder K, Wasserman L (1999) Flexible parametric measurement error models. Biometrics 55(1):44–54. https://doi.org/10.1111/j.0006-341X.1999.00044.x
Carroll R, Ruppert D, Tosteson T, Crainiceanu C, Karagas M (2004) Nonparametric regression and instrumental variables. J Am Stat Assoc 99:736–750
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Monographs on statistics and applied probability. Chapman and Hall/CRC, Boca Raton
Crainiceanu C, Ruppert D, Wand MP (2005) Bayesian analysis for penalized spline regression using WinBUGS. J Stat Softw 14(1):1–14
Eaton JW et al (2008) GNU Octave 3.0.5. www.gnu.org/software/octave/
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–741
Guolo A (2008) A flexible approach to measurement error correction in case–control studies. Biometrics 64(4):1207–1214. https://doi.org/10.1111/j.1541-0420.2008.00999.x
Guolo A, Brazzale AR (2008) A simulation-based comparison of techniques to correct for measurement error in matched case–control studies. Stat Med 27(19):3755–3775. https://doi.org/10.1002/sim.3282
Gustafson P (2003) Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. CRC Press, Boca Raton
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109. https://doi.org/10.2307/2334940
Hosmer DW Jr, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley series in probability and statistics. Wiley, Hoboken
Huang Y, Wang C (2000) Cox regression with accurate covariates unascertainable: a nonparametric-correction approach. J Am Stat Assoc 95(452):1209–1219
Huang Y, Wang C (2001) Consistent functional methods for logistic regression with errors in covariates. J Am Stat Assoc 96(456):1469–1482
MATLAB (2012) Version 7.14.0.739 (R2012a). The MathWorks Inc., Natick
McShane L, Midthune D, Dorgan J, Freedman L, Carroll R (2001) Covariate measurement error adjustment for matched case–control studies. Biometrics 57(1):62–73. https://doi.org/10.1111/j.0006-341X.2001.00062.x
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
Parker PA, Vining GG, Wilson SR, Szarka JL III, Johnson NG (2010) The prediction properties of classical and inverse regression for the simple linear calibration problem. J Qual Technol 42(4):332–347
Peleg AY, Husain S, Qureshi ZA, Silveira FP, Sarumi M, Shutt KA, Kwak EJ, Paterson DL (2007) Risk factors, clinical characteristics, and outcome of Nocardia infection in organ transplant recipients: a matched case–control study. Clin Infect Dis 44(10):1307–1314. https://doi.org/10.1086/514340
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge series on statistical and probabilistic mathematics. Cambridge University Press, New York
Ryu D, Li E, Mallick B (2011) Bayesian nonparametric regression analysis of data with random effects covariates from longitudinal measurements. Biometrics 67:454–466
Scott AJ, Wild CJ (1997) Fitting regression models to case–control data by maximum likelihood. Biometrika 84(1):57–71
Shaby B, Wells M (2010) Exploring an adaptive Metropolis algorithm. Technical report, Department of Statistical Science, Duke University
Sinha S, Mukherjee B, Ghosh M, Mallick BK, Carroll RJ (2005) Semiparametric Bayesian analysis of matched case–control studies with missing exposure. J Am Stat Assoc 100(470):591–601
Sinha S, Mallick B, Kipnis V, Carroll R (2010) Semiparametric Bayesian analysis of nutritional epidemiology data in the presence of measurement error. Biometrics 66:444–454
Stefanski LA, Carroll RJ (1987) Conditional scores and optimal scores for generalized linear measurement-error models. Biometrika 74(4):703–716. https://doi.org/10.1093/biomet/74.4.703
Tester J, Rutherford G, Wald Z, Rutherford M (2004) A matched case–control study evaluating the effectiveness of speed humps in reducing child pedestrian injuries. Am J Public Health 94(4):646–650. https://doi.org/10.2105/AJPH.94.4.646
Tierney L, Kadane J (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81(393):82–86. https://doi.org/10.2307/2287970
Whitney CG, Pilishvili T, Farley MM, Schaffner W, Craig AS, Lynfield R, Nyquist A-C, Gershman KA, Vazquez M, Bennett NM, Reingold A, Thomas A, Glode MP, Zell ER, Jorgensen JH, Beall B, Schuchat A (2006) Effectiveness of seven-valent pneumococcal conjugate vaccine against invasive pneumococcal disease: a matched case–control study. Lancet 368(9546):1495–1502. https://doi.org/10.1016/S0140-6736(06)69637-2
Woodward M (2013) Epidemiology: study design and data analysis, 3rd edn. Chapman & Hall, Boca Raton
Acknowledgements
We would like to thank Pang Du, Leanna House, Scotland Leman, George Terrell, and Matt Williams for their advice and assistance. We would also like to thank Ho Kim for supplying the aseptic meningitis data.
Appendices
Appendix
Markov chain Monte Carlo details for implementation
The full posterior conditional distributions are as follows:
- Full conditional for \(x_{ij}\) is:
$$\begin{aligned}{}[x_{ij}|-] \propto&L(l_{ij},W_{ij}|Y_{ij},x_{ij},Z_{ij},\beta ,q(S),\sigma ^2_u)\times N(x_{ij};\mu _x,\sigma ^2_x), \end{aligned}$$
- Full conditional for \(\sigma ^2_u\) is:
$$\begin{aligned}{}[\sigma ^2_u|-] \sim&IG\left[ \sigma ^2_u; (1/2)\sum _{i=1}^N\sum _{j=1}^{M+1} K_{ij}+A_u\right. , \\&\left. \sum _{i=1}^N\sum _{j=1}^{M+1}(W_{ij}-x_{ij})^T(W_{ij}-x_{ij})/2+B_u\right] , \end{aligned}$$
- Full conditional for \(\mu _x\) is:
$$\begin{aligned}{}[\mu _x |-] \sim&N\left\{ \mu _x ; \left[ t^2_\mu \sum _{i=1}^N \sum _{j=1}^{M+1} x_{ij} + g_\mu \sigma ^2_x\right] \bigg /[ N(M+1)t^2_\mu +\sigma ^2_x],\right. \\&\left. t^2_\mu \sigma ^2_x/[ N(M+1)t^2_\mu + \sigma ^2_x] \right\} , \end{aligned}$$
- Full conditional for \(\sigma ^2_x\) is:
$$\begin{aligned}{}[\sigma ^2_x|-] \sim&IG[\sigma ^2_x; N(M+1)/2 + A_x, (x-\mu _x)^T(x-\mu _x)/2 + B_x] , \end{aligned}$$
- Full conditional for \((\beta _x,\beta _z)\) is:
$$\begin{aligned}{}[\beta _x,\beta _z|-] \sim&MN\{ \beta _x,\beta _z ; \\&[(X,Z)^T(X,Z)+I/t^2_\beta ]^{-1}(X,Z)^T[l-L_p(x)\beta _L-Jq(S)], \\&[(X,Z)^T(X,Z)+I/t^2_\beta ]^{-1} \} , \end{aligned}$$
- Full conditional for \(\beta _L\) is:
$$\begin{aligned}{}[\beta _L|-] \sim&MN\{ \beta _L ; \\&[L_p(x)^T L_p(x)+I/\sigma ^2_\beta ]^{-1}L_p(x)^T[l-X\beta _x-Z\beta _z-Jq(S)], \\&[L_p(x)^T L_p(x)+I/\sigma ^2_\beta ]^{-1} \} , \end{aligned}$$
- Full conditional for \(\sigma ^2_\beta \) is:
$$\begin{aligned}{}[\sigma ^2_\beta |-] \sim&IG(\sigma ^2_\beta ; \kappa /2 + A_\beta , \beta _L^T\beta _L/2 + B_\beta ) , \end{aligned}$$
- Full conditional for \(q(S)\) is:
$$\begin{aligned}{}[q(S)|-] \sim&MN[ q(S) ; \\&(J^T J +I/\sigma ^2_q)^{-1}\{\beta _0/\sigma ^2_q+J^T[l-X\beta _x-Z\beta _z-L_p(x)\beta _L]\}, \\&(J^T J+I/\sigma ^2_q)^{-1} ] , \end{aligned}$$
- Full conditional for \(\sigma ^2_q\) is:
$$\begin{aligned}{}[\sigma ^2_q|-] \sim&IG\{\sigma ^2_q; N/2 + A_q, [q(S)-\beta _0]^T[q(S)-\beta _0]/2 + B_q\} , \end{aligned}$$
- Full conditional for \(\beta _0\) is:
$$\begin{aligned}{}[\beta _0|-] \sim&N\left\{ \beta _0 ; \bigg [t^2_0\sum _{i=1}^N q(S_i)+g_0\sigma ^2_q\bigg ]/(N t^2_0+\sigma ^2_q), t^2_0 \sigma ^2_q/(N t^2_0 + \sigma ^2_q) \right\} , \end{aligned}$$
where \(J\) is an \(N(M+1)\times N\) matrix defined by the Kronecker product \(I_{N\times N}\otimes 1_{(M+1)\times 1}\).
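As an illustration of how the conjugate updates above could be coded, the following is a minimal sketch, not the authors' implementation. It covers only the draws for \(\mu _x\), \(\sigma ^2_x\), and \(\sigma ^2_u\), plus the construction of \(J\). The function names are hypothetical, and the sketch assumes that \(IG(a, b)\) denotes the inverse-gamma distribution with shape \(a\) and scale \(b\), so that a draw can be obtained as the reciprocal of a gamma variate with shape \(a\) and rate \(b\).

```python
import numpy as np


def make_J(N, M):
    """Stratum indicator J = I_N kron 1_{(M+1) x 1}, shape (N(M+1), N)."""
    return np.kron(np.eye(N), np.ones((M + 1, 1)))


def draw_mu_x(x, sigma2_x, g_mu, t2_mu, rng):
    """One draw from the normal full conditional of mu_x.

    x holds all N(M+1) latent covariate values; the prior is N(g_mu, t2_mu).
    """
    n = x.size
    denom = n * t2_mu + sigma2_x
    mean = (t2_mu * x.sum() + g_mu * sigma2_x) / denom
    var = t2_mu * sigma2_x / denom
    return rng.normal(mean, np.sqrt(var))


def draw_sigma2_x(x, mu_x, A_x, B_x, rng):
    """One draw from the inverse-gamma full conditional of sigma2_x."""
    shape = x.size / 2.0 + A_x
    scale = np.sum((x - mu_x) ** 2) / 2.0 + B_x
    # Assumption: IG(shape, scale). If G ~ Gamma(shape, rate=scale),
    # then 1/G ~ IG(shape, scale); NumPy's gamma takes scale = 1/rate.
    return 1.0 / rng.gamma(shape, 1.0 / scale)


def draw_sigma2_u(W, x, A_u, B_u, rng):
    """One draw from the inverse-gamma full conditional of sigma2_u.

    W is a list of 1-D arrays of replicate measurements, aligned with x,
    so the i-th array has length K for the i-th latent value.
    """
    n_rep = sum(w.size for w in W)
    ss = sum(np.sum((w - xi) ** 2) for w, xi in zip(W, x))
    return 1.0 / rng.gamma(n_rep / 2.0 + A_u, 1.0 / (ss / 2.0 + B_u))
```

In a full Gibbs sweep these draws would be interleaved with the multivariate normal updates for \((\beta _x,\beta _z)\), \(\beta _L\), and \(q(S)\), which follow the same pattern with a matrix solve in place of the scalar division.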
When choosing a proposal distribution for \(x_{ij}\), we followed Berry et al. (2002) and used \(x_{ij}^{(t)} \sim N(x_{ij}^{(t-1)},2^2\sigma _u^{2,(t-1)}/K_{ij})\), where \(2\sigma _u/\sqrt{K_{ij}}\) is chosen as the proposal standard deviation because it covers about 95% of the sampling distribution for \(\bar{w}_{ij\cdot } = K_{ij}^{-1}\sum _{k=1}^{K_{ij}}w_{ijk}\). Alternatively, an automatically tuned proposal distribution (Shaby and Wells 2010) could be used to ensure optimal acceptance rates.
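The random-walk proposal described above can be sketched as a single Metropolis update for one \(x_{ij}\). This is a hedged illustration only: the function names are hypothetical, and `log_lik` is a placeholder for the outcome-model contribution to the full conditional, which depends on the latent probit model and is not reproduced here.

```python
import numpy as np


def log_target(x, w, sigma2_u, mu_x, sigma2_x, log_lik):
    """Unnormalised log full conditional of one latent covariate x.

    log_lik(x) is a placeholder (assumption) for the outcome-model term.
    """
    meas = -np.sum((w - x) ** 2) / (2.0 * sigma2_u)  # replicates given x
    prior = -(x - mu_x) ** 2 / (2.0 * sigma2_x)      # N(mu_x, sigma2_x)
    return log_lik(x) + meas + prior


def metropolis_step(x_old, w, sigma2_u, mu_x, sigma2_x, log_lik, rng):
    """One random-walk Metropolis update; returns (value, accepted)."""
    prop_sd = 2.0 * np.sqrt(sigma2_u / w.size)       # 2 * sigma_u / sqrt(K)
    x_new = rng.normal(x_old, prop_sd)
    log_ratio = (log_target(x_new, w, sigma2_u, mu_x, sigma2_x, log_lik)
                 - log_target(x_old, w, sigma2_u, mu_x, sigma2_x, log_lik))
    # Symmetric proposal, so the acceptance ratio is the target ratio.
    if np.log(rng.uniform()) < log_ratio:
        return x_new, True
    return x_old, False


def run_chain(x0, w, sigma2_u, mu_x, sigma2_x, log_lik, n_iter, rng):
    """Repeat the update n_iter times with all other parameters held fixed."""
    chain = np.empty(n_iter)
    x = x0
    for t in range(n_iter):
        x, _ = metropolis_step(x, w, sigma2_u, mu_x, sigma2_x, log_lik, rng)
        chain[t] = x
    return chain
```

With a flat `log_lik`, the target reduces to a normal distribution centred at the precision-weighted average of \(\bar{w}_{ij\cdot }\) and \(\mu _x\), which gives a simple way to check the sampler before plugging in the real likelihood.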
Derivation of Laplace approximations
1.1 First order Laplace approximation for approximate-Bayesian methods
The goal of this section is to show using first order Laplace approximation that,
Note that:
We will write \(A(x_{ij})=L(l_{ij}|Y_{ij},x_{ij},Z_{ij},\beta )(2\pi \sigma _u^2)^{-K_{ij}/2}\) and \(h(x_{ij}) = (W_{ij}-x_{ij})^T(W_{ij}-x_{ij})/(2\sigma ^2_u)\). Since \(h(\cdot )\) is a quadratic form, it is easy to show that \(h(x_{ij})\) has a unique minimum at \(\bar{w}_{ij\cdot }\), so that \(\exp [-h(x_{ij})]\) is maximized there, and that the second derivative is \(h^{\prime \prime }(x_{ij}) = 1/\sigma ^2_u\), both for all \(i\) and \(j\). Tierney and Kadane (1986) show that we can approximate \(\int \! A(x_{ij})\exp [-h(x_{ij})] \, \mathrm {d}x_{ij}\) by \(A(\tilde{x})\exp [-h(\tilde{x})]\sqrt{\frac{2\pi }{K_{ij}h^{\prime \prime }(\tilde{x})}}\), where \(\tilde{x}\) is the value that minimizes \(h(\cdot )\). We then get:
It is then clear that:
It follows from a similar argument that,
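To make the first order Laplace step in this section concrete, the sketch below compares a numerical integral of \(A(x)\exp [-h(x)]\) with the closed form obtained by evaluating at \(\tilde{x}=\bar{w}\), namely \(A(\bar{w})\exp [-h(\bar{w})]\sqrt{2\pi \sigma ^2_u/K}\). This is an illustration under stated assumptions and not the authors' derivation: the likelihood factor is replaced by a hypothetical smooth stand-in (a logistic curve), the function name is invented, and a single subject with \(K\) replicates is considered.

```python
import numpy as np


def laplace_vs_numeric(w, sigma2_u, A):
    """Return (numeric integral, Laplace approximation) for one subject.

    w        : 1-D array of K replicate measurements
    sigma2_u : measurement-error variance
    A        : vectorised smooth function standing in for the likelihood
               factor (an assumption made for this sketch)
    """
    K = w.size
    w_bar = w.mean()
    const = (2.0 * np.pi * sigma2_u) ** (-K / 2.0)

    # Trapezoidal rule on a grid spanning +/- 10 sd of w_bar around w_bar.
    half_width = 10.0 * np.sqrt(sigma2_u / K)
    grid = np.linspace(w_bar - half_width, w_bar + half_width, 20001)
    ss = np.sum((w[:, None] - grid[None, :]) ** 2, axis=0)
    vals = A(grid) * const * np.exp(-ss / (2.0 * sigma2_u))
    numeric = np.sum((vals[1:] + vals[:-1]) / 2.0 * np.diff(grid))

    # Laplace: evaluate at the mode w_bar and multiply by the Gaussian
    # width sqrt(2 * pi * sigma2_u / K).
    ss_bar = np.sum((w - w_bar) ** 2)
    laplace = (A(np.array([w_bar]))[0] * const
               * np.exp(-ss_bar / (2.0 * sigma2_u))
               * np.sqrt(2.0 * np.pi * sigma2_u / K))
    return numeric, laplace
```

The two numbers agree more closely as \(\sigma ^2_u/K\) shrinks, because \(A(\cdot )\) is then nearly constant over the region where \(\exp [-h(\cdot )]\) has appreciable mass.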
1.2 First order Laplace for E2 approach to E-step
Consider now where the goal is to use first order Laplace approximation to find:
We can rewrite the integration as follows:
where:
Since \(h(x_{ij})\) is the sum of two quadratic functions of \(x_{ij}\), it should be clear that the unique minimum of \(h(\cdot )\) is the Bayes estimator \(\tilde{x}_{ij} = \frac{K_{ij}\bar{w}_{ij\cdot }\sigma ^2_x+\mu _x\sigma ^2_u}{K_{ij}\sigma ^2_x+\sigma ^2_u}\). Also, the second derivative is \(h^{\prime \prime }(x_{ij}) = \frac{1}{\sigma ^2_x\sigma ^2_u}\). It follows then that:
It follows from a similar argument as in Appendix 2.1 that:
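The Bayes estimator quoted in this section can be checked numerically. The sketch below is a hedged illustration with hypothetical function names. It works with the unscaled sum of the two quadratics, the measurement term plus the \(N(\mu _x,\sigma ^2_x)\) prior term, whose curvature is \((K\sigma ^2_x+\sigma ^2_u)/(\sigma ^2_x\sigma ^2_u)\). How this maps onto the scaling of \(h\) used in the text is an assumption here.

```python
import numpy as np


def bayes_mode(w, sigma2_u, mu_x, sigma2_x):
    """Precision-weighted average of the replicate mean and the prior mean."""
    K = w.size
    return ((K * w.mean() * sigma2_x + mu_x * sigma2_u)
            / (K * sigma2_x + sigma2_u))


def neg_log_kernel(x, w, sigma2_u, mu_x, sigma2_x):
    """Unscaled sum of the two quadratics in a scalar x."""
    return (np.sum((w - x) ** 2) / (2.0 * sigma2_u)
            + (x - mu_x) ** 2 / (2.0 * sigma2_x))


def curvature(K, sigma2_u, sigma2_x):
    """Second derivative of neg_log_kernel, constant in x."""
    return (K * sigma2_x + sigma2_u) / (sigma2_x * sigma2_u)


def laplace_width(K, sigma2_u, sigma2_x):
    """Gaussian width factor sqrt(2 * pi / curvature) in the Laplace formula."""
    return np.sqrt(2.0 * np.pi / curvature(K, sigma2_u, sigma2_x))
```

A finite-difference gradient of `neg_log_kernel` at `bayes_mode(...)` should vanish, and the finite-difference second derivative should match `curvature(...)`, which confirms that the stationary point is a minimum.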
Cite this article
Johnson, N.G., Kim, I. Semiparametric approaches for matched case–control studies with error-in-covariates. Comput Stat 34, 1675–1692 (2019). https://doi.org/10.1007/s00180-019-00888-w
DOI: https://doi.org/10.1007/s00180-019-00888-w