Bayesian analysis for matrix-variate logistic regression with/without response misclassification

  • Original Paper · Statistics and Computing

Abstract

Matrix-variate logistic regression is useful in modeling the relationship between a binary response and matrix-variate covariates, which arise commonly in medical imaging research. However, inference based on such a model is impaired by the presence of response misclassification and spurious covariates. It is imperative to account for the misclassification effects and to select active covariates when employing matrix-variate logistic regression to handle such data. In this paper, we develop Bayesian inferential methods with the horseshoe prior. We numerically examine the biases induced by the naive analysis which ignores misclassification of responses. The performance of the proposed methods is justified empirically, and their usage is illustrated by an application to the Lee Silverman Voice Treatment (LSVT) Companion data.


References

  • Bhattacharya, A., Pati, D., Pillai, N.S., Dunson, D.B.: Dirichlet–Laplace priors for optimal shrinkage. J. Am. Stat. Assoc. 110, 1479–1490 (2015)

  • Biane, P., Pitman, J., Yor, M.: Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions. Bull. Am. Math. Soc. 38, 435–465 (2001)

  • Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97, 465–480 (2010)

  • Choi, H.M., Hobert, J.P.: The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7, 2054–2064 (2013)

  • Dellaportas, P., Stephens, D.A.: Bayesian analysis of errors-in-variables regression models. Biometrics 51, 1085–1095 (1995)

  • Fang, J., Yi, G.Y.: Matrix-variate logistic regression with measurement error. Biometrika 108, 83–97 (2020)

  • Gamerman, D.: Sampling from the posterior distribution in generalized linear mixed models. Stat. Comput. 7, 57–68 (1997)

  • George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)

  • Gerlach, R., Stamey, J.: Bayesian model selection for logistic regression with misclassified outcomes. Stat. Model. 7, 255–273 (2003)

  • Gramacy, R.B., Polson, N.G.: Simulation-based regularized logistic regression. Bayesian Anal. 7, 567–590 (2012)

  • Guhaniyogi, R., Qamar, S., Dunson, D.B.: Bayesian tensor regression. J. Mach. Learn. Res. 18, 2733–2763 (2017)

  • Gustafson, P.: Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. CRC Press, Boca Raton (2003)

  • Holmes, C.C., Held, L.: Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1, 145–168 (2006)

  • Hung, H., Wang, C.-C.: Matrix variate logistic regression model with application to EEG data. Biostatistics 14, 189–202 (2013)

  • Ishwaran, H., Rao, J.S.: Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33, 730–773 (2005)

  • McInturff, P., Johnson, W.O., Cowling, D., Gardner, I.A.: Modelling risk when binary outcomes are subject to error. Stat. Med. 23, 1095–1109 (2004)

  • Paulino, C.D., Soares, P., Neuhaus, J.: Binomial regression with misclassification. Biometrics 59, 670–675 (2003)

  • Polson, N.G., Scott, J.G., Windle, J.: Bayesian inference for logistic models using Polya-Gamma latent variables. J. Am. Stat. Assoc. 108, 1339–1349 (2013)

  • Polson, N.G., Scott, J.G., Windle, J.: The Bayesian bridge. J. R. Stat. Soc. B 76, 713–733 (2014)

  • Rekaya, R., Weigel, K.A., Gianola, D.: Threshold model for misclassified binary responses with applications to animal breeding. Biometrics 57, 1123–1129 (2001)

  • Richardson, S., Gilks, W.R.: A Bayesian approach to measurement error problems in epidemiology using conditional independence models. Am. J. Epidemiol. 138, 430–442 (1993)

  • Rossi, P.E., Allenby, G.M., McCulloch, R.: Bayesian Statistics and Marketing. Wiley, New York (2005)

  • Tanner, M.A., Wong, W.H.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–540 (1987)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Tsanas, A., Little, M.A., Fox, C., Ramig, L.O.: Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease. IEEE Trans. Neural Syst. Rehabil. Eng. 22, 181–190 (2013)

  • Wei, R., Ghosal, S.: Contraction properties of shrinkage priors in logistic regression. J. Stat. Plan. Inference 207, 215–229 (2020)

  • Zeger, S.L., Karim, M.: Generalized linear models with random effects: a Gibbs sampling approach. J. Am. Stat. Assoc. 86, 79–86 (1991)

  • Zellner, A., Rossi, P.E.: Bayesian analysis of dichotomous quantal response models. J. Econom. 25, 365–393 (1984)

  • Zhou, H., Li, L., Zhu, H.: Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc. 108, 540–552 (2013)

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program.

Author information

Contributions

JF and GYY wrote the manuscript text. JF and GYY reviewed the manuscript.

Corresponding author

Correspondence to Grace Y. Yi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Full conditional distribution of hyperparameters

As stated in (3.4), the prior distributions for the hyperparameters \(\lambda_{\alpha_i}\), \(\lambda_{\beta_i}\), \(\lambda_{\gamma_i}\), and \(a\) are set to be half-Cauchy. Here we present the full conditional distribution of \(\lambda_{\alpha_i}\) only; the full conditional distributions of the other hyperparameters can be derived in the same manner:

$$\begin{aligned} \pi(\lambda_{\alpha_i}\mid \alpha,\beta,\gamma,a) &= \pi(\lambda_{\alpha_i}\mid \alpha_i,a) \\ &\propto \pi(\lambda_{\alpha_i})\,\pi(\alpha_i\mid \lambda_{\alpha_i},a) \\ &\propto \frac{2}{\pi}\,\frac{1}{1+\lambda^2_{\alpha_i}}\,\exp\left(-\frac{\alpha^2_i}{2\lambda^2_{\alpha_i}a^2}\right). \end{aligned}$$

The slice sampling algorithm (Polson et al. 2014) is used to generate \(\lambda_{\alpha_i}\), as in Sect. 3.3.
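
For concreteness, the following is a minimal Python sketch of this update. It applies a generic stepping-out/shrinkage slice sampler, in the spirit of Polson et al. (2014), to the unnormalized target displayed above, parameterized on the log scale; the names (`alpha_i`, `a`, the step width `w0`) are illustrative choices rather than quantities taken from the paper, and the paper's own auxiliary-variable construction may differ in detail.

```python
import numpy as np

def log_target(theta, alpha_i, a):
    """Unnormalized log density of theta = log(lambda_{alpha_i}): half-Cauchy prior
    times the Gaussian factor above, plus the log-Jacobian of the log transform."""
    lam = np.exp(theta)
    return -np.log1p(lam ** 2) - alpha_i ** 2 / (2.0 * lam ** 2 * a ** 2) + theta

def slice_update(theta, alpha_i, a, w0=1.0, rng=None):
    """One stepping-out/shrinkage slice-sampling update of theta = log(lambda_{alpha_i})."""
    rng = rng or np.random.default_rng()
    log_level = log_target(theta, alpha_i, a) + np.log(rng.uniform())  # slice height
    left = theta - w0 * rng.uniform()                                  # step out
    right = left + w0
    while log_target(left, alpha_i, a) > log_level:
        left -= w0
    while log_target(right, alpha_i, a) > log_level:
        right += w0
    while True:                                                        # shrink
        proposal = rng.uniform(left, right)
        if log_target(proposal, alpha_i, a) > log_level:
            return proposal                                            # new log(lambda)
        if proposal < theta:
            left = proposal
        else:
            right = proposal
```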

Full conditional distribution of \(\alpha ^{(r)}\)

As stated in (3.3), the full conditional distribution of \(\alpha^{(r)}\) is

$$\begin{aligned} \pi(\alpha^{(r)}\mid w,\beta^{(r)},\mathcal{B}_{-r},\{\mathbb{Y},x\}) &\propto \bigg\{ \prod^{n}_{k=1} P(Y_k = y_k\mid\mathcal{B})\bigg\} f(w\mid\mathcal{B})\, \pi(\alpha^{(r)}\mid\beta^{(r)},\mathcal{B}_{-r})\\ &\propto \prod^{n}_{k=1}\bigg\{ \frac{\exp(\langle x_k, \mathcal{B}\rangle)^{y_k}}{1+\exp(\langle x_k,\mathcal{B}\rangle)} \bigg\} \cosh\bigg( \frac{|\langle x_k, \mathcal{B}\rangle|}{2}\bigg) \exp\bigg\{-\frac{(\langle x_k, \mathcal{B}\rangle)^2 w_k}{2}\bigg\}\, \pi(\alpha^{(r)}\mid\lambda_{\alpha^{(r)}},a)\\ &= 2^{-n}\, \pi(\alpha^{(r)}\mid\lambda_{\alpha^{(r)}},a) \prod^{n}_{k=1} \exp\bigg\{y_k \langle x_k,\mathcal{B}\rangle - \frac{\langle x_k, \mathcal{B}\rangle}{2} - \frac{(\langle x_k, \mathcal{B}\rangle)^2 w_k}{2} \bigg\}\\ &\propto \exp\bigg\{-\frac{1}{2}\alpha^{(r)\intercal} \Sigma^{-1}_{\alpha^{(r)}} \alpha^{(r)} + \sum^n_{k=1} \bigg[\Big(y_k-\frac{1}{2}\Big) \alpha^{(r)\intercal}x_k \beta^{(r)} - \frac{(\alpha^{(r)\intercal} x_k \beta^{(r)})^2}{2}w_k - \alpha^{(r)\intercal} x_k \beta^{(r)} \langle x_k, \mathcal{B}_{-r} \rangle w_k \bigg]\bigg\}\\ &= \exp\bigg[ -\frac{1}{2}\alpha^{(r)\intercal} \Sigma^{-1}_{\alpha^{(r)}}\alpha^{(r)} -\frac{1}{2} \alpha^{(r)\intercal}x^{\intercal}_{\beta^{(r)}} \Omega(w) x_{\beta^{(r)}} \alpha^{(r)} + x_{\beta^{(r)}}\bigg\{y- \frac{1}{2}\textbf{1}_n - x_{\mathcal{B}_{-r}}(w) \bigg\}\alpha^{(r)} \bigg]\\ &= \exp\bigg[ -\frac{1}{2} \alpha^{(r)\intercal} \bigg\{x^{\intercal}_{\beta^{(r)}} \Omega(w) x_{\beta^{(r)}}+\Sigma^{-1}_{\alpha^{(r)}} \bigg\}\alpha^{(r)} + x_{\beta^{(r)}}y(w)\alpha^{(r)} \bigg], \end{aligned}$$
(B.1)

where the third step follows from the identity \(\cosh(u)=\frac{1+\exp(2u)}{2\exp(u)}\), \(x_{\beta^{(r)}} = (x_1 \beta^{(r)},\ldots,x_n \beta^{(r)})^\intercal\), \(y=(y_1,\ldots,y_n)^{\intercal}\), \(y(w) = y - \frac{1}{2}\textbf{1}_n-x_{\mathcal{B}_{-r}}(w)\), \(x_{\mathcal{B}_{-r}}(w) = \{\langle x_1, \mathcal{B}_{-r} \rangle w_1,\ldots, \langle x_n,\mathcal{B}_{-r} \rangle w_n\}^\intercal\), \(\textbf{1}_n\) is the \(n \times 1\) vector of ones, \(\Omega(w) = \mathrm{diag}(w)\), and \(\Sigma_{\alpha^{(r)}}=\mathrm{diag}(\lambda^2_{\alpha^{(r)}}a^2)\). We observe that (B.1) is the kernel of a multivariate normal distribution with mean \(m_{\alpha^{(r)}}(w)\) and covariance matrix \(\Sigma_{\alpha^{(r)}}(w)\) given by

$$\begin{aligned}&m_{\alpha ^{(r)}}(w) = \Sigma _{\alpha ^{(r)}}(w)x_{\beta ^{(r)}}y(w), \\&\Sigma _{\alpha ^{(r)}}(w) = \bigg \{ x^\intercal _{\beta ^{(r)}} \Omega (w) x_{\beta ^{(r)}}+\Sigma ^{-1}_{\alpha ^{(r)}} \bigg \}^{-1}. \end{aligned}$$
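
As a concrete illustration of this Gaussian update, the sketch below draws \(\alpha^{(r)}\) given the Pólya-Gamma variables \(w\). It treats \(x_{\beta^{(r)}}\) as the \(n \times p_1\) matrix whose \(k\)-th row is \((x_k\beta^{(r)})^\intercal\), so that the mean is computed as \(\Sigma_{\alpha^{(r)}}(w)\,x^{\intercal}_{\beta^{(r)}}y(w)\); the function and argument names are illustrative and not taken from the authors' code. The same routine serves for \(\beta^{(r)}\) in the next subsection with the roles of \(\alpha^{(r)}\) and \(\beta^{(r)}\) swapped.

```python
import numpy as np

def draw_coef(X_work, y, w, x_Bmr, lam2, a2, rng):
    """Draw alpha^(r) (or, symmetrically, beta^(r)) from its Gaussian full conditional.

    X_work : (n, p) working design; for alpha^(r), row k is x_k beta^(r)
    y      : (n,) binary responses
    w      : (n,) Polya-Gamma auxiliary variables
    x_Bmr  : (n,) inner products <x_k, B_{-r}> with the remaining terms
    lam2   : (p,) local scales lambda^2;  a2 : global scale a^2
    """
    prior_prec = np.diag(1.0 / (lam2 * a2))                      # Sigma^{-1} from the horseshoe prior
    precision = X_work.T @ (w[:, None] * X_work) + prior_prec    # X^T Omega(w) X + Sigma^{-1}
    cov = np.linalg.inv(precision)                               # Sigma(w)
    y_w = y - 0.5 - x_Bmr * w                                    # y(w)
    mean = cov @ (X_work.T @ y_w)                                # m(w)
    return rng.multivariate_normal(mean, cov)
```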

Full conditional distribution of \(\beta^{(r)}\)

The full conditional distribution of \(\beta^{(r)}\) is

$$\begin{aligned} \pi(\beta^{(r)}\mid w,\alpha,\{\mathbb{Y},x\}) &\propto \bigg\{ \prod^{n}_{k=1} P(Y_k = y_k\mid\mathcal{B})\bigg\} f(w\mid\mathcal{B})\, \pi(\beta^{(r)}\mid\alpha^{(r)},\mathcal{B}_{-r})\\ &\propto \prod^{n}_{k=1}\bigg\{ \frac{\exp(\langle x_k, \mathcal{B}\rangle)^{y_k}}{1+\exp(\langle x_k, \mathcal{B}\rangle)} \bigg\} \cosh\bigg( \frac{|\langle x_k, \mathcal{B}\rangle|}{2}\bigg) \exp\bigg\{-\frac{(\langle x_k, \mathcal{B}\rangle)^2 w_k}{2}\bigg\}\, \pi(\beta^{(r)}\mid\lambda_{\beta^{(r)}},a)\\ &= 2^{-n}\, \pi(\beta^{(r)}\mid\lambda_{\beta^{(r)}},a) \prod^{n}_{k=1} \exp\bigg\{y_k \langle x_k, \mathcal{B}\rangle - \frac{\langle x_k, \mathcal{B}\rangle}{2} - \frac{(\langle x_k, \mathcal{B}\rangle)^2 w_k}{2} \bigg\}\\ &\propto \exp\bigg\{-\frac{1}{2}\beta^{(r)\intercal} \Sigma^{-1}_{\beta^{(r)}} \beta^{(r)} + \sum^n_{k=1} \bigg[\Big(y_k-\frac{1}{2}\Big) \alpha^{(r)\intercal} x_k \beta^{(r)} - \frac{(\alpha^{(r)\intercal} x_k \beta^{(r)})^2}{2}w_k - \alpha^{(r)\intercal} x_k \beta^{(r)} \langle x_k, \mathcal{B}_{-r} \rangle w_k \bigg]\bigg\}\\ &= \exp\bigg[ -\frac{1}{2}\beta^{(r)\intercal}\Sigma^{-1}_{\beta^{(r)}}\beta^{(r)} -\frac{1}{2} \beta^{(r)\intercal} x^{\intercal}_{\alpha^{(r)}} \Omega(w) x_{\alpha^{(r)}} \beta^{(r)} + x_{\alpha^{(r)}}\bigg\{y- \frac{1}{2}\textbf{1}_n - x_{\mathcal{B}_{-r}}(w) \bigg\}\beta^{(r)} \bigg]\\ &= \exp\bigg[ -\frac{1}{2} \beta^{(r)\intercal} \bigg\{x^{\intercal}_{\alpha^{(r)}} \Omega(w) x_{\alpha^{(r)}}+\Sigma^{-1}_{\beta^{(r)}}\bigg\}\beta^{(r)} + x_{\alpha^{(r)}}y(w)\beta^{(r)} \bigg], \end{aligned}$$
(C.1)

where \(x_{\alpha^{(r)}} = (x^\intercal_1 \alpha^{(r)},\ldots, x^\intercal_n\alpha^{(r)})^\intercal\) and \(\Sigma_{\beta^{(r)}}=\mathrm{diag}(\lambda^2_{\beta^{(r)}}a^2)\). We observe that (C.1) is the kernel of a multivariate normal distribution with mean \(m_{\beta^{(r)}}(w)\) and covariance matrix \(\Sigma_{\beta^{(r)}}(w)\) given by

$$\begin{aligned} \begin{aligned}&m_{\beta ^{(r)}}(w) = \Sigma _{\beta ^{(r)}}(w)x_{\alpha ^{(r)}}y(w) \ \ \text {and} \ \ \Sigma _{\beta ^{(r)}}(w) \\&\quad = \bigg \{ x^\intercal _{\alpha ^{(r)}} \Omega (w) x_{\alpha ^{(r)}}+\Sigma ^{-1}_{\beta ^{(r)}} \bigg \}^{-1}. \end{aligned} \end{aligned}$$
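
Combining the two Gaussian conditionals with a refresh of the Pólya-Gamma variables gives one Gibbs sweep for a rank-1 term. The sketch below is illustrative only: it assumes a single rank-1 term (so that \(\langle x_k, \mathcal{B}_{-r}\rangle = 0\)), reuses the hypothetical `draw_coef` helper from the previous sketch, and assumes a Pólya-Gamma sampler such as `random_polyagamma` from the `polyagamma` Python package is available.

```python
import numpy as np
from polyagamma import random_polyagamma  # assumed available for PG(1, c) draws

def gibbs_sweep(x, y, alpha, beta, lam2_alpha, lam2_beta, a2, rng):
    """One illustrative Gibbs sweep over (w, alpha^(r), beta^(r)) for a single rank-1 term.

    x : (n, p1, p2) matrix covariates;  alpha : (p1,);  beta : (p2,)
    """
    eta = np.einsum('kij,i,j->k', x, alpha, beta)    # <x_k, B> = alpha^T x_k beta
    w = random_polyagamma(1.0, eta)                  # w_k ~ PG(1, <x_k, B>)

    zeros = np.zeros(len(y))                         # <x_k, B_{-r}> = 0 for a single term
    X_beta = np.einsum('kij,j->ki', x, beta)         # row k is x_k beta: design for the alpha-step
    alpha = draw_coef(X_beta, y, w, zeros, lam2_alpha, a2, rng)

    X_alpha = np.einsum('kij,i->kj', x, alpha)       # row k is x_k^T alpha: design for the beta-step
    beta = draw_coef(X_alpha, y, w, zeros, lam2_beta, a2, rng)
    return w, alpha, beta
```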

Derivation of the conditional distribution (4.2)

Noting the following equivalent statements:

$$\begin{aligned} ``H_{k} =1, {Y^{*}_{k}} ={y^{*}_{k}}''\iff ``H_{k} =1, Y_{k} ={y^{*}_{k}}'' \end{aligned}$$

and

$$\begin{aligned} ``H_{k} =0, {Y^{*}_{k}} ={y^{*}_{k}}'' \iff ``H_{k} =0, Y_{k} =1-{y^{*}_{k}}'', \end{aligned}$$

we have that

$$\begin{aligned} \begin{aligned}&P(H_{k}=1|Y^{*}_{k} = y^{*}_{k},x_{k})\\&\quad = \frac{P(H_{k}=1,Y^{*}_{k} = y^{*}_{k}|x_{k})}{P(H_{k}=1,Y^{*}_{k} = y^{*}_{k}|x_{k}) +P(H_{k}=0,Y^{*}_{k} = y^{*}_{k}|x_{k})} \\&\quad = \frac{P(H_{k}=1,Y_{k} = y^{*}_{k}|x_{k})}{P(H_{k}=1,Y_{k} = y^{*}_{k}|x_{k}) +P(H_{k}=0,Y_{k} =1- y^{*}_{k}|x_{k})} \\&\quad = \frac{P(H_{k}=1|Y_{k} = y^{*}_{k},x_{k})P(Y_{k} = y^{*}_{k}|x_{k})}{\begin{array}{c}P(H_{k}=1|Y_{k} = y^{*}_{k},x_{k})P(Y_{k} = y^{*}_{k}|x_{k})\\ +P(H_{k}=0|Y_{k} = 1-y^{*}_{k},x_{k})P(Y_{k} = 1-y^{*}_{k}|x_{k})\end{array}} \\&\quad = \frac{\rho (y^{*}_{k})p_{k}^{y^{*}_{k}}(1-p_{k})^{1-y^{*}_{k}}}{\rho (y^{*}_{k})p_{k}^{y^{*}_{k}}(1-p_{k})^{1-y^{*}_{k}} +(1-\rho (1-y^{*}_{k}))p_{k}^{1-y^{*}_{k}}(1-p_{k})^{y^{*}_{k}}} \end{aligned} \end{aligned}$$

That is, (4.2) holds.
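
For illustration, the probability in (4.2) can be evaluated directly once \(p_k\) and the classification probabilities are specified. The short sketch below assumes \(p_k = \mathrm{expit}(\langle x_k, \mathcal{B}\rangle)\), ignoring any additional covariates in the full model, and writes \(\rho(1)\) and \(\rho(0)\) for the probabilities of correctly recording a true 1 and a true 0, respectively; the names and arguments are illustrative.

```python
import numpy as np
from scipy.special import expit

def prob_label_correct(y_star, x_k, B, rho1, rho0):
    """Posterior probability that the observed label y*_k is correct, as in (4.2).

    y_star : observed (possibly misclassified) binary response
    x_k, B : matrix covariate and coefficient matrix, giving p_k = expit(<x_k, B>)
    rho1   : P(H_k = 1 | Y_k = 1), probability a true 1 is recorded correctly
    rho0   : P(H_k = 1 | Y_k = 0), probability a true 0 is recorded correctly
    """
    p_k = expit(np.sum(x_k * B))                  # p_k = P(Y_k = 1 | x_k)
    rho = rho1 if y_star == 1 else rho0           # rho(y*_k)
    rho_other = rho0 if y_star == 1 else rho1     # rho(1 - y*_k)
    num = rho * p_k ** y_star * (1.0 - p_k) ** (1 - y_star)
    den = num + (1.0 - rho_other) * p_k ** (1 - y_star) * (1.0 - p_k) ** y_star
    return num / den
```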

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fang, J., Yi, G.Y. Bayesian analysis for matrix-variate logistic regression with/without response misclassification. Stat Comput 33, 121 (2023). https://doi.org/10.1007/s11222-023-10286-4
