Abstract
The asymmetric Laplace (AL) specification has become a standard working model for Bayesian quantile regression. In addition to yielding Markov chain Monte Carlo algorithms with fast convergence, the AL specification guarantees posterior consistency even under model misspecification. However, variable selection under this specification is a daunting task because, realistically, the prior on the regression parameters should take the quantile level into consideration. A quantile-specific g-prior has recently been developed for Bayesian variable selection in quantile regression, but it carries a heavy computational burden because the resulting posterior distributions are intractable. In this paper, we develop a novel three-stage computational scheme for this quantile-specific g-prior, which starts with an expectation-maximization (EM) algorithm, is followed by a Gibbs sampler, and ends with an importance re-weighting step that improves the accuracy of the approximation. The performance of the proposed procedure is illustrated with simulations and a real-data application. Numerical results suggest that our procedure compares favorably with the Metropolis–Hastings algorithm.
References
Adlouni SE, Salaou G, St-Hilaire A (2018) Regularized Bayesian quantile regression. Commun Stat Simul Comput 47:277–293
Alhamzawi R (2020) Brq: an R package for Bayesian quantile regression. METRON 78:313–328
Alhamzawi R, Algamal ZY (2019) Bayesian bridge quantile regression. Commun Stat Simul Comput 48:944–956
Alhamzawi R, Yu K (2013) Conjugate priors and variable selection for Bayesian quantile regression. Comput Stat Data Anal 64:209–219
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897
Benoit DF, Alhamzawi R, Yu K (2013) Bayesian lasso binary quantile regression. Comput Stat 28:2861–2873
Bivand R, Wong DWS (2018) Comparing implementations of global and local indicators of spatial association. Test 27:716–748
Chen CW, Dunson DB, Reed C, Yu K (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
Chen R-B, Chu C-H, Lai T-Y, Wu YN (2011) Stochastic matching pursuit for Bayesian variable selection. Stat Comput 21:247–259
Fasiolo M, Wood SN, Zaffran M, Nedellec R, Goude Y (2020) Fast calibrated additive quantile regression. J Am Stat Assoc 1–11
George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373
Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8:140–154
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102
Hofner B, Mayr A, Robinzonov N, Schmid M (2014) Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 29:3–35
Jin R, Tan A (2021) Fast Markov chain Monte Carlo for high-dimensional Bayesian regression models with shrinkage priors. J Comput Graph Stat 1–15
Koenker R (2019) quantreg: quantile regression. R package version 5.51
Koenker R, Bassett G Jr (1978) Regression quantiles. Econometrica 46:33–50
Kozubowski TJ, Podgórski K (2001) Asymmetric Laplace laws and modeling financial data. Math Comput Model 34:1003–1021
Lee KE, Sha N, Dougherty ER, Vannucci M, Mallick BK (2003) Gene selection: a Bayesian variable selection approach. Bioinformatics 19:90–97
Li Q, Xi R, Lin N (2010) Bayesian regularized quantile regression. Bayesian Anal 5:533–556
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423
Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci 7:493–497
Reed C, Yu K (2009) A partially collapsed Gibbs sampler for Bayesian quantile regression. Technical report, Department of Mathematical Sciences, Brunel University
Rossell D, Rubio FJ (2018) Tractable Bayesian variable selection: beyond normality. J Am Stat Assoc 113:1742–1758
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343
Tian Y, Tang M, Wang L, Tian M (2019) Bayesian bridge-randomized penalized quantile regression estimation for linear regression model with AP (\(q\)) perturbation. J Stat Comput Simul 89:2951–2979
Tian Y, Tian M, Zhu Q (2014) Linear quantile regression based on EM algorithm. Commun Stat Theory Methods 43:3464–3484
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C (2019) Rank-normalization, folding, and localization: an improved \(\widehat{R}\) for assessing convergence of MCMC. arXiv:1903.08008
Xi R, Li Y, Hu Y et al (2016) Bayesian quantile regression based on the empirical likelihood with spike and slab priors. Bayesian Anal 11:821–855
Yu K, Chen CW, Reed C, Dunson D (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54:437–447
Zellner A (1983) Applications of Bayesian analysis in econometrics. J R Stat Soc Ser D (Stat) 32:23–34
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320
Acknowledgements
We are grateful to the two reviewers and an editor for their thoughtful comments and suggestions, which have improved the quality of the manuscript. This work was based on the first author’s dissertation research, which was supervised by the corresponding author. The work of Drs. Keying Ye and Min Wang was partially supported by the Internal Research Awards (INTRA) program from the UTSA Vice President for Research, Economic Development, and Knowledge Enterprise at the University of Texas at San Antonio.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: The EM algorithm
Consider the linear regression model in (1.1) with the ALD errors, i.e., \(\epsilon \sim \mathrm {ALD}(0, \sigma , p)\). To implement an EM algorithm for computing the maximum likelihood estimates (MLEs) of \(\varvec{\beta }\) and \(\sigma \), we may treat \(\varvec{\nu }\) as the missing data. Based on the non-informative prior \(\pi (\varvec{\beta }, \sigma ) \propto 1/\sigma \) and the likelihood function of the complete data \((\mathbf{{y}}, \varvec{\nu })\) in (2.1), the complete posterior distribution of \((\varvec{\beta }, \sigma )\) is given by
$$\begin{aligned} \pi (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \propto \sigma ^{-(3n + 2)/2} \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{4 \sigma } \sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} - \zeta \sum _{i = 1}^{n} \nu _i \Bigg \}. \end{aligned}$$
Up to an additive constant, the log-likelihood of the complete data can then be written as
$$\begin{aligned} \mathrm {L} (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {log} \nu _i - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} - \zeta \sum _{i = 1}^{n} \nu _i. \end{aligned}$$
Following Tian et al. (2014), we take the initial values \(\varvec{\beta }^{(0)}\) and \(\sigma ^{(0)}\) from the OLS estimates. We then obtain the MLEs of the model parameters by iterating the following two steps of the EM algorithm:
1.
E step: Given the \((t - 1)\)-th iteration values \(\varvec{\beta }^{(t - 1)}\) and \(\sigma ^{(t - 1)}\), we compute the conditional expectation of the complete-data log-likelihood, also known as the Q function at the \(t\)-th iteration, given by
$$\begin{aligned} Q(\varvec{\beta }, \sigma \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)})&= \mathrm {E} \Big (\mathrm {L} (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {E} \Big (\mathrm {log} \nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \mathrm {E} \Big (\nu _i^{-1} \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) \\&\quad - \bigg (\frac{\xi ^2}{4 \sigma } + \zeta \bigg ) \sum _{i = 1}^{n} \mathrm {E} \Big (\nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big )\\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) - \frac{1}{2} \sum _{i = 1}^{n} \Delta 1_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \Delta 2_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \Delta 3_i, \end{aligned}$$where
$$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ). \end{aligned}$$To compute \(\Delta 1_i\), \(\Delta 2_i\), and \(\Delta 3_i\) for \(i = 1, \cdots , n\), we first obtain the conditional pdf \(f(\varvec{\nu }\mid \varvec{\beta }, \sigma , \mathbf{{y}})\). We observe that
$$\begin{aligned} f(\varvec{\nu }\mid \varvec{\beta }, \sigma , \mathbf{{y}})&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{4 \sigma } \bigg [\sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} \\&\quad + 4p (1 - p) \sum _{i = 1}^{n} \nu _i \bigg ]\Bigg \} \\&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{2} \sum _{i = 1}^{n} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}. \end{aligned}$$Since the \(\nu _i\)’s are conditionally independent for \(i = 1, \cdots , n\), the conditional posterior distribution of \(\nu _i\) given \(y_i\) is given by
$$\begin{aligned} f(\nu _i \mid y_i) \propto \nu _i^{-1/2} \exp \Bigg \{-\frac{1}{2} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}, \end{aligned}$$which follows a GIG distribution, denoted by \(\mathrm {GIG} \Bigl (1/2, (y_i - \mathbf{{x}}'_i \varvec{\beta })^2/(2 \sigma ), 1/(2 \sigma ) \Bigr )\). Here, if \(x \sim \mathrm {GIG}(\lambda , \chi , \psi )\), then the pdf of x is given by
$$\begin{aligned} f(x) = \frac{\chi ^{-\lambda }(\sqrt{\chi \psi })^\lambda }{2 K_\lambda (\sqrt{\chi \psi })} x^{\lambda - 1} \exp \Bigg \{-\frac{1}{2} \Big (\chi x^{-1} + \psi x\Big ) \Bigg \}, \quad x > 0, \end{aligned}$$where \(K_\lambda (\cdot )\) denotes the modified Bessel function of the third kind with index \(\lambda \), and the parameters satisfy \(\chi > 0\) and \(\psi \ge 0\) if \(\lambda < 0\), \(\chi > 0\) and \(\psi > 0\) if \(\lambda = 0\), and \(\chi \ge 0\) and \(\psi > 0\) if \(\lambda > 0\). When \(\chi > 0\), \(\psi > 0\), and \(\alpha \in {\mathbb {R}}\), we have
$$\begin{aligned} \mathrm {E}(X^\alpha ) = \bigg (\frac{\chi }{\psi }\bigg )^{\alpha /2} \frac{K_{\lambda + \alpha }(\sqrt{\chi \psi })}{K_{\lambda }(\sqrt{\chi \psi })} \quad \mathrm {and} \quad \mathrm {E}(\mathrm {log} X) = \frac{d\mathrm {E}(X^\alpha )}{d\alpha } \Bigr |_{\alpha = 0}. \end{aligned}$$Applying these identities with \(\lambda = 1/2\), \(\chi = (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)})^2/(2 \sigma ^{(t - 1)})\), and \(\psi = 1/(2 \sigma ^{(t - 1)})\), we obtain
$$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} , \mathbf{{y}}\Big ) = \frac{d\mathrm {E} \big (\nu _i^\alpha \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} \big )}{d\alpha } \Bigr |_{\alpha = 0}, \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert + 2 \sigma ^{(t - 1)}, \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \frac{1}{\big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert }. \end{aligned}$$The closed forms of \(\Delta 2_i\) and \(\Delta 3_i\) follow from the half-integer Bessel identities \(K_{3/2}(z) = K_{1/2}(z)(1 + 1/z)\) and \(K_{-1/2}(z) = K_{1/2}(z)\), together with \(\sqrt{\chi \psi } = \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \vert /(2 \sigma ^{(t - 1)})\); a numerical check of these expressions is given after the two steps.
2.
M step: We update \(\varvec{\beta }^{(t)}\) and \(\sigma ^{(t)}\) by maximizing the Q function \(Q(\varvec{\beta }, \sigma \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)})\). We first set the derivative with respect to \(\varvec{\beta }\) equal to zero,
$$\begin{aligned} \frac{\partial Q}{\partial \varvec{\beta }} = -\frac{\xi }{2 \sigma } \sum _{i = 1}^{n} \mathbf{{x}}_i + \frac{1}{2 \sigma } \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }) \mathbf{{x}}_i {\mathop {=}\limits ^{set}} 0, \end{aligned}$$which indicates that
$$\begin{aligned} \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i \mathbf{{x}}'_i \varvec{\beta }= \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i y_i - \xi \sum _{i = 1}^{n} \mathbf{{x}}_i. \end{aligned}$$Letting \(\varvec{\nabla 3} = (1/\Delta 3_1, \cdots , 1/\Delta 3_n)',\) \(\mathbf{{W}}^{(t - 1)} = \mathrm {diag} (\Delta 3_1, \cdots , \Delta 3_n)\), and \(\varvec{1}_n = (1, \cdots , 1)'\), we can rewrite the above equation in matrix form as
$$\begin{aligned} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\varvec{\beta }= \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{y}}- \xi \mathbf{{X}}' \varvec{1}_n. \end{aligned}$$Thus, we obtain an estimate of \(\varvec{\beta }\) given by
$$\begin{aligned} {\hat{\varvec{\beta }}}^{(t)} = \Big ( \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\Big )^{-1} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} (\mathbf{{y}}- \xi \varvec{\nabla 3}). \end{aligned}$$(6.1) By taking the derivative with respect to \(\sigma \) and setting it equal to zero, we have
$$\begin{aligned} \frac{\partial Q}{\partial \sigma }&= -\frac{3n + 2}{2 \sigma } - \frac{\xi }{2 \sigma ^2} \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 2_i \\&\quad + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 {\mathop {=}\limits ^{set}} 0. \end{aligned}$$Thus, we obtain an estimate of \(\sigma \) given by
$$\begin{aligned} {\hat{\sigma }}^{(t)}= & {} \frac{1}{2(3n + 2)}\nonumber \\&\left[ \sum _{i = 1}^{n} \Delta 2_i + \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})^2 - 2 \xi \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})\right] . \end{aligned}$$(6.2)
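The closed forms of \(\Delta 2_i\) and \(\Delta 3_i\) in the E step can be verified numerically. The snippet below is a minimal sketch, assuming SciPy's geninvgauss parameterization, in which \(\mathrm {GIG}(\lambda , \chi , \psi )\) corresponds to geninvgauss with \(p = \lambda \), \(b = \sqrt{\chi \psi }\), and scale \(\sqrt{\chi /\psi }\); the values of r and sigma are arbitrary test inputs.

```python
import numpy as np
from scipy.stats import geninvgauss

# Check Delta2 = E(nu) = |r| + 2*sigma and Delta3 = E(1/nu) = 1/|r|
# for nu ~ GIG(1/2, chi = r^2/(2*sigma), psi = 1/(2*sigma)).
r, sigma = 1.7, 0.6                               # arbitrary residual and scale
chi, psi = r**2 / (2 * sigma), 1 / (2 * sigma)
nu = geninvgauss(p=0.5, b=np.sqrt(chi * psi), scale=np.sqrt(chi / psi))
print(nu.mean(), abs(r) + 2 * sigma)              # both ~ 2.9
print(nu.expect(lambda x: 1 / x), 1 / abs(r))     # both ~ 0.5882
```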
We run the EM algorithm until a suitable convergence criterion is satisfied, for example, until \(\Vert \varvec{\beta }^{(t)} - \varvec{\beta }^{(t-1)}\Vert \) and \(\vert \sigma ^{(t)} - \sigma ^{(t-1)}\vert \) are sufficiently small. We then take the final iterates of the EM algorithm as the posterior modes of \(\varvec{\beta }\) and \(\sigma \), denoted by \({\tilde{\varvec{\beta }}}\) and \({{\tilde{\sigma }}}\), respectively.
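For concreteness, the following Python sketch implements the full iteration. It assumes \(\xi = 1 - 2p\), the value consistent with the E-step simplifications above (\(\xi \) and \(\zeta \) are defined in Sect. 2, which is not reproduced here); the function name em_ald_qr, the tolerance, and the initialization of \(\sigma ^{(0)}\) from the mean absolute OLS residual are illustrative choices rather than part of the original algorithm.

```python
import numpy as np


def em_ald_qr(X, y, p=0.5, tol=1e-8, max_iter=1000):
    """EM posterior modes of (beta, sigma) at quantile level p under ALD errors."""
    n = X.shape[0]
    xi = 1.0 - 2.0 * p  # assumed form of xi; see the caveat above
    # Initialize from the OLS fit, following Tian et al. (2014); using the
    # mean absolute residual as sigma^(0) is a heuristic choice.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = np.mean(np.abs(y - X @ beta))
    for _ in range(max_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)  # guards Delta3_i = 1/|r_i|
        # E step: closed-form conditional moments of the latent nu_i
        delta2 = abs_r + 2.0 * sigma          # Delta2_i = E(nu_i | ...)
        delta3 = 1.0 / abs_r                  # Delta3_i = E(1/nu_i | ...)
        # (Delta1_i is not needed: it does not involve beta or sigma.)
        # M step, Eq. (6.1): weighted least squares with W = diag(Delta3)
        XtW = X.T * delta3                    # X'W without forming diag(W)
        nabla3 = abs_r                        # elementwise 1/Delta3_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ (y - xi * nabla3))
        # M step, Eq. (6.2): update of the scale parameter
        r_new = y - X @ beta_new
        sigma_new = (delta2.sum() + (delta3 * r_new**2).sum()
                     - 2.0 * xi * r_new.sum()) / (2.0 * (3.0 * n + 2.0))
        converged = (np.linalg.norm(beta_new - beta) < tol
                     and abs(sigma_new - sigma) < tol)
        beta, sigma = beta_new, sigma_new
        if converged:
            break
    return beta, sigma
```

For instance, em_ald_qr(X, y, p=0.25) returns the modes \(({\tilde{\varvec{\beta }}}, {{\tilde{\sigma }}})\) at the first quartile; in the three-stage scheme of the paper, these EM modes are the output of the first stage, on which the Gibbs-sampling and importance re-weighting stages build.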
About this article
Cite this article
Dao, M., Wang, M., Ghosh, S. et al. Bayesian variable selection and estimation in quantile regression using a quantile-specific prior. Comput Stat 37, 1339–1368 (2022). https://doi.org/10.1007/s00180-021-01181-5