Abstract
The asymmetric Laplace (AL) specification has become a standard working model for Bayesian quantile regression. In addition to yielding Markov chain Monte Carlo algorithms with fast convergence, the AL specification guarantees posterior consistency even under model misspecification. However, variable selection under this specification is a daunting task because, realistically, the prior on the regression parameters should take the quantile level into consideration. A quantile-specific g-prior has recently been developed for Bayesian variable selection in quantile regression, but it carries a heavy computational burden because the resulting posterior distributions are intractable. In this paper, we develop a novel three-stage computational scheme for this quantile-specific g-prior, which starts with an expectation-maximization (EM) algorithm, is followed by a Gibbs sampler, and ends with an importance re-weighting step that improves the accuracy of the approximation. The performance of the proposed procedure is illustrated with simulations and a real-data application. Numerical results suggest that our procedure compares favorably with the Metropolis–Hastings algorithm.
References
Adlouni SE, Salaou G, St-Hilaire A (2018) Regularized Bayesian quantile regression. Commun Stat Simul Comput 47:277–293
Alhamzawi R (2020) Brq: an R package for Bayesian quantile regression. METRON 78:313–328
Alhamzawi R, Algamal ZY (2019) Bayesian bridge quantile regression. Commun Stat Simul Comput 48:944–956
Alhamzawi R, Yu K (2013) Conjugate priors and variable selection for Bayesian quantile regression. Comput Stat Data Anal 64:209–219
Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897
Benoit DF, Alhamzawi R, Yu K (2013) Bayesian lasso binary quantile regression. Comput Stat 28:2861–2873
Bivand R, Wong DWS (2018) Comparing implementations of global and local indicators of spatial association. Test 27:716–748
Chen CW, Dunson DB, Reed C, Yu K (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
Chen R-B, Chu C-H, Lai T-Y, Wu YN (2011) Stochastic matching pursuit for Bayesian variable selection. Stat Comput 21:247–259
Fasiolo M, Wood SN, Zaffran M, Nedellec R, Goude Y (2020) Fast calibrated additive quantile regression. J Am Stat Assoc 1–11
George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373
Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8:140–154
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102
Hofner B, Mayr A, Robinzonov N, Schmid M (2014) Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 29:3–35
Jin R, Tan A (2021) Fast Markov chain Monte Carlo for high-dimensional Bayesian regression models with shrinkage priors. J Comput Graph Stat 1–15
Koenker R (2019) quantreg: quantile regression. R package version 5.51
Koenker R, Bassett G Jr (1978) Regression quantiles. Econometrica 46:33–50
Kozubowski TJ, Podgórski K (2001) Asymmetric Laplace laws and modeling financial data. Math Comput Model 34:1003–1021
Lee KE, Sha N, Dougherty ER, Vannucci M, Mallick BK (2003) Gene selection: a Bayesian variable selection approach. Bioinformatics 19:90–97
Li Q, Xi R, Lin N (2010) Bayesian regularized quantile regression. Bayesian Anal 5:533–556
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423
Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci 7:493–497
Reed C, Yu K (2009) A partially collapsed Gibbs sampler for Bayesian quantile regression. Technical report, Department of Mathematical Sciences, Brunel University
Rossell D, Rubio FJ (2018) Tractable Bayesian variable selection: beyond normality. J Am Stat Assoc 113:1742–1758
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343
Tian Y, Tang M, Wang L, Tian M (2019) Bayesian bridge-randomized penalized quantile regression estimation for linear regression model with AP (\(q\)) perturbation. J Stat Comput Simul 89:2951–2979
Tian Y, Tian M, Zhu Q (2014) Linear quantile regression based on EM algorithm. Commun Stat Theory Methods 43:3464–3484
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C (2019) Rank-normalization, folding, and localization: an improved \(\widehat{R}\) for assessing convergence of MCMC. arXiv:1903.08008
Xi R, Li Y, Hu Y et al (2016) Bayesian quantile regression based on the empirical likelihood with spike and slab priors. Bayesian Anal 11:821–855
Yu K, Chen CW, Reed C, Dunson D (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54:437–447
Zellner A (1983) Applications of Bayesian analysis in econometrics. J R Stat Soc Ser D (Stat) 32:23–34
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320
Acknowledgements
We are grateful to the two reviewers and an editor for their thoughtful comments and suggestions, which have improved the quality of the manuscript. This work was based on the first author’s dissertation research, which was supervised by the corresponding author. The work of Drs. Keying Ye and Min Wang was partially supported by the Internal Research Awards (INTRA) program from the UTSA Vice President for Research, Economic Development, and Knowledge Enterprise at the University of Texas at San Antonio.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: The EM algorithm
Consider the linear regression model in (1.1) with the ALD errors, i.e., \(\epsilon \sim \mathrm {ALD}(0, \sigma , p)\). To implement an EM algorithm for computing the maximum likelihood estimates (MLEs) of \(\varvec{\beta }\) and \(\sigma \), we may treat \(\varvec{\nu }\) as the missing data. Based on the non-informative prior \(\pi (\varvec{\beta }, \sigma ) \propto 1/\sigma \) and the likelihood function of the complete data \((\mathbf{{y}}, \varvec{\nu })\) in (2.1), the complete posterior distribution of \((\varvec{\beta }, \sigma )\) is given by
$$\begin{aligned} \pi (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \propto \sigma ^{-(3n + 2)/2} \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{4 \sigma } \sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} - \zeta \sum _{i = 1}^{n} \nu _i \Bigg \}. \end{aligned}$$
Up to an additive constant, the log-likelihood of the complete data can then be written as
$$\begin{aligned} \mathrm {L} (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {log} \nu _i - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} - \zeta \sum _{i = 1}^{n} \nu _i. \end{aligned}$$
Following Tian et al. (2014), we take the initial values \(\varvec{\beta }^{(0)}\) and \(\sigma ^{(0)}\) from the OLS estimates. We then obtain the MLEs of the model parameters by iterating the following two steps of the EM algorithm:
1.
E step: Given the \((t - 1)\)-th iteration values \(\varvec{\beta }^{(t - 1)}\) and \(\sigma ^{(t - 1)}\), we compute the conditional expectation of the complete-data log-likelihood, also known as the Q function at the \(t\)-th iteration, given by
$$\begin{aligned} Q(\varvec{\beta }, \sigma \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)})&= \mathrm {E} \Big (\mathrm {L} (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {E} \Big (\mathrm {log} \nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \mathrm {E} \Big (\nu _i^{-1} \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) \\&\quad - \bigg (\frac{\xi ^2}{4 \sigma } + \zeta \bigg ) \sum _{i = 1}^{n} \mathrm {E} \Big (\nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big )\\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) - \frac{1}{2} \sum _{i = 1}^{n} \Delta 1_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \Delta 2_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \Delta 3_i, \end{aligned}$$where
$$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ). \end{aligned}$$To compute \(\Delta 1_i\), \(\Delta 2_i\), and \(\Delta 3_i\) for \(i = 1, \cdots , n\), we first obtain the conditional pdf \(f(\varvec{\nu }\mid \varvec{\beta }, \sigma , \mathbf{{y}})\). We observe that
$$\begin{aligned} f(\varvec{\nu }\mid \varvec{\beta }, \sigma , \mathbf{{y}})&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{4 \sigma } \bigg [\sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} \\&\quad + 4p (1 - p) \sum _{i = 1}^{n} \nu _i \bigg ]\Bigg \} \\&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{2} \sum _{i = 1}^{n} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}. \end{aligned}$$Since the \(\nu _i\)’s are conditionally independent for \(i = 1, \cdots , n\), the conditional posterior distribution of \(\nu _i\) given \(y_i\) is given by
$$\begin{aligned} f(\nu _i \mid y_i) \propto \nu _i^{-1/2} \exp \Bigg \{-\frac{1}{2} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}, \end{aligned}$$which follows a GIG distribution, denoted by \(\mathrm {GIG} \Bigl (1/2, (y_i - \mathbf{{x}}'_i \varvec{\beta })^2/(2 \sigma ), 1/(2 \sigma ) \Bigr )\). Here, if \(x \sim \mathrm {GIG}(\lambda , \chi , \psi )\), then the pdf of x is given by
$$\begin{aligned} f(x) = \frac{\chi ^{-\lambda }(\sqrt{\chi \psi })^\lambda }{2 K_\lambda (\sqrt{\chi \psi })} x^{\lambda - 1} \exp \Bigg \{-\frac{1}{2} \Big (\chi x^{-1} + \psi x\Big ) \Bigg \}, \quad x > 0, \end{aligned}$$where \(K_\lambda (\cdot )\) denotes the modified Bessel function of the third kind with index \(\lambda \), and the parameters satisfy \(\chi > 0\) and \(\psi \ge 0\) if \(\lambda < 0\), \(\chi > 0\) and \(\psi > 0\) if \(\lambda = 0\), and \(\chi \ge 0\) and \(\psi > 0\) if \(\lambda > 0\). When \(\chi > 0\), \(\psi > 0\), and \(\alpha \in {\mathbb {R}}\), we have
$$\begin{aligned} \mathrm {E}(X^\alpha ) = \bigg (\frac{\chi }{\psi }\bigg )^{\alpha /2} \frac{K_{\lambda + \alpha }(\sqrt{\chi \psi })}{K_{\lambda }(\sqrt{\chi \psi })} \quad \mathrm {and} \quad \mathrm {E}(\mathrm {log} X) = \frac{d\mathrm {E}(X^\alpha )}{d\alpha } \Bigr |_{\alpha = 0}. \end{aligned}$$Applying these identities with \(\lambda = 1/2\), \(\chi = (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)})^2/(2 \sigma ^{(t - 1)})\), and \(\psi = 1/(2 \sigma ^{(t - 1)})\), we obtain
$$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} , \mathbf{{y}}\Big ) = \frac{d\mathrm {E} \big (\nu _i^\alpha \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} \big )}{d\alpha } \Bigr |_{\alpha = 0}, \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert + 2 \sigma ^{(t - 1)}, \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \frac{1}{\big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert }. \end{aligned}$$The closed forms of \(\Delta 2_i\) and \(\Delta 3_i\) follow from the half-integer Bessel identities \(K_{3/2}(z) = K_{1/2}(z)(1 + 1/z)\) and \(K_{-1/2}(z) = K_{1/2}(z)\), together with \(\sqrt{\chi \psi } = \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \vert /(2 \sigma ^{(t - 1)})\); a numerical check of these expressions is given after the two steps.
2.
M step: We update \(\varvec{\beta }^{(t)}\) and \(\sigma ^{(t)}\) by maximizing the Q function \(Q(\varvec{\beta }, \sigma \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)})\). We first set the derivative with respect to \(\varvec{\beta }\) equal to zero,
$$\begin{aligned} \frac{\partial Q}{\partial \varvec{\beta }} = -\frac{\xi }{2 \sigma } \sum _{i = 1}^{n} \mathbf{{x}}_i + \frac{1}{2 \sigma } \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }) \mathbf{{x}}_i {\mathop {=}\limits ^{set}} 0, \end{aligned}$$which indicates that
$$\begin{aligned} \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i \mathbf{{x}}'_i \varvec{\beta }= \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i y_i - \xi \sum _{i = 1}^{n} \mathbf{{x}}_i. \end{aligned}$$Letting \(\varvec{\nabla 3} = (1/\Delta 3_1, \cdots , 1/\Delta 3_n)',\) \(\mathbf{{W}}^{(t - 1)} = \mathrm {diag} (\Delta 3_1, \cdots , \Delta 3_n)\), and \(\varvec{1}_n = (1, \cdots , 1)'\), we can rewrite the above equation in matrix form as
$$\begin{aligned} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\varvec{\beta }= \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{y}}- \xi \mathbf{{X}}' \varvec{1}_n. \end{aligned}$$Thus, we obtain an estimate of \(\varvec{\beta }\) given by
$$\begin{aligned} {\hat{\varvec{\beta }}}^{(t)} = \Big ( \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\Big )^{-1} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} (\mathbf{{y}}- \xi \varvec{\nabla 3}). \end{aligned}$$(6.1) By taking the derivative with respect to \(\sigma \) and setting it equal to zero, we have
$$\begin{aligned} \frac{\partial Q}{\partial \sigma }&= -\frac{3n + 2}{2 \sigma } - \frac{\xi }{2 \sigma ^2} \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 2_i \\&\quad + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 {\mathop {=}\limits ^{set}} 0. \end{aligned}$$Thus, we obtain an estimate of \(\sigma \) given by
$$\begin{aligned} {\hat{\sigma }}^{(t)}= & {} \frac{1}{2(3n + 2)}\nonumber \\&\left[ \sum _{i = 1}^{n} \Delta 2_i + \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})^2 - 2 \xi \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})\right] . \end{aligned}$$(6.2)
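The closed forms of \(\Delta 2_i\) and \(\Delta 3_i\) in the E step can be verified numerically. The snippet below is a minimal sketch, assuming SciPy's geninvgauss parameterization, in which \(\mathrm {GIG}(\lambda , \chi , \psi )\) corresponds to geninvgauss with \(p = \lambda \), \(b = \sqrt{\chi \psi }\), and scale \(\sqrt{\chi /\psi }\); the values of r and sigma are arbitrary test inputs.

```python
import numpy as np
from scipy.stats import geninvgauss

# Check Delta2 = E(nu) = |r| + 2*sigma and Delta3 = E(1/nu) = 1/|r|
# for nu ~ GIG(1/2, chi = r^2/(2*sigma), psi = 1/(2*sigma)).
r, sigma = 1.7, 0.6                               # arbitrary residual and scale
chi, psi = r**2 / (2 * sigma), 1 / (2 * sigma)
nu = geninvgauss(p=0.5, b=np.sqrt(chi * psi), scale=np.sqrt(chi / psi))
print(nu.mean(), abs(r) + 2 * sigma)              # both ~ 2.9
print(nu.expect(lambda x: 1 / x), 1 / abs(r))     # both ~ 0.5882
```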
We run the EM algorithm until a suitable convergence criterion is satisfied, for example, until \(\Vert \varvec{\beta }^{(t)} - \varvec{\beta }^{(t-1)}\Vert \) and \(\vert \sigma ^{(t)} - \sigma ^{(t-1)}\vert \) are sufficiently small. We then take the final iterates of the EM algorithm as the posterior modes of \(\varvec{\beta }\) and \(\sigma \), denoted by \({\tilde{\varvec{\beta }}}\) and \({{\tilde{\sigma }}}\), respectively.
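For concreteness, the following Python sketch implements the full iteration. It assumes \(\xi = 1 - 2p\), the value consistent with the E-step simplifications above (\(\xi \) and \(\zeta \) are defined in Sect. 2, which is not reproduced here); the function name em_ald_qr, the tolerance, and the initialization of \(\sigma ^{(0)}\) from the mean absolute OLS residual are illustrative choices rather than part of the original algorithm.

```python
import numpy as np


def em_ald_qr(X, y, p=0.5, tol=1e-8, max_iter=1000):
    """EM posterior modes of (beta, sigma) at quantile level p under ALD errors."""
    n = X.shape[0]
    xi = 1.0 - 2.0 * p  # assumed form of xi; see the caveat above
    # Initialize from the OLS fit, following Tian et al. (2014); using the
    # mean absolute residual as sigma^(0) is a heuristic choice.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = np.mean(np.abs(y - X @ beta))
    for _ in range(max_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)  # guards Delta3_i = 1/|r_i|
        # E step: closed-form conditional moments of the latent nu_i
        delta2 = abs_r + 2.0 * sigma          # Delta2_i = E(nu_i | ...)
        delta3 = 1.0 / abs_r                  # Delta3_i = E(1/nu_i | ...)
        # (Delta1_i is not needed: it does not involve beta or sigma.)
        # M step, Eq. (6.1): weighted least squares with W = diag(Delta3)
        XtW = X.T * delta3                    # X'W without forming diag(W)
        nabla3 = abs_r                        # elementwise 1/Delta3_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ (y - xi * nabla3))
        # M step, Eq. (6.2): update of the scale parameter
        r_new = y - X @ beta_new
        sigma_new = (delta2.sum() + (delta3 * r_new**2).sum()
                     - 2.0 * xi * r_new.sum()) / (2.0 * (3.0 * n + 2.0))
        converged = (np.linalg.norm(beta_new - beta) < tol
                     and abs(sigma_new - sigma) < tol)
        beta, sigma = beta_new, sigma_new
        if converged:
            break
    return beta, sigma
```

For instance, em_ald_qr(X, y, p=0.25) returns the modes \(({\tilde{\varvec{\beta }}}, {{\tilde{\sigma }}})\) at the first quartile; in the three-stage scheme of the paper, these EM modes are the output of the first stage, on which the Gibbs-sampling and importance re-weighting stages build.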
About this article
Cite this article
Dao, M., Wang, M., Ghosh, S. et al. Bayesian variable selection and estimation in quantile regression using a quantile-specific prior. Comput Stat 37, 1339–1368 (2022). https://doi.org/10.1007/s00180-021-01181-5