
Bayesian variable selection and estimation in quantile regression using a quantile-specific prior


Abstract

The asymmetric Laplace (AL) specification has become one of the most popular models for Bayesian quantile regression. In addition to fast convergence of Markov chain Monte Carlo, the AL specification guarantees posterior consistency even under model misspecification. However, variable selection under this specification is challenging because, realistically, the prior on the regression parameters should take the quantile level into consideration. A quantile-specific g-prior has recently been developed for Bayesian variable selection in quantile regression, but it comes at a high computational cost owing to the intractability of the posterior distributions. In this paper, we develop a novel three-stage computational scheme for this quantile-specific g-prior: it starts with an expectation-maximization (EM) algorithm, is followed by a Gibbs sampler, and ends with an importance re-weighting step that improves the accuracy of the approximation. The performance of the proposed procedure is illustrated with simulations and a real-data application. Numerical results suggest that our procedure compares favorably with the Metropolis–Hastings algorithm.


References

  • Adlouni SE, Salaou G, St-Hilaire A (2018) Regularized Bayesian quantile regression. Commun Stat Simul Comput 47:277–293
  • Alhamzawi R (2020) Brq: an R package for Bayesian quantile regression. METRON 78:313–328
  • Alhamzawi R, Algamal ZY (2019) Bayesian bridge quantile regression. Commun Stat Simul Comput 48:944–956
  • Alhamzawi R, Yu K (2013) Conjugate priors and variable selection for Bayesian quantile regression. Comput Stat Data Anal 64:209–219
  • Andrews DF, Mallows CL (1974) Scale mixtures of normal distributions. J R Stat Soc Ser B (Methodol) 36:99–102
  • Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32:870–897
  • Benoit DF, Alhamzawi R, Yu K (2013) Bayesian lasso binary quantile regression. Comput Stat 28:2861–2873
  • Bivand R, Wong DWS (2018) Comparing implementations of global and local indicators of spatial association. Test 27:716–748
  • Chen CW, Dunson DB, Reed C, Yu K (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
  • Chen R-B, Chu C-H, Lai T-Y, Wu YN (2011) Stochastic matching pursuit for Bayesian variable selection. Stat Comput 21:247–259
  • Fasiolo M, Wood SN, Zaffran M, Nedellec R, Goude Y (2020) Fast calibrated additive quantile regression. J Am Stat Assoc 1–11
  • George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sinica 339–373
  • Geraci M, Bottai M (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8:140–154
  • Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102
  • Hofner B, Mayr A, Robinzonov N, Schmid M (2014) Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat 29:3–35
  • Jin R, Tan A (2021) Fast Markov chain Monte Carlo for high-dimensional Bayesian regression models with shrinkage priors. J Comput Graph Stat 0:1–15
  • Koenker R (2019) quantreg: quantile regression. R package version 5.51
  • Koenker R, Bassett G Jr (1978) Regression quantiles. Econometrica 46:33–50
  • Kozubowski TJ, Podgórski K (2001) Asymmetric Laplace laws and modeling financial data. Math Comput Model 34:1003–1021
  • Lee KE, Sha N, Dougherty ER, Vannucci M, Mallick BK (2003) Gene selection: a Bayesian variable selection approach. Bioinformatics 19:90–97
  • Li Q, Xi R, Lin N (2010) Bayesian regularized quantile regression. Bayesian Anal 5:533–556
  • Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of \(g\) priors for Bayesian variable selection. J Am Stat Assoc 103:410–423
  • Raftery AE, Lewis SM (1992) One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Comment on "Practical Markov chain Monte Carlo". Stat Sci 7:493–497
  • Reed C, Yu K (2009) A partially collapsed Gibbs sampler for Bayesian quantile regression. Technical report, Department of Mathematical Sciences, Brunel University
  • Rossell D, Rubio FJ (2018) Tractable Bayesian variable selection: beyond normality. J Am Stat Assoc 113:1742–1758
  • Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75:317–343
  • Tian Y, Tang M, Wang L, Tian M (2019) Bayesian bridge-randomized penalized quantile regression estimation for linear regression model with AP(\(q\)) perturbation. J Stat Comput Simul 89:2951–2979
  • Tian Y, Tian M, Zhu Q (2014) Linear quantile regression based on EM algorithm. Commun Stat Theory Methods 43:3464–3484
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
  • Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86
  • Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C (2019) Rank-normalization, folding, and localization: an improved \(\widehat{R}\) for assessing convergence of MCMC. arXiv:1903.08008
  • Xi R, Li Y, Hu Y et al (2016) Bayesian quantile regression based on the empirical likelihood with spike and slab priors. Bayesian Anal 11:821–855
  • Yu K, Chen CW, Reed C, Dunson D (2013) Bayesian variable selection in quantile regression. Stat Interface 6:261–274
  • Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54:437–447
  • Zellner A (1983) Applications of Bayesian analysis in econometrics. J R Stat Soc Ser D (Stat) 32:23–34
  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320


Acknowledgements

We greatly appreciate the two reviewers and an editor for their thoughtful comments and suggestions, which have improved the quality of the manuscript. This work was based on the first author's dissertation research, which was supervised by the corresponding author. The work of Drs. Keying Ye and Min Wang was partially supported by the Internal Research Awards (INTRA) program from the Vice President for Research, Economic Development, and Knowledge Enterprise at the University of Texas at San Antonio.

Author information

Corresponding author

Correspondence to Min Wang.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: The EM algorithm

Consider the linear regression model in (1.1) with ALD errors, i.e., \(\epsilon \sim \mathrm {ALD}(0, \sigma , p)\). To implement an EM algorithm for the maximum likelihood estimates (MLEs) of \(\varvec{\beta }\) and \(\sigma \), we treat \(\varvec{\nu }\) as the missing data. Based on the non-informative prior \(\pi (\varvec{\beta }, \sigma ) \propto 1/\sigma \) and the likelihood function of the complete data \((\mathbf{{y}}, \varvec{\nu })\) in (2.1), the complete-data posterior distribution of \((\varvec{\beta }, \sigma )\) is given by

$$\begin{aligned}&f(\varvec{\beta }, \sigma \mid \varvec{\nu }, \mathbf{{y}}) \propto \sigma ^{-(3n + 2)/2} \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \\&\quad \times \exp \Bigg \{-\frac{1}{4 \sigma } \bigg [\sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} + 4p (1 - p) \sum _{i = 1}^{n} \nu _i \bigg ]\Bigg \}. \end{aligned}$$

The log-likelihood of the complete data can thus be written, with \(\zeta = p(1-p)/\sigma \) so that the two displays agree, as

$$\begin{aligned}&\mathrm {L} (\varvec{\beta }, \sigma \mid \varvec{\nu }, \mathbf{{y}}) \propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {log} \nu _i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} - \zeta \sum _{i = 1}^{n} \nu _i. \end{aligned}$$
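As a concrete transcription of this display, the sketch below (Python) evaluates the complete-data log-likelihood up to an additive constant. It is a minimal illustration under the assumptions \(\xi = 1 - 2p\) (so that \(\xi ^2 + 4p(1-p) = 1\), which makes the two displays above consistent) and \(\zeta = p(1-p)/\sigma \) as noted above.

```python
import numpy as np

def complete_loglik(beta, sigma, nu, X, y, p):
    """Complete-data log-likelihood L(beta, sigma | nu, y), up to an
    additive constant, following the display above.

    Assumptions (defined in the main text, not restated here):
    xi = 1 - 2p and zeta = p(1 - p)/sigma.
    """
    xi = 1.0 - 2.0 * p
    r = y - X @ beta                      # residuals y_i - x_i' beta
    return (-(3 * len(y) + 2) / 2.0 * np.log(sigma)
            - 0.5 * np.log(nu).sum()
            - ((r - xi * nu) ** 2 / nu).sum() / (4.0 * sigma)
            - p * (1.0 - p) / sigma * nu.sum())
```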

Following Tian et al. (2014), we take the ordinary least squares (OLS) estimates as the initial values \(\varvec{\beta }^{(0)}\) and \(\sigma ^{(0)}\). The EM algorithm then obtains the MLEs of the model parameters by iterating the following two steps:

  1. E step: Given the \((t - 1)\)-th iteration values \(\varvec{\beta }^{(t - 1)}\) and \(\sigma ^{(t - 1)}\), we compute the conditional expectation of the complete-data log-likelihood, also known as the Q function at the \(t\)-th iteration, given by

    $$\begin{aligned} Q(\varvec{\beta }, \sigma \mid \varvec{\nu }, \mathbf{{y}})&= \mathrm {E} \Big (\mathrm {L} (\varvec{\beta }, \sigma \mid \mathbf{{y}}, \varvec{\nu }) \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma - \frac{1}{2} \sum _{i = 1}^{n} \mathrm {E} \Big (\mathrm {log} \nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \mathrm {E} \Big (\nu _i^{-1} \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big ) \\&\quad + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) \\&\quad - \bigg (\frac{\xi ^2}{4 \sigma } + \zeta \bigg ) \sum _{i = 1}^{n} \mathrm {E} \Big (\nu _i \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}\Big )\\&\propto -\frac{3n + 2}{2} \mathrm {log} \sigma + \frac{\xi }{2 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) - \frac{1}{2} \sum _{i = 1}^{n} \Delta 1_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} \Delta 2_i \\&\quad - \frac{1}{4 \sigma } \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 \Delta 3_i, \end{aligned}$$

    where

    $$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ), \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ). \end{aligned}$$

    To compute \(\Delta 1_i\), \(\Delta 2_i\), and \(\Delta 3_i\) for \(i = 1, \cdots , n\), we first obtain the conditional pdf \(f(\varvec{\nu }\mid \varvec{\beta }, \sigma , \mathbf{{y}})\). We observe that

    $$\begin{aligned} f(\varvec{\nu }\mid \mathbf{{y}})&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{4 \sigma } \bigg [\sum _{i = 1}^{n} \frac{(y_i - \mathbf{{x}}'_i \varvec{\beta }- \xi \nu _i)^2}{\nu _i} \\&\quad + 4p (1 - p) \sum _{i = 1}^{n} \nu _i \bigg ]\Bigg \} \\&\propto \Bigg (\prod _{i = 1}^{n} \nu _i^{-1/2}\Bigg ) \exp \Bigg \{-\frac{1}{2} \sum _{i = 1}^{n} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}. \end{aligned}$$

    Since the \(\nu _i\)'s are conditionally independent across \(i = 1, \cdots , n\), the posterior distribution of \(\nu _i\) given \((y_i, \varvec{\beta }, \sigma )\) is given by

    $$\begin{aligned} f(\nu _i \mid y_i) \propto \nu _i^{-1/2} \exp \Bigg \{-\frac{1}{2} \bigg [\frac{(y_i - \mathbf{{x}}'_i \varvec{\beta })^2}{2 \sigma } \nu _i^{-1} + \frac{1}{2 \sigma } \nu _i \bigg ]\Bigg \}, \end{aligned}$$

    which follows a GIG distribution, denoted by \(\mathrm {GIG} \Bigl (1/2, (y_i - \mathbf{{x}}'_i \varvec{\beta })^2/(2 \sigma ), 1/(2 \sigma ) \Bigr )\). Here, if \(x \sim \mathrm {GIG}(\lambda , \chi , \psi )\), then the pdf of x is given by

    $$\begin{aligned} f(x) = \frac{\chi ^{-\lambda }(\sqrt{\chi \psi })^\lambda }{2 K_\lambda (\sqrt{\chi \psi })} x^{\lambda - 1} \exp \Bigg \{-\frac{1}{2} \Big (\chi x^{-1} + \psi x\Big ) \Bigg \}, \quad x > 0, \end{aligned}$$

    where \(K_\lambda (\cdot )\) denotes the modified Bessel function of the third kind with index \(\lambda \), and the parameters satisfy \(\chi > 0\) and \(\psi \ge 0\) when \(\lambda < 0\), \(\chi > 0\) and \(\psi > 0\) when \(\lambda = 0\), and \(\chi \ge 0\) and \(\psi > 0\) when \(\lambda > 0\). When \(\chi > 0\), \(\psi > 0\), and \(\alpha \in {\mathbb {R}}\), we have

    $$\begin{aligned} \mathrm {E}(X^\alpha ) = \bigg (\frac{\chi }{\psi }\bigg )^{\alpha /2} \frac{K_{\lambda + \alpha }(\sqrt{\chi \psi })}{K_{\lambda }(\sqrt{\chi \psi })} \quad \mathrm {and} \quad \mathrm {E}(\mathrm {log} X) = \frac{d\mathrm {E}(X^\alpha )}{d\alpha } \Bigr |_{\alpha = 0}. \end{aligned}$$

    Therefore, applying these moment formulas to the GIG distribution above, we obtain the following closed-form results (checked numerically in the first sketch after this list):

    $$\begin{aligned} \Delta 1_i&= \mathrm {E} \Big (\mathrm {log} \nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} , \mathbf{{y}}\Big ) = \frac{d\mathrm {E} \big (\nu _i^\alpha \mid \mathbf{{y}}, \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)} \big )}{d\alpha } \Bigr |_{\alpha = 0}, \\ \Delta 2_i&= \mathrm {E} \Big (\nu _i \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert + 2 \sigma ^{(t - 1)}, \\ \Delta 3_i&= \mathrm {E} \Big (\nu _i^{-1} \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)}, \mathbf{{y}}\Big ) = \frac{1}{\big \vert y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t - 1)} \big \vert }. \end{aligned}$$
  2. M step: We update \(\varvec{\beta }^{(t)}\) and \(\sigma ^{(t)}\) by maximizing the Q function \(Q(\varvec{\beta }, \sigma \mid \varvec{\beta }^{(t - 1)}, \sigma ^{(t - 1)})\). We first take the derivative with respect to \(\varvec{\beta }\) and set it to zero:

    $$\begin{aligned} \frac{\partial Q}{\partial \varvec{\beta }} = -\frac{\xi }{2 \sigma } \sum _{i = 1}^{n} \mathbf{{x}}_i + \frac{1}{2 \sigma } \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }) \mathbf{{x}}_i {\mathop {=}\limits ^{set}} 0, \end{aligned}$$

    which indicates that

    $$\begin{aligned} \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i \mathbf{{x}}'_i \varvec{\beta }= \sum _{i = 1}^{n} \Delta 3_i \mathbf{{x}}_i y_i - \xi \sum _{i = 1}^{n} \mathbf{{x}}_i. \end{aligned}$$

    Letting \(\varvec{\nabla 3} = (1/\Delta 3_1, \cdots , 1/\Delta 3_n)',\) \(\mathbf{{W}}^{(t - 1)} = \mathrm {diag} (\Delta 3_1, \cdots , \Delta 3_n)\), and \(\varvec{1}_n = (1, \cdots , 1)'\), we can rewrite the preceding equation in matrix form as

    $$\begin{aligned} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\varvec{\beta }= \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{y}}- \xi \mathbf{{X}}' \varvec{1}_n. \end{aligned}$$

    Thus, we obtain an estimate of \(\varvec{\beta }\) given by

    $$\begin{aligned} {\hat{\varvec{\beta }}}^{(t)} = \Big ( \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} \mathbf{{X}}\Big )^{-1} \mathbf{{X}}' \mathbf{{W}}^{(t - 1)} (\mathbf{{y}}- \xi \varvec{\nabla 3}). \end{aligned}$$
    (6.1)

    Taking the derivative with respect to \(\sigma \) and setting it to zero, we have

    $$\begin{aligned} \frac{\partial Q}{\partial \sigma }&= -\frac{3n + 2}{2 \sigma } - \frac{\xi }{2 \sigma ^2} \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }) + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 2_i \\&\quad + \frac{1}{4 \sigma ^2} \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta })^2 {\mathop {=}\limits ^{set}} 0. \end{aligned}$$

    Thus, we obtain an estimate of \(\sigma \) given by

    $$\begin{aligned} {\hat{\sigma }}^{(t)}= & {} \frac{1}{2(3n + 2)}\nonumber \\&\left[ \sum _{i = 1}^{n} \Delta 2_i + \sum _{i = 1}^{n} \Delta 3_i (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})^2 - 2 \xi \sum _{i = 1}^{n} (y_i - \mathbf{{x}}'_i \varvec{\beta }^{(t)})\right] . \end{aligned}$$
    (6.2)
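As a numerical check on the E-step formulas above, the short sketch below (Python; scipy.special.kv is the modified Bessel function \(K_\lambda \)) evaluates the GIG moment formula and recovers the closed-form expressions for \(\Delta 2_i\) and \(\Delta 3_i\). The residual magnitude r and scale sigma are arbitrary illustrative values, not quantities from the paper.

```python
import numpy as np
from scipy.special import kv  # modified Bessel function K_lambda

def gig_moment(lam, chi, psi, alpha):
    """E(X^alpha) for X ~ GIG(lam, chi, psi), from the formula above."""
    z = np.sqrt(chi * psi)
    return (chi / psi) ** (alpha / 2.0) * kv(lam + alpha, z) / kv(lam, z)

# Arbitrary test values for |y_i - x_i' beta| and sigma (illustration only).
r, sigma = 1.3, 0.7
chi, psi = r ** 2 / (2.0 * sigma), 1.0 / (2.0 * sigma)

print(gig_moment(0.5, chi, psi, 1.0), r + 2.0 * sigma)   # Delta2_i = |r| + 2*sigma
print(gig_moment(0.5, chi, psi, -1.0), 1.0 / r)          # Delta3_i = 1/|r|

# Delta1_i = E(log nu_i): derivative of E(nu_i^alpha) at alpha = 0,
# approximated here by a central difference.
eps = 1e-6
print((gig_moment(0.5, chi, psi, eps) - gig_moment(0.5, chi, psi, -eps)) / (2.0 * eps))
```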

We iterate the EM algorithm until a suitable convergence criterion is satisfied, e.g., until \(\Vert \varvec{\beta }^{(t)} - \varvec{\beta }^{(t-1)}\Vert \) and \(\vert \sigma ^{(t)} - \sigma ^{(t-1)}\vert \) are sufficiently small. We then take the final iterates of the EM algorithm as the posterior modes, denoted by \({\tilde{\varvec{\beta }}}\) and \({{\tilde{\sigma }}}\), of \(\varvec{\beta }\) and \(\sigma \), respectively. A minimal end-to-end sketch of the iteration follows.
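The sketch below assembles the E and M steps into one loop. It is a minimal illustration rather than the authors' implementation: it assumes the parametrization \(\xi = 1 - 2p\) used above, initializes \(\sigma ^{(0)}\) from the mean absolute OLS residual (the appendix only specifies OLS starting values), and guards against zero residuals, at which \(\Delta 3_i = 1/\vert y_i - \mathbf{{x}}'_i \varvec{\beta }\vert \) is undefined.

```python
import numpy as np

def em_ald(X, y, p, tol=1e-8, max_iter=500):
    """EM iteration of the appendix for beta and sigma under
    ALD(0, sigma, p) errors. Assumes xi = 1 - 2p and that X already
    contains an intercept column."""
    n = len(y)
    xi = 1.0 - 2.0 * p
    # Initial values from OLS, as in the paper; the sigma start is a
    # convenient choice, not prescribed in the appendix.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    sigma = np.mean(np.abs(r))
    for _ in range(max_iter):
        # E step: closed-form GIG moments. Delta1 is not needed because
        # its term in Q involves neither beta nor sigma.
        absr = np.maximum(np.abs(r), 1e-10)   # guard: Delta3 = 1/|r| blows up at r = 0
        delta2 = absr + 2.0 * sigma           # E(nu_i | y, beta, sigma)
        delta3 = 1.0 / absr                   # E(1/nu_i | y, beta, sigma)
        # M step, eq. (6.1): weighted least squares with W = diag(Delta3);
        # note nabla3 = 1/Delta3 = |r| elementwise.
        XtW = X.T * delta3
        beta_new = np.linalg.solve(XtW @ X, XtW @ (y - xi * absr))
        r = y - X @ beta_new
        # M step, eq. (6.2): update sigma with the new residuals.
        sigma_new = (delta2.sum() + (delta3 * r ** 2).sum()
                     - 2.0 * xi * r.sum()) / (2.0 * (3.0 * n + 2.0))
        done = (np.linalg.norm(beta_new - beta) < tol
                and abs(sigma_new - sigma) < tol)
        beta, sigma = beta_new, sigma_new
        if done:
            break
    return beta, sigma  # posterior modes (tilde beta, tilde sigma)
```

The returned \({\tilde{\varvec{\beta }}}\) and \({{\tilde{\sigma }}}\) then serve as the posterior modes that seed the Gibbs-sampling and importance re-weighting stages of the three-stage scheme described in the paper.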


About this article


Cite this article

Dao, M., Wang, M., Ghosh, S. et al. Bayesian variable selection and estimation in quantile regression using a quantile-specific prior. Comput Stat 37, 1339–1368 (2022). https://doi.org/10.1007/s00180-021-01181-5

