
Minimizing variable selection criteria by Markov chain Monte Carlo


Abstract

Regression models with a large number of predictors arise in diverse fields of the social and natural sciences. For proper interpretation, we often wish to identify a smaller subset of the variables that carries the strongest information. With a large number of candidate predictors, optimizing a variable selection criterion such as AIC, \(C_{P}\), or BIC over all possible subsets is computationally cumbersome in practice. In this paper, we present two efficient optimization algorithms via the Markov chain Monte Carlo (MCMC) approach for searching for the globally optimal subset. Simulated examples as well as one real data set show that our proposed MCMC algorithms find better solutions than other popular search methods in terms of minimizing a given criterion.
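
To fix notation, the sketch below (illustrative, not the authors' code) computes the three criteria named above for a single candidate subset under an ordinary least-squares fit; the interface and the full-model variance estimate passed to \(C_{P}\) are assumptions.

```python
import numpy as np

def criteria(X, y, subset, sigma2_full):
    """AIC, BIC and Mallows' C_p for the OLS model on columns `subset` of X.

    sigma2_full: error-variance estimate from the full model, as is
    conventional in the definition of C_p.
    """
    n, q = len(y), len(subset)
    Xs = X[:, subset]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = float(np.sum((y - Xs @ beta) ** 2))    # residual sum of squares
    aic = n * np.log(rss / n) + 2 * q            # up to an additive constant
    bic = n * np.log(rss / n) + np.log(n) * q
    cp = rss / sigma2_full - n + 2 * q
    return aic, bic, cp
```

Exhaustive search evaluates such a criterion for every one of the \(\binom{p}{q}\) subsets of each size; it is exactly this cost that the MCMC algorithms avoid.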


References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Bakin S (1999) Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National University, Canberra

  • Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4):373–384

  • Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351

  • Chiang A, Beck J, Yen HJ, Tayeh M, Scheetz T, Swiderski R, Nishimura D, Braun T, Kim KY, Huang J, Elbedour K, Carmi R, Slusarski D, Casavant T, Stone E, Sheffield V (2006) Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet–Biedl syndrome gene (BBS11). Proc Natl Acad Sci 103(16):6287–6292

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881–889

  • George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sin 7(2):339–373

  • Huang J, Ma S, Zhang CH (2008) Adaptive lasso for sparse high-dimensional regression models. Stat Sin 18(4):1603–1618

  • Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2):249–264

  • Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680

  • Kohn R, Smith M, Chan D (2001) Nonparametric regression using linear combinations of basis functions. Stat Comput 11:313–322

  • Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of g priors for Bayesian variable selection. J Am Stat Assoc 103(481):410–423

  • Mallows C (1973) Some comments on \(C_{P}\). Technometrics 15(4):661–675

  • Miller A (2002) Subset selection in regression, 2nd edn. Chapman and Hall/CRC, Boca Raton

  • Müller P, Quintana FA (2004) Nonparametric Bayesian data analysis. Stat Sci 19(1):95–110

  • Rocha G, Zhao P (2006) Lasso Matlab codes. http://www.stat.berkeley.edu/twiki/Research/YuGroup/Software

  • Scheetz T, Kim KY, Swiderski R, Philp A, Braun T, Knudtson K, Dorrance A, DiBona G, Huang J, Casavant T, Sheffield V, Stone E (2006) Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc Natl Acad Sci 103(39):14429–14434

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Shiryaev A (1996) Probability, 2nd edn. Springer, New York

  • Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75(2):317–343

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B 67(1):91–108

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68(1):49–67

  • Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques: essays in honor of Bruno de Finetti. North-Holland, Amsterdam, pp 233–243

  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B 67(2):301–320

Author information

Corresponding author

Correspondence to Ting-Li Chen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 118 KB)

Appendices

Appendix 1: Proofs

Proof of Theorem 1

Assume \(Q_{q,i}\), \(Q_{q,j}\in \mathcal {Q}_{q}\). Let \(E_{i,j}=Q_{q,i}\cap Q_{q,j}\) and \(n_{i,j}=q-|E_{i,j}|\). According to the MCMC1 algorithm, we have

$$\begin{aligned} p_{ij}=\left\{ \begin{array}{lll} \sum _{\eta \in Q_{q,i}}\frac{1}{q}\cdot \frac{1}{\tilde{z}_{Q_{q,i}\setminus \{\eta \}}}\cdot f(\text {RSS}_{Q_{q,i}}) &{}\quad \text{ if } n_{i,j}= 0, \\ \frac{1}{q}\cdot \frac{1}{\tilde{z}_{E_{i,j}}}\cdot f(\text {RSS}_{Q_{q,j}}) &{}\quad \text{ if } n_{i,j}= 1, \\ 0 &{}\quad \text{ otherwise, } \end{array} \right. \end{aligned}$$

where \(\tilde{z}_{B}=\sum _{\kappa \notin B} f(\text {RSS}_{B\cup \{\kappa \}})\), \(B\in \mathcal {Q}_{q-1}\). For the convergence, by the Ergodic Theorem (see “Appendix 2”), it suffices to show that there exists an \(n_{0}\) such that \(p_{ij}^{(n_{0})}>0\) for all \(i,j\). First, consider the case \(j=i\),

$$\begin{aligned} p_{ii}^{(q)}&\ge \prod _{k=1}^{q} \Pr \left( x_{k}=Q_{q,i}\mid x_{k-1}=Q_{q,i}\right) \nonumber \\&= p^{q}_{ii} \nonumber \\&= \left[ \sum _{\eta \in Q_{q,i}}\frac{1}{q}\cdot \frac{1}{\tilde{z}_{Q_{q,i}\setminus \{\eta \}}} \cdot f(\text {RSS}_{Q_{q,i}})\right] ^{q} ~~~>0.\nonumber \end{aligned}$$

For \(i \ne j\), let

$$\begin{aligned} Q^{*}_{q,i}=Q_{q,i}{\setminus }E_{i,j},~~ Q^{*}_{q,j}=Q_{q,j}{\setminus }E_{i,j}, \end{aligned}$$

and define

$$\begin{aligned} Q^{*(k)} = Q^{*}_{q,i}(k+1:n_{i,j}) \cup E_{i,j} \cup Q^{*}_{q,j}(1:k) \quad \quad \quad 0\le k \le n_{i,j}, \end{aligned}$$

where \(Q^{*}_{q,j}(1:k)\) denotes the first \(k\) variables in \(Q^{*}_{q,j}\) and \(Q^{*}_{q,i}(k+1:n_{i,j})\) denotes the set \(Q^{*}_{q,i}\) excluding the first \(k\) variables. In this setup, for \(1\le k\le n_{i,j}\), \(q-|Q^{*(k-1)} \cap Q^{*(k)}|=1\), which leads to

$$\begin{aligned} \Pr \left( x_{k}=Q^{*(k)}\mid x_{k-1}=Q^{*(k-1)}\right) >0. \end{aligned}$$

Therefore,

$$\begin{aligned} p_{ij}^{(q)}&=\sum _{\alpha }p^{(n_{i,j})}_{i\alpha }p^{(q-n_{i,j})}_{\alpha j} \nonumber \\&\ge p^{(n_{i,j})}_{i j}p^{(q-n_{i,j})}_{jj} \nonumber \\&\ge \prod _{k=1}^{n_{i,j}} \Pr \left( x_{k}=Q^{*(k)}\mid x_{k-1}=Q^{*(k-1)}\right) \cdot p^{q-n_{i,j}}_{jj} >0,\nonumber \end{aligned}$$

where \(p^{(0)}_{\alpha j}:=1\) for all \(\alpha \). Since \(p_{ij}^{(q)}>0\) for all \(i,j\), the \(p_{ij}^{(m)}\) converge by the Ergodic Theorem. Next, we show that \(p_{ij}^{(m)}\) converges to \(f(\text {RSS}_{Q_{q,j}})/z\) for \(j=1,\ldots ,C^{p}_{q}\). For \(n_{i,j}\ne 1\), it is clear that

$$\begin{aligned} \frac{1}{z} \cdot f(\text {RSS}_{Q_{q,i}}) \cdot p_{ij} =\frac{1}{z}\cdot f(\text {RSS}_{Q_{q,j}})\cdot p_{ji}. \end{aligned}$$

For \(n_{i,j}=1\),

$$\begin{aligned} \frac{1}{z} \cdot f(\text {RSS}_{Q_{q,i}}) \cdot p_{ij}&= \frac{1}{z} \cdot f(\text {RSS}_{Q_{q,i}}) \cdot \frac{1}{q}\cdot \frac{1}{\tilde{z}_{E_{i,j}}}\cdot f(\text {RSS}_{Q_{q,j}})\nonumber \\&= \frac{1}{z} \cdot f(\text {RSS}_{Q_{q,j}}) \cdot \frac{1}{q}\cdot \frac{1}{\tilde{z}_{E_{j,i}}}\cdot f(\text {RSS}_{Q_{q,i}})\nonumber \\&= \frac{1}{z} \cdot f(\text {RSS}_{Q_{q,j}}) \cdot p_{ji}.\nonumber \end{aligned}$$

By Remark 3 (see “Appendix 2”), we conclude that \(\pi _{j}=f(\text {RSS}_{Q_{q,j}})/z\), \(j=1,\ldots ,C^{p}_{q}\).   \(\square \)
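
For concreteness, here is a minimal Python sketch of one MCMC1 transition as reconstructed from the kernel \(p_{ij}\) above: delete a variable \(\eta \) uniformly at random (probability \(1/q\)), then re-add a variable \(\kappa \) with probability \(f(\text {RSS}_{B\cup \{\kappa \}})/\tilde{z}_{B}\), where \(B\) is the reduced subset. The helper callables `rss` and `f` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mcmc1_step(rng, current, p, rss, f):
    """One MCMC1 transition on subsets of fixed size q.

    current: tuple of column indices (the current subset Q)
    p:       total number of candidate predictors
    rss:     callable, subset -> residual sum of squares
    f:       callable, RSS -> positive weight, e.g. exp(-criterion / T)
    """
    q = len(current)
    eta = current[rng.integers(q)]                # uniform deletion, prob 1/q
    base = tuple(v for v in current if v != eta)  # B = Q \ {eta}
    candidates = [k for k in range(p) if k not in base]  # kappa may equal eta
    weights = np.array([f(rss(base + (k,))) for k in candidates])
    kappa = candidates[rng.choice(len(candidates), p=weights / weights.sum())]
    return tuple(sorted(base + (kappa,)))
```

By Theorem 1, the chain's stationary probability of a subset \(Q\) is \(f(\text {RSS}_{Q})/z\), so taking \(f\) to decay rapidly in the criterion concentrates the chain on minimizing subsets.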

Proof of Theorem 2

The proof is very similar to that of Theorem 1. Assume \(Q_{q,i}\), \(Q_{q,j}\in \mathcal {Q}_{q}\). Let \(E_{i,j}=Q_{q,i}\cap Q_{q,j}\) and \(n_{i,j}=q-|E_{i,j}|\). According to the MCMC2 algorithm, we have

$$\begin{aligned} p_{ij}=\left\{ \begin{array}{lll} \sum _{\eta \in Q_{q,i}} \frac{1}{z_{Q_{q,i}}} \cdot f(\text {RSS}_{Q_{q,i}\setminus \{\eta \}}) \cdot \frac{1}{\tilde{z}_{Q_{q,i}\setminus \{\eta \}}}\cdot f(\text {RSS}_{Q_{q,i}}) &{}\qquad \text{ if } n_{i,j}= 0, \\ \frac{1}{z_{Q_{q,i}}} \cdot f(\text {RSS}_{E_{i,j}})\cdot \frac{1}{\tilde{z}_{E_{i,j}}}\cdot f(\text {RSS}_{Q_{q,j}}) &{}\qquad \text{ if } n_{i,j}= 1, \\ 0&{} \qquad \text{ otherwise, } \end{array} \right. \end{aligned}$$

where \(z_{A}=\sum _{\zeta \in A}f(\text {RSS}_{A\setminus \{\zeta \}})\) for \(A\in \mathcal {Q}_{q}\), and \(\tilde{z}_{B}=\sum _{\kappa \notin B} f(\text {RSS}_{B\cup \{\kappa \}})\) for \(B\in \mathcal {Q}_{q-1}\). To apply the Ergodic Theorem, we show that \(p_{ij}^{(q)}>0\) for all \(i,j\). For \(j=i\),

$$\begin{aligned} p_{ii}^{(q)} \ge \left[ \sum _{\eta \in Q_{q,i}} \frac{1}{z_{Q_{q,i}}} \cdot f(\text {RSS}_{Q_{q,i}\setminus \{\eta \}}) \cdot \frac{1}{\tilde{z}_{Q_{q,i}\setminus \{\eta \}}}\cdot f(\text {RSS}_{Q_{q,i}})\right] ^{q} ~~~>0. \end{aligned}$$

For \(i\ne j\), following the same arguments and notation as in the proof of Theorem 1, we have

$$\begin{aligned} p_{ij}^{(q)} \ge \prod _{k=1}^{n_{i,j}} \Pr \left( x_{k}=Q^{*(k)}\mid x_{k-1}=Q^{*(k-1)}\right) \cdot p^{q-n_{i,j}}_{jj} >0. \end{aligned}$$

Since \(p_{ij}^{(q)}>0\) for all \(i,j\), the \(p_{ij}^{(m)}\) converge by the Ergodic Theorem. Next, we show that \(p_{ij}^{(m)}\) converges to \(\frac{1}{z} \cdot \sum _{\zeta \in Q_{q,j}}f(\text {RSS}_{Q_{q,j}\setminus \{\zeta \}}) \cdot f(\text {RSS}_{Q_{q,j}})\) for \(j=1,\ldots ,C^{p}_{q}\). For \(n_{i,j}\ne 1\), it is immediate that

$$\begin{aligned} \frac{z_{Q_{q,i}}}{z}\cdot f(\text {RSS}_{Q_{q,i}})\cdot p_{ij} =\frac{z_{Q_{q,j}}}{z}\cdot f(\text {RSS}_{Q_{q,j}})\cdot p_{ji}. \end{aligned}$$

For \(n_{i,j}=1\),

$$\begin{aligned}&\frac{z_{Q_{q,i}}}{z} \cdot f(\text {RSS}_{Q_{q,i}}) \cdot p_{ij} \nonumber \\&\quad = \frac{z_{Q_{q,i}}}{z} \cdot f(\text {RSS}_{Q_{q,i}}) \cdot \frac{1}{z_{Q_{q,i}}} \cdot f(\text {RSS}_{E_{i,j}})\cdot \frac{1}{\tilde{z}_{E_{i,j}}}\cdot f(\text {RSS}_{Q_{q,j}})\nonumber \\&\quad = \frac{z_{Q_{q,j}}}{z} \cdot f(\text {RSS}_{Q_{q,j}}) \cdot \frac{1}{z_{Q_{q,j}}} \cdot f(\text {RSS}_{E_{j,i}})\cdot \frac{1}{\tilde{z}_{E_{j,i}}}\cdot f(\text {RSS}_{Q_{q,i}})\nonumber \\&\quad = \frac{z_{Q_{q,j}}}{z} \cdot f(\text {RSS}_{Q_{q,j}}) \cdot p_{ji}.\nonumber \end{aligned}$$

By Remark 3 (see “Appendix 2”), we have

$$\begin{aligned} \pi _{j}&= \frac{z_{Q_{q,j}}}{z} \cdot f(\text {RSS}_{Q_{q,j}}) \nonumber \\&= \frac{1}{z} \cdot \sum _{\zeta \in Q_{q,j}}f(\text {RSS}_{Q_{q,j}\setminus \{\zeta \}}) \cdot f(\text {RSS}_{Q_{q,j}})\,\,\,\,j=1,\ldots ,C^{p}_{q}. \nonumber \end{aligned}$$

\(\square \)
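
Under the same illustrative assumptions as the MCMC1 sketch above, one MCMC2 transition differs only in the deletion step, which is weighted by \(f(\text {RSS}_{Q\setminus \{\zeta \}})/z_{Q}\) instead of being uniform:

```python
import numpy as np

def mcmc2_step(rng, current, p, rss, f):
    """One MCMC2 transition; arguments as in mcmc1_step above."""
    q = len(current)
    # Weighted deletion: Pr(delete zeta) = f(RSS_{Q \ {zeta}}) / z_Q.
    reduced = [tuple(v for v in current if v != zeta) for zeta in current]
    del_w = np.array([f(rss(b)) for b in reduced])
    base = reduced[rng.choice(q, p=del_w / del_w.sum())]
    # Re-addition step, identical to MCMC1.
    candidates = [k for k in range(p) if k not in base]
    add_w = np.array([f(rss(base + (k,))) for k in candidates])
    kappa = candidates[rng.choice(len(candidates), p=add_w / add_w.sum())]
    return tuple(sorted(base + (kappa,)))
```

As Theorem 2 shows, this tilts the stationary law to \(z_{Q}\,f(\text {RSS}_{Q})/z\), favouring subsets whose one-variable deletions also fit well.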

Appendix 2

Ergodic Theorem

Let \(\varGamma =[p_{ij}]\) be the transition matrix of a Markov chain with finite state space \(\mathbf S =\{1, 2, \ldots , S\}\). If there is an \(n_{0}\) such that

$$\begin{aligned} \min _{i,\,j}p_{ij}^{(n_{0})}>0, \end{aligned}$$

then there is a sequence of numbers \(\pi _{1}, \ldots , \pi _{S}\) such that

$$\begin{aligned} \pi _{j}>0,\,\,\,\,\sum _{j}\pi _{j}=1 \end{aligned}$$
(5)

and

$$\begin{aligned} p_{ij}^{(m)}\rightarrow \pi _{j},\,\,\,\,m\rightarrow \infty \end{aligned}$$

for every \(i\in \mathbf S \). Moreover, \(\pi _{1}, \ldots , \pi _{S}\) is the unique stationary probability distribution of the chain with transition matrix \(\varGamma \); that is, it is the only sequence satisfying both (5) and the equations

$$\begin{aligned} \pi _{j}=\sum _{\alpha }\pi _{\alpha }p_{\alpha j},\,\,\,\,j=1,\ldots ,S. \nonumber \end{aligned}$$

Proof

See Shiryaev (1996, pp. 118–120). \(\square \)

Remark 3

If there is a sequence of numbers \(\tilde{\pi }_{1}, \ldots , \tilde{\pi }_{S}\) which satisfies (5) and the equations

$$\begin{aligned} \tilde{\pi }_{i} p_{ij}=\tilde{\pi }_{j}p_{ji},\,\,\,\,i,j=1,\ldots ,S, \nonumber \end{aligned}$$

then \(\tilde{\pi }_{i}=\pi _{i}\) for \(i=1,\ldots ,S\). Indeed, summing the detailed-balance equations over \(i\) gives \(\sum _{i}\tilde{\pi }_{i}p_{ij}=\tilde{\pi }_{j}\sum _{i}p_{ji}=\tilde{\pi }_{j}\), so \(\tilde{\pi }\) is stationary and the claim follows from the uniqueness part of the Ergodic Theorem.
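
As a numerical sanity check of the theorem and the remark (an illustrative sketch, not from the paper), the script below builds a Metropolis kernel with uniform proposals, which satisfies detailed balance for a chosen \(\pi \), verifies \(\min _{i,j}p_{ij}>0\) (so \(n_{0}=1\)), and confirms that \(p_{ij}^{(m)}\rightarrow \pi _{j}\) for every starting state:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 5
pi = rng.random(S)
pi /= pi.sum()                            # target stationary distribution

# Metropolis kernel with uniform proposals: pi_i * p_ij = (1/S) min(pi_i, pi_j)
# is symmetric in (i, j), so detailed balance holds.
P = np.minimum(1.0, pi[None, :] / pi[:, None]) / S
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # make each row sum to one

assert (P > 0).all()                      # min_{i,j} p_ij > 0 with n0 = 1
Pm = np.linalg.matrix_power(P, 500)       # m-step transition probabilities
assert np.allclose(Pm, np.tile(pi, (S, 1)), atol=1e-8)
print("p_ij^(m) -> pi_j for every starting state i")
```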

About this article

Cite this article

Chin, YS., Chen, TL. Minimizing variable selection criteria by Markov chain Monte Carlo. Comput Stat 31, 1263–1286 (2016). https://doi.org/10.1007/s00180-016-0649-3
