Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data

Yang, Aijun; Tian, Yuzhu; Li, Yunxian; Lin, Jinguan

doi:10.1007/s00180-019-00917-8

Original paper
Published: 13 August 2019

Volume 35, pages 245–258, (2020)
Computational Statistics

Abstract

In this paper, we develop a sparse Bayesian variable selection method in a kernel probit model for high-dimensional data classification. In particular, we assign a correlation prior distribution to the model size and a sparse prior distribution to the regression parameters. MCMC-based computation algorithms are outlined to generate samples from the posterior distributions. Simulation and real data studies show that, in terms of variable selection and classification accuracy, the proposed method performs better than five other Bayesian methods that either omit the correlation term in the prior or involve only one shrinkage parameter.
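
As a rough illustration of what a kernel probit classifier computes, the sketch below builds a kernel matrix and maps a linear predictor through the standard normal CDF. It covers only the likelihood side of such a model; the correlation prior, the sparse prior, and the MCMC sampler described in the abstract are not reproduced here. The Gaussian kernel form, the `bandwidth` parameter, the toy data, the sparse `beta` vector, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math
from statistics import NormalDist


def gaussian_kernel(x, z, bandwidth=1.0):
    """Gaussian kernel exp(-||x - z||^2 / bandwidth) (assumed kernel form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / bandwidth)


def kernel_matrix(X, bandwidth=1.0):
    """n x n matrix of pairwise kernel evaluations between the rows of X."""
    return [[gaussian_kernel(xi, xj, bandwidth) for xj in X] for xi in X]


def probit_probabilities(K, beta, intercept=0.0):
    """P(y_i = 1) = Phi(intercept + sum_j K[i][j] * beta[j])."""
    phi = NormalDist().cdf  # standard normal CDF
    return [
        phi(intercept + sum(k * b for k, b in zip(row, beta)))
        for row in K
    ]


# Hypothetical toy data: two well-separated clusters in two dimensions.
X = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]]

# Hypothetical sparse coefficient vector: only two kernel columns are active.
beta = [-1.0, 0.0, 1.0, 0.0]

K = kernel_matrix(X)
probs = probit_probabilities(K, beta)
labels = [1 if p > 0.5 else 0 for p in probs]
print(labels)
```

With these toy values, the first two points receive probabilities below 0.5 and the last two receive probabilities above 0.5, even though half of the coefficients are exactly zero. In a Bayesian treatment, the coefficients would be sampled from their posterior rather than fixed by hand.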

Acknowledgements

The authors gratefully acknowledge the financial support of the Humanities and Social Science Foundation of the Ministry of Education of China (18YJC910001), the Natural Science Foundation of China (11501294, 11501167, 11571073), the University Philosophy and Social Science Research Project of Jiangsu Province (2018SJA0130) and the Jiangsu Qinglan Project (2017).

Author information

Authors and Affiliations

College of Economics and Management, Nanjing Forestry University, Nanjing, China
Aijun Yang
School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China
Yuzhu Tian
School of Finance, Yunnan University of Finance and Economics, Kunming, China
Yunxian Li
School of Statistics and Mathematics, Nanjing Audit University, Nanjing, China
Jinguan Lin

Corresponding author

Correspondence to Aijun Yang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, A., Tian, Y., Li, Y. et al. Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data. Comput Stat 35, 245–258 (2020). https://doi.org/10.1007/s00180-019-00917-8

Received: 01 February 2018
Accepted: 09 August 2019
Published: 13 August 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s00180-019-00917-8
