Abstract
Independence screening procedures play a vital role in variable selection when the number of variables is massive. However, the high dimensionality of the data brings many challenges, such as multicollinearity or high (possibly spurious) correlation between the covariates, which makes marginal correlation unreliable as a measure of association between the covariates and the response. We propose a novel and simple screening procedure called Gram–Schmidt screening (GSS) that integrates classical Gram–Schmidt orthogonalization with the sure independence screening technique, accounting for high correlations between the covariates in a data-driven way. GSS can successfully discriminate between relevant and irrelevant variables, achieving a high true positive rate without including many irrelevant or redundant variables, and thus offers a new perspective on screening when the covariates are highly correlated. The practical performance of GSS is demonstrated through comparative simulation studies and the analysis of two real datasets.
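To illustrate the general idea, here is a minimal sketch (not the authors' exact GSS algorithm, whose details are in the paper) of screening via Gram–Schmidt orthogonalization: at each step the covariate whose orthogonalized version has the largest absolute correlation with the response is selected, and the remaining covariates are orthogonalized against it, so that already-explained (correlated) directions cannot be selected again. The function name and interface below are illustrative assumptions.

```python
import numpy as np

def gram_schmidt_screen(X, y, k):
    """Sketch of Gram-Schmidt-based forward screening.

    At each step, pick the covariate whose component orthogonal to the
    already-selected columns is most correlated with the response, then
    project that component out of the remaining covariates.
    """
    n, p = X.shape
    Xr = X - X.mean(axis=0)            # centered working copy of covariates
    yr = y - y.mean()                  # centered response
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(Xr, axis=0)
        norms[norms < 1e-12] = np.inf  # guard columns already projected to zero
        scores = np.abs(Xr.T @ yr) / norms
        if selected:
            scores[selected] = -np.inf # never reselect a chosen covariate
        j = int(np.argmax(scores))
        selected.append(j)
        # Gram-Schmidt step: remove the chosen direction from all columns
        q = Xr[:, j] / np.linalg.norm(Xr[:, j])
        Xr = Xr - np.outer(q, q @ Xr)
    return selected
```

Because each newly selected covariate is orthogonalized against the previous ones, a redundant variable that is highly correlated with an already-selected one receives a near-zero score, which is exactly the behavior plain marginal (SIS-style) screening lacks.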
Acknowledgements
The authors extend grateful thanks to the Editors and reviewers, whose comments have greatly improved the scope and presentation of the paper, and to Prof. Yuhong Yang and Yingying Ma for their valuable suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71420107025, 11701023).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, H., Liu, R., Wang, S. et al. Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization. Comput Stat 35, 1153–1170 (2020). https://doi.org/10.1007/s00180-020-00963-7