
Variable selection using axis-aligned random projections for partial least-squares regression

  • Original Paper
  • Published in: Statistics and Computing

Abstract

In high-dimensional data modeling, variable selection plays a crucial role in improving predictive accuracy and enhancing model interpretability through sparse representation. Unfortunately, many variable selection methods suffer from insufficient model sparsity, high computational overhead, or difficulty in handling large-scale data. Recently, axis-aligned random projection techniques have been applied to address these issues by selecting variables. However, these techniques have seen limited application to complex data within the regression framework. In this study, we propose a novel method, sparse partial least squares via axis-aligned random projection, designed for the analysis of high-dimensional data. First, axis-aligned random projection is used to obtain a sparse loading vector, significantly reducing computational complexity. Subsequently, partial least squares regression is conducted within the subspace of the top-ranked significant variables. The submatrices are iteratively updated until an optimal sparse partial least squares model is achieved. Comparative analysis with state-of-the-art high-dimensional regression methods demonstrates that the proposed method achieves superior predictive performance. To illustrate its effectiveness, we apply the method to four cases, comprising one simulated dataset and three real-world datasets. In all four cases, the method identifies the important variables.
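The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact algorithm: it assumes subsets of coordinates are scored by the norm of the first PLS weight direction (proportional to X^T y) restricted to each axis-aligned projection, winning subsets are aggregated into variable-selection counts, and a one-component PLS fit is then run on the selected variables. The function names `aarp_select` and `pls1_on_subset` are hypothetical.

```python
import numpy as np

def aarp_select(X, y, d=5, n_proj=300, n_groups=30, k=3, seed=0):
    """Sketch (not the paper's exact procedure): rank variables by how
    often they appear in the best-scoring axis-aligned random projection,
    where 'best' means the subset S maximising ||(X^T y)_S||, the norm of
    the first PLS weight vector restricted to S."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                # centre predictors
    c = Xc.T @ (y - y.mean())              # first PLS weight direction (unnormalised)
    counts = np.zeros(p)
    for _ in range(n_groups):
        best_score, best_S = -np.inf, None
        for _ in range(n_proj):
            S = rng.choice(p, size=d, replace=False)  # one axis-aligned projection
            score = np.linalg.norm(c[S])
            if score > best_score:
                best_score, best_S = score, S
        counts[best_S] += 1                # aggregate winners across groups
    return np.argsort(counts)[::-1][:k]    # indices of the k most-selected variables

def pls1_on_subset(X, y, selected):
    """One-component PLS regression restricted to the selected columns;
    returns a length-p coefficient vector that is zero off the subset."""
    p = X.shape[1]
    Xs = X[:, selected] - X[:, selected].mean(axis=0)
    yc = y - y.mean()
    w = Xs.T @ yc
    w /= np.linalg.norm(w)                 # unit PLS weight vector
    t = Xs @ w                             # latent score
    q = (t @ yc) / (t @ t)                 # regress y on the latent score
    beta = np.zeros(p)
    beta[selected] = q * w
    return beta
```

With, say, a response driven by three of fifty predictors, the selection counts concentrate on those predictors, and the subsequent PLS fit is sparse by construction; in the paper this selection-then-fit cycle is iterated on the updated submatrices.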



Acknowledgements

This work is financially supported by the National Natural Science Foundation Committee of PR China (Grants Nos. 11801105, 1236010386, 71901074), the Guangxi Science and Technology Project (Grant No. Guike AD19245106), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022A1515110315), and the Fundamental Research Grant Scheme of Malaysia (Grant No. FRGS/1/2021/STG06/SYUC/03/1).

Author information

Contributions

Youwu Lin: Conceptualization, methodology. Xin Zeng: Software, writing—original draft. Pei Wang: Supervision, writing— reviewing and editing. Shuai Huang: Investigation, writing— original draft. Kok Lay Teo: Supervision, writing—reviewing and editing.

Corresponding author

Correspondence to Pei Wang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 392 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, Y., Zeng, X., Wang, P. et al. Variable selection using axis-aligned random projections for partial least-squares regression. Stat Comput 34, 105 (2024). https://doi.org/10.1007/s11222-024-10417-5
