DOI: 10.1145/3613330.3613342
Research Article

Selection of regularization model for linear regression under high-dimensional data

Published: 28 September 2023

Abstract

Data collected in practical applications across many fields are becoming ultra-high-dimensional and large-scale, and many traditional analysis methods lose much of their efficiency on such data. It is therefore essential to develop methods designed specifically for high-dimensional data. In this paper, the Elastic-net model is adopted as the base regularization model for high-dimensional sparse data, and a penalty factor is added to strengthen its ability to retain key features. To reduce the computational burden of high-dimensional data, we propose applying the "two-step" procedure, the SSR screening rule followed by the PCD fitting method, to the model with penalty factors. For tuning-parameter selection, traditional cross-validation is replaced by an information criterion, and the use of information criteria is extended to regularization models with screening rules, broadening their range of application. Simulation studies confirm both the rationale for adding the penalty factor and the ability of the chosen information criterion to select tuning parameters under this model, and an example illustrates the method on high-dimensional gene-expression data.
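As a concrete illustration of the pipeline the abstract describes, the sketch below fits an elastic net with per-feature penalty factors by coordinate descent and chooses the tuning parameter with an information criterion rather than cross-validation. It is a minimal reconstruction, not the authors' code: EBIC is used as one common high-dimensional criterion, the screening step is omitted, and the penalty-factor values and synthetic data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha=0.5, pf=None, max_iter=200, tol=1e-6):
    """Coordinate descent for an elastic net with per-feature penalty factors:

        (1/2n) ||y - X b||^2
          + lam * sum_j pf[j] * (alpha * |b_j| + (1 - alpha)/2 * b_j^2)

    pf[j] scales the penalty on feature j (pf[j] = 0 leaves it unpenalized).
    Assumes the columns of X are standardized so that x_j' x_j / n = 1.
    """
    n, p = X.shape
    pf = np.ones(p) if pf is None else np.asarray(pf, dtype=float)
    beta = np.zeros(p)
    r = y.copy()                                  # residual y - X @ beta
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ r / n + old           # (1/n) x_j' (partial residual)
            beta[j] = soft_threshold(rho, lam * alpha * pf[j]) \
                      / (1.0 + lam * (1.0 - alpha) * pf[j])
            if beta[j] != old:
                r -= X[:, j] * (beta[j] - old)    # keep residual in sync
                max_step = max(max_step, abs(beta[j] - old))
        if max_step < tol:
            break
    return beta

def ebic(X, y, beta, gamma=0.5):
    """Extended BIC: n log(RSS/n) + k log(n) + 2*gamma*k*log(p),
    with k the number of nonzero coefficients (Chen & Chen, 2008)."""
    n, p = X.shape
    rss = float(np.sum((y - X @ beta) ** 2))
    k = np.count_nonzero(beta)
    return n * np.log(rss / n) + k * np.log(n) + 2.0 * gamma * k * np.log(p)

# --- illustrative run on synthetic sparse data (n << p) ---
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize columns
beta_true = np.zeros(p)
beta_true[:5] = 2.0                               # 5 truly active features
y = X @ beta_true + rng.standard_normal(n)
y = y - y.mean()

pf = np.ones(p)
pf[:2] = 0.1   # hypothetical prior knowledge: penalize features 0 and 1 less

lams = np.logspace(0, -2, 30)                     # decreasing lambda path
fits = [elastic_net_cd(X, y, lam, pf=pf) for lam in lams]
best = min(range(len(lams)), key=lambda i: ebic(X, y, fits[i]))
print("chosen lambda:", lams[best])
print("selected features:", np.flatnonzero(fits[best]))
```

In the paper's two-step procedure, the screening rule (SSR) would sit in front of each fit on the lambda path, discarding predictors unlikely to be active at the new lambda, and the fitting step (PCD) would then cycle only over the small surviving set with warm starts along the path; the loop above is the unscreened, cold-start version of that fit.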


    Published In

    ICDLT '23: Proceedings of the 2023 7th International Conference on Deep Learning Technologies
    July 2023
    115 pages
    ISBN:9798400707520
    DOI:10.1145/3613330

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Elastic-net model
    2. Feature Selection
    3. High-dimensional data
    4. Information Criterion
    5. Regularization


