ABSTRACT
Data collected in practical applications across many fields are increasingly ultra-high-dimensional and large-scale, and many traditional analysis methods handle such high-dimensional data inefficiently. It is therefore essential to develop methods designed for high-dimensional data. In this paper, the Elastic-net model is adopted as the base regularization model for high-dimensional sparse data, and penalty factors are added to strengthen its ability to retain key features. To reduce the computational burden of high-dimensional data, we propose applying the "two-step" SSR+PCD procedure, which couples a screening rule with a fitting method, to the model with penalty factors. For tuning parameter selection, traditional cross-validation is replaced by an information criterion, and the use of information criteria is extended to regularization models with screening rules, broadening their range of application. Simulation studies confirm that adding penalty factors is reasonable and that the selected information criterion can choose tuning parameters under this model, and a real example illustrates the application to high-dimensional gene expression data.
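For concreteness, a minimal sketch of the weighted Elastic-net objective described above, assuming the penalty factors enter as per-coefficient weights (the notation $w_j$, $\alpha$, and $\lambda$ is ours, not the paper's):

```latex
% Elastic-net with penalty factors (a sketch; w_j, alpha, lambda are assumed notation)
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} w_j \left( \alpha\,\lvert \beta_j \rvert
  + \frac{1-\alpha}{2}\,\beta_j^2 \right)
```

Here $w_j \ge 0$ is the penalty factor of the $j$-th coefficient: $w_j < 1$ penalizes a presumed key feature less, making it more likely to survive shrinkage, while $w_j = 1$ for all $j$ recovers the standard Elastic-net.

Likewise, a hedged Python sketch of replacing cross-validation with an information criterion when choosing the tuning parameter. The criterion below is an extended-BIC-style penalty; the grid, the value of gamma, and the use of scikit-learn's `ElasticNet` (which has no penalty-factor argument) are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def select_lambda_ebic(X, y, lambdas, l1_ratio=0.5, gamma=0.5):
    """Pick the penalty level minimizing an extended-BIC-style criterion
    along the regularization path (sketch, not the paper's procedure)."""
    n, p = X.shape
    best = (np.inf, None, None)
    for lam in lambdas:
        model = ElasticNet(alpha=lam, l1_ratio=l1_ratio).fit(X, y)
        rss = np.sum((y - model.predict(X)) ** 2)
        df = np.count_nonzero(model.coef_)  # active-set size as a df proxy
        # EBIC: n*log(RSS/n) + df*log(n) + 2*gamma*df*log(p)
        ebic = n * np.log(rss / n) + df * np.log(n) + 2 * gamma * df * np.log(p)
        if ebic < best[0]:
            best = (ebic, lam, model)
    return best

# Usage with synthetic high-dimensional sparse data (n=100, p=500)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
beta = np.zeros(500)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(100)
ebic, lam, model = select_lambda_ebic(X, y, np.logspace(-2, 0, 30))
print(f"selected lambda={lam:.4f}, EBIC={ebic:.2f}")
```

Unlike cross-validation, this requires only one fit per grid point rather than one per fold, which is the computational motivation for using an information criterion in the high-dimensional setting.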
Index Terms
- Selection of regularization model for linear regression under high-dimensional data