ABSTRACT
The $L_1$ regularization (Lasso) has proven to be a versatile tool for selecting relevant features and estimating the model coefficients simultaneously, and it has been widely used in research areas such as genomic studies, finance, and biomedical imaging. Despite its popularity, it is very challenging to guarantee the feature selection consistency of Lasso, especially when the dimension of the data is huge. One way to improve feature selection consistency is to select an ideal tuning parameter. Traditional tuning criteria, such as cross-validation and BIC, mainly focus on minimizing the estimated prediction error or maximizing the posterior model probability; they may either be time-consuming or fail to control the false discovery rate (FDR) when the number of features is extremely large. Another way is to introduce pseudo-features to learn the importance of the original ones. Recently, the Knockoff filter was proposed to control the FDR when performing feature selection; however, its performance is sensitive to the choice of the expected FDR threshold. Motivated by these ideas, we propose a new method that uses pseudo-features to obtain an ideal tuning parameter. In particular, we present the Efficient Tuning of Lasso (ET-Lasso), which separates active and inactive features by adding permuted features as pseudo-features in linear models. The pseudo-features are inactive by construction, so they can be used to obtain a cutoff that selects the tuning parameter separating active and inactive features. Experimental studies on both simulated and real-world data are provided to show that ET-Lasso can effectively and efficiently select active features under a wide range of scenarios.
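To make the pseudo-feature idea concrete, below is a minimal, self-contained sketch in Python using NumPy and scikit-learn's lasso_path. The one-stage cutoff rule shown here (keep an original feature only if it enters the Lasso solution path at a larger tuning parameter than every permuted pseudo-feature) is a simplified illustration of the principle, not necessarily the paper's exact procedure; variable names and the toy data are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)

# Toy data: 200 samples, 50 features, only the first 5 are active.
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

# Pseudo-features: permute the rows of each column independently.
# This breaks any association with y while preserving each column's
# marginal distribution, so pseudo-features are inactive by construction.
X_perm = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
X_aug = np.hstack([X, X_perm])

# Fit the Lasso solution path on the augmented design.
# alphas are returned in decreasing order; coefs has shape (2p, n_alphas).
alphas, coefs, _ = lasso_path(X_aug, y)

# For each feature, record the largest alpha at which its coefficient
# first becomes nonzero along the path (0 if it never enters).
entry_alpha = np.array([
    alphas[np.nonzero(coefs[j])[0][0]] if np.any(coefs[j]) else 0.0
    for j in range(2 * p)
])

# Cutoff: the largest alpha at which any pseudo-feature enters the path.
# Original features entering strictly before this cutoff are kept.
cutoff = entry_alpha[p:].max()
selected = np.where(entry_alpha[:p] > cutoff)[0]
print("selected features:", selected)
```

Because the permuted columns are independent of the response by construction, the largest tuning parameter at which any of them enters the path gives a data-driven cutoff: original features that enter earlier (at larger tuning parameters) are the ones the data support as active.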