DOI: 10.1145/3292500.3330910

ET-Lasso: A New Efficient Tuning of Lasso-type Regularization for High-Dimensional Data

Published: 25 July 2019

ABSTRACT

The $L_1$ regularization (Lasso) has proven to be a versatile tool that selects relevant features and estimates the model coefficients simultaneously, and it has been widely used in research areas such as genome studies, finance, and biomedical imaging. Despite its popularity, it is very challenging to guarantee the feature selection consistency of Lasso, especially when the dimension of the data is huge. One way to improve feature selection consistency is to select an ideal tuning parameter. Traditional tuning criteria, such as cross-validation and BIC, mainly focus on minimizing the estimated prediction error or maximizing the posterior model probability; they may either be time-consuming or fail to control the false discovery rate (FDR) when the number of features is extremely large. The other way is to introduce pseudo-features to learn the importance of the original ones. Recently, the Knockoff filter was proposed to control the FDR in feature selection; however, its performance is sensitive to the choice of the expected FDR threshold. Motivated by these ideas, we propose a new method that uses pseudo-features to obtain an ideal tuning parameter. In particular, we present the Efficient Tuning of Lasso (ET-Lasso), which separates active and inactive features by adding permuted features as pseudo-features in linear models. The pseudo-features are inactive by construction, so they provide a cutoff for selecting the tuning parameter that separates active from inactive features. Experimental studies on both simulations and real-world data applications show that ET-Lasso effectively and efficiently selects active features under a wide range of scenarios.
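
The pseudo-feature idea described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' released implementation: it assumes scikit-learn's lasso_path, builds pseudo-features by row-permuting the design matrix, and picks the smallest penalty reached before any permuted feature enters the path. The function name et_lasso_select and the single-permutation construction are illustrative assumptions; the paper's full procedure may differ in details this sketch omits.

    import numpy as np
    from sklearn.linear_model import lasso_path

    def et_lasso_select(X, y, seed=None):
        # Pseudo-features: a row permutation of X keeps the covariance
        # structure of the original features but breaks any link with y,
        # so the permuted columns are inactive by construction.
        rng = np.random.default_rng(seed)
        n, p = X.shape
        X_pseudo = X[rng.permutation(n), :]
        X_aug = np.hstack([X, X_pseudo])

        # Trace the full Lasso regularization path on the augmented design.
        alphas, coefs, _ = lasso_path(X_aug, y)      # coefs: (2p, n_alphas)
        active = coefs != 0
        pseudo_in = active[p:, :].any(axis=0)        # any pseudo-feature selected?

        # Cutoff: alphas are returned in decreasing order, so take the last
        # (smallest) penalty at which no pseudo-feature is in the model.
        clean = np.where(~pseudo_in)[0]
        cut = clean[-1] if clean.size else 0
        selected = np.where(active[:p, cut])[0]
        return selected, alphas[cut]

For example, calling et_lasso_select(X, y) on a standardized design matrix returns the indices of the retained original features together with the corresponding tuning parameter, without cross-validation over the path.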

References

  1. Hirotugu Akaike. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, Vol. 19, 6 (1974), 716--723.
  2. Francis R. Bach. 2008. Bolasso: model consistent Lasso estimation through the bootstrap. In Proceedings of the 25th International Conference on Machine Learning. ACM, 33--40.
  3. Rina Foygel Barber and Emmanuel J. Candès. 2015. Controlling the false discovery rate via knockoffs. The Annals of Statistics, Vol. 43, 5 (2015), 2055--2085.
  4. Amir Beck and Marc Teboulle. 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, Vol. 18, 11 (2009), 2419--2434.
  5. Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, Vol. 3, 1 (2011), 1--122.
  6. Emmanuel Candès, Yingying Fan, Lucas Janson, and Jinchi Lv. 2018. Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 80, 3 (2018), 551--577.
  7. David L. Donoho. 2000. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture, Vol. 1 (2000), 32.
  8. Jianqing Fan and Runze Li. 2006. Statistical challenges with high dimensionality: Feature selection in knowledge discovery. Proceedings of the International Congress of Mathematicians, Vol. 3 (2006), 595--622.
  9. Jianqing Fan and Jinchi Lv. 2008. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 70, 5 (2008), 849--911.
  10. Jianqing Fan and Jinchi Lv. 2011. Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, Vol. 57, 8 (2011), 5467--5484.
  11. Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. Pathwise coordinate optimization. The Annals of Applied Statistics, Vol. 1, 2 (2007), 302--332.
  12. Jean-Jacques Fuchs. 2005. Recovery of exact sparse representations in the presence of bounded noise. IEEE Transactions on Information Theory, Vol. 51, 10 (2005), 3601--3608.
  13. Karan Gadiya. 2019. FIFA 19 complete player dataset. Data scraped from https://sofifa.com/. Hosted at https://www.kaggle.com/karangadiya/fifa19.
  14. Miron B. Kursa and Witold R. Rudnicki. 2010. Feature selection with the Boruta package. Journal of Statistical Software, Vol. 36, 11 (2010), 1--13.
  15. Chinghway Lim and Bin Yu. 2016. Estimation stability with cross-validation (ESCV). Journal of Computational and Graphical Statistics, Vol. 25, 2 (2016), 464--492.
  16. Xiaohui Luo, Leonard A. Stefanski, and Dennis D. Boos. 2006. Tuning variable selection procedures by adding noise. Technometrics, Vol. 48, 2 (2006), 165--175.
  17. Nicolai Meinshausen and Bin Yu. 2009. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics, Vol. 37, 1 (2009), 246--270.
  18. Yu Nesterov. 2013. Gradient methods for minimizing composite functions. Mathematical Programming, Vol. 140, 1 (2013), 125--161.
  19. Thanh-Tung Nguyen, Joshua Zhexue Huang, and Thuy Thi Nguyen. 2015. Unbiased feature selection in learning random forests for high-dimensional data. The Scientific World Journal, Vol. 2015 (2015).
  20. Galen Reeves and Michael C. Gastpar. 2013. Approximate sparsity pattern recovery: Information-theoretic lower bounds. IEEE Transactions on Information Theory, Vol. 59, 6 (2013), 3451--3465.
  21. Witold R. Rudnicki, Mariusz Wrzesień, and Wiesław Paja. 2015. All relevant feature selection methods and applications. In Feature Selection for Data and Pattern Recognition. Springer, 11--28.
  22. Marco Sandri and Paola Zuccolotto. 2008. A bias correction algorithm for the Gini variable importance measure in classification trees. Journal of Computational and Graphical Statistics, Vol. 17, 3 (2008), 611--628.
  23. Gideon Schwarz. 1978. Estimating the dimension of a model. The Annals of Statistics, Vol. 6, 2 (1978), 461--464.
  24. Shai Shalev-Shwartz and Ambuj Tewari. 2011. Stochastic methods for $\ell_1$-regularized loss minimization. Journal of Machine Learning Research, Vol. 12 (2011), 1865--1892.
  25. Mervyn Stone. 1974. Cross-validation and multinomial prediction. Biometrika, Vol. 61, 3 (1974), 509--515.
  26. Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 58, 1 (1996), 267--288.
  27. Joel A. Tropp. 2006. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, Vol. 52, 3 (2006), 1030--1051.
  28. Martin J. Wainwright. 2009. Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, Vol. 55, 5 (2009), 2183--2202.
  29. Hansheng Wang. 2009. Forward regression for ultra-high dimensional variable screening. Journal of the American Statistical Association, Vol. 104, 488 (2009), 1512--1524.
  30. Hansheng Wang, Bo Li, and Chenlei Leng. 2009. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 71, 3 (2009), 671--683.
  31. Hansheng Wang, Runze Li, and Chih-Ling Tsai. 2007. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, Vol. 94, 3 (2007), 553--568.
  32. Yujun Wu, Dennis D. Boos, and Leonard A. Stefanski. 2007. Controlling variable selection by the addition of pseudovariables. Journal of the American Statistical Association, Vol. 102, 477 (2007), 235--243.
  33. Yi Yu and Yang Feng. 2014. Modified cross-validation for penalized high-dimensional linear regression models. Journal of Computational and Graphical Statistics, Vol. 23, 4 (2014), 1009--1027.
  34. Shuheng Zhou. 2009. Thresholding procedures for high dimensional variable selection and statistical estimation. In Advances in Neural Information Processing Systems. 2304--2312.

Published in

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019, 3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

