ABSTRACT
The $L_1$ regularization (Lasso) has proven to be a versatile tool for selecting relevant features and estimating the model coefficients simultaneously, and it has been widely used in research areas such as genomic studies, finance, and biomedical imaging. Despite its popularity, it is very challenging to guarantee the feature selection consistency of Lasso, especially when the dimension of the data is huge. One way to improve feature selection consistency is to select an ideal tuning parameter. Traditional tuning criteria, such as cross-validation and BIC, mainly focus on minimizing the estimated prediction error or maximizing the posterior model probability; they may either be time-consuming or fail to control the false discovery rate (FDR) when the number of features is extremely large. Another way is to introduce pseudo-features to learn the importance of the original ones. Recently, the Knockoff filter was proposed to control the FDR when performing feature selection; however, its performance is sensitive to the choice of the expected FDR threshold. Motivated by these ideas, we propose a new method that uses pseudo-features to obtain an ideal tuning parameter. In particular, we present the Efficient Tuning of Lasso (ET-Lasso), which separates active and inactive features by adding permuted features as pseudo-features in linear models. The pseudo-features are inactive by construction, so they can be used to obtain a cutoff that selects the tuning parameter separating active and inactive features. Experimental studies on both simulated and real-world data are provided to show that ET-Lasso can effectively and efficiently select active features under a wide range of scenarios.
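To make the pseudo-feature idea concrete, below is a minimal, self-contained sketch in Python using NumPy and scikit-learn's lasso_path. The one-stage cutoff rule shown here (keep an original feature only if it enters the Lasso solution path at a larger tuning parameter than every permuted pseudo-feature) is a simplified illustration of the principle, not necessarily the paper's exact procedure; variable names and the toy data are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)

# Toy data: 200 samples, 50 features, only the first 5 are active.
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

# Pseudo-features: permute the rows of each column independently.
# This breaks any association with y while preserving each column's
# marginal distribution, so pseudo-features are inactive by construction.
X_perm = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
X_aug = np.hstack([X, X_perm])

# Fit the Lasso solution path on the augmented design.
# alphas are returned in decreasing order; coefs has shape (2p, n_alphas).
alphas, coefs, _ = lasso_path(X_aug, y)

# For each feature, record the largest alpha at which its coefficient
# first becomes nonzero along the path (0 if it never enters).
entry_alpha = np.array([
    alphas[np.nonzero(coefs[j])[0][0]] if np.any(coefs[j]) else 0.0
    for j in range(2 * p)
])

# Cutoff: the largest alpha at which any pseudo-feature enters the path.
# Original features entering strictly before this cutoff are kept.
cutoff = entry_alpha[p:].max()
selected = np.where(entry_alpha[:p] > cutoff)[0]
print("selected features:", selected)
```

Because the permuted columns are independent of the response by construction, the largest tuning parameter at which any of them enters the path gives a data-driven cutoff: original features that enter earlier (at larger tuning parameters) are the ones the data support as active.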