Why LASSO Seems to Simultaneously Decrease Bias and Variance in Machine Learning

ABSTRACT
We show that, upon an enhancement of the capacity of the function space used in regression, LASSO simultaneously decreases the bias and the variance of statistical models learned from training data, provided that the balance between minimization of the mean-squared error and the L1-regularization term is optimal. Further, if minimization of the mean-squared error is dominant, this seems to explain the occurrence of double descent in the modern interpolation regime of machine learning. Our main method is a decomposition of the mean-squared error plus complexity into bias, variance, and an unavoidable irreducible error inherent to the problem.
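The bias–variance decomposition underlying this analysis can be illustrated numerically. The following sketch is not the paper's construction but a hypothetical toy setup: an orthonormal design matrix (for which the LASSO solution is the well-known soft-thresholding of the OLS coefficients), a fixed sparse true coefficient vector, and Monte Carlo estimation of squared bias and variance of the fitted predictions as the regularization weight λ varies; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy orthonormal design: when X^T X = I, the LASSO solution is obtained by
# soft-thresholding the OLS coefficients X^T y at the level lam.
n, p = 64, 16
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # columns are orthonormal
beta_true = np.concatenate([np.full(4, 2.0), np.zeros(p - 4)])  # sparse truth
sigma = 0.5  # noise level (the irreducible error is sigma**2)

def lasso_orthonormal(X, y, lam):
    """LASSO estimate via soft-thresholding (valid only for X^T X = I)."""
    b_ols = X.T @ y
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def bias_variance(lam, reps=2000):
    """Monte Carlo estimate of squared bias and variance of the fit."""
    preds = np.empty((reps, n))
    for r in range(reps):
        y = X @ beta_true + sigma * rng.standard_normal(n)
        preds[r] = X @ lasso_orthonormal(X, y, lam)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - X @ beta_true) ** 2)
    var = np.mean(preds.var(axis=0))
    return bias2, var

for lam in [0.0, 0.25, 1.0]:
    b2, v = bias_variance(lam)
    print(f"lambda={lam:4.2f}  bias^2={b2:.4f}  variance={v:.4f}")
```

At λ = 0 the fit is unbiased with variance p·σ²/n; increasing λ trades variance for bias, recovering the classical tradeoff against which the simultaneous decrease discussed above is contrasted.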