Abstract
We consider the generic regularized optimization problem \( \hat{w}(\lambda) = \arg\min_{w} \sum_{k=1}^{m} L(y_k, x_k^T w) + \lambda J(w) \). We derive a general characterization of the properties of (loss \(L\), penalty \(J\)) pairs that give piecewise linear coefficient paths. Such pairs allow us to efficiently generate the full regularized coefficient path. We illustrate how our results can be used to build robust, efficient and adaptable modeling tools.
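As a concrete instance of the setting described in the abstract: squared-error loss paired with an L1 penalty (the lasso) is one (loss, penalty) pair whose coefficient path \( \hat{w}(\lambda) \) is piecewise linear, so the entire path can be traced by visiting only the "knots" where the active set changes. A minimal sketch, using scikit-learn's LARS implementation as a stand-in (the chapter's characterization is more general than this one example):

```python
# Demo: the lasso coefficient path is piecewise linear in lambda.
# lars_path returns the solution only at the knots (alphas); between
# consecutive knots the coefficients move along straight lines, so these
# knots encode the *entire* regularization path.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
m, n = 50, 5                                  # m samples, n features
X = rng.standard_normal((m, n))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])  # sparse ground truth
y = X @ true_w + 0.1 * rng.standard_normal(m)

# alphas: lambda values at the knots (decreasing);
# coefs[:, j]: the fitted coefficients at alphas[j].
alphas, active, coefs = lars_path(X, y, method="lasso")
print("number of knots:", len(alphas))
print("features active at the end of the path:", sorted(active))
```

The practical payoff, as the abstract notes, is efficiency: the full family of solutions over all \(\lambda\) costs roughly as much as a single least-squares fit, rather than one fit per candidate \(\lambda\).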
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Rosset, S., Zhu, J. (2006). Sparse, Flexible and Efficient Modeling using L1 Regularization. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_17
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8