Robust Model Selection with LARS Based on S-estimators

Agostinelli, Claudio; Salibian-Barrera, Matias

doi:10.1007/978-3-7908-2604-3_6


Abstract

We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned with the case when data quality may be unreliable (e.g. there might be outliers among the observations). When the number of available covariates is moderately large, fitting all possible subsets is not a feasible option. Sequential methods like forward or backward selection are generally “greedy” and may fail to include important predictors when these are correlated. To avoid this problem, Efron et al. (2004) proposed the Least Angle Regression (LARS) algorithm, which produces an ordered list of the available covariates (sequencing) according to their relevance. We introduce outlier-robust versions of the LARS algorithm based on S-estimators for regression (Rousseeuw and Yohai, 1984). The resulting algorithm is computationally efficient and suitable even when the number of variables exceeds the sample size. Simulation studies show that it is also robust to the presence of outliers in the data and compares favourably to previous proposals in the literature.
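
As a rough illustration of the robust ingredient mentioned in the abstract, the sketch below computes the M-scale of a set of residuals, which is the quantity an S-estimator for regression minimizes over candidate coefficient vectors. It is not the authors' robust LARS algorithm, which the abstract does not describe. The Tukey bisquare loss, the constants `c = 1.547` and `b = 0.5` (a common choice for a 50% breakdown point with consistency under normal errors), the fixed-point iteration, the function names, and the toy residuals are all assumptions made for this example.

```python
import statistics


def rho_bisquare(u, c=1.547):
    """Tukey bisquare loss, normalised so that its maximum value is 1."""
    if abs(u) >= c:
        return 1.0
    t = (u / c) ** 2
    return 3 * t - 3 * t**2 + t**3  # equals 1 - (1 - t)**3


def m_scale(residuals, b=0.5, c=1.547, tol=1e-10, max_iter=200):
    """Solve mean(rho(r_i / s)) = b for s by fixed-point iteration."""
    residuals = list(residuals)
    n = len(residuals)
    # Starting value: median absolute residual, rescaled for normal errors.
    s = statistics.median(abs(r) for r in residuals) / 0.6745
    if s == 0:
        return 0.0
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(r / s, c) for r in residuals) / n
        s_new = s * (mean_rho / b) ** 0.5
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


# Toy residuals (hypothetical): the second list replaces two values by outliers.
clean = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.1, 0.2, 0.4, 0.7, 1.1]
contaminated = clean[:8] + [50.0, 80.0]

print("M-scale, clean:        ", m_scale(clean))
print("M-scale, contaminated: ", m_scale(contaminated))
print("std dev, clean:        ", statistics.stdev(clean))
print("std dev, contaminated: ", statistics.stdev(contaminated))
```

On data like these the M-scale changes little when two residuals are replaced by gross outliers, whereas the sample standard deviation is dominated by them. That bounded behaviour is what makes a scale of this kind a plausible building block for an outlier-robust sequencing procedure.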


References

AGOSTINELLI, C. (2002a): Robust model selection in regression via weighted likelihood methodology. Statistics and Probability Letters 56, 289-300.
AGOSTINELLI, C. (2002b): Robust stepwise regression. Journal of Applied Statistics 29(6), 825-840.
AGOSTINELLI, C. and MARKATOU, M. (2005): Robust model selection by cross-validation via weighted likelihood. Unpublished manuscript.
AKAIKE, H. (1970): Statistical predictor identification. Annals of the Institute of Statistical Mathematics 22, 203-217.
EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least angle regression. The Annals of Statistics 32(2), 407-499.
HAMPEL, F.R. (1983): Some aspects of model choice in robust statistics. In: Proceedings of the 44th Session of the ISI, volume 2, 767-771. Madrid.
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2001): The Elements of Statistical Learning. Springer-Verlag, New York.
KHAN, J.A., VAN AELST, S. and ZAMAR, R.H. (2007a): Building a robust linear model with forward selection and stepwise procedures. Computational Statistics and Data Analysis 52, 239-248.
KHAN, J.A., VAN AELST, S. and ZAMAR, R.H. (2007b): Robust Linear Model Selection Based on Least Angle Regression. Journal of the American Statistical Association 102, 1289-1299.
MALLOWS, C.L. (1973): Some comments on C_p. Technometrics 15, 661-675.
MARONNA, R.A., MARTIN, R.D. and YOHAI, V.J. (2006): Robust Statistics: Theory and Methods. Wiley, New York.
McCANN, L. and WELSCH, R.E. (2007): Robust variable selection using least angle regression and elemental set sampling. Computational Statistics and Data Analysis 52, 249-257.
MILLER, A.J. (2002): Subset selection in regression. Chapman & Hall, New York.
MORGENTHALER, S., WELSCH, R.E. and ZENIDE, A. (2003): Algorithms for robust model selection in linear regression. In: M. Hubert, G. Pison, A. Struyf and S. Van Aelst (Eds.): Theory and Applications of Recent Robust Methods. Birkhäuser-Verlag, Basel, 195-206.
MÜLLER, S. and WELSH, A.H. (2005): Outlier robust model selection in linear regression. Journal of the American Statistical Association 100, 1297-1310.
QIAN, G. and KÜNSCH, H.R. (1998): On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference 75, 91-116.
RONCHETTI, E. (1985): Robust model selection in regression. Statistics and Probability Letters 3, 21-23.
RONCHETTI, E. (1997): Robustness aspects of model choice. Statistica Sinica 7, 327-338.
RONCHETTI, E. and STAUDTE, R.G. (1994): A robust version of Mallows’ C_p. Journal of the American Statistical Association 89, 550-559.
RONCHETTI, E., FIELD, C. and BLANCHARD, W. (1997): Robust linear model selection by cross-validation. Journal of the American Statistical Association 92, 1017-1023.
ROUSSEEUW, P.J. and YOHAI, V.J. (1984): Robust regression by means of S-estimators. In: J. Franke, W. Härdle and D. Martin (Eds.): Robust and Nonlinear Time Series, Lecture Notes in Statistics 26. Springer-Verlag, Berlin, 256-272.
SALIBIAN-BARRERA, M. and VAN AELST, S. (2008): Robust model selection using fast and robust bootstrap. Computational Statistics and Data Analysis 52, 5121-5135.
SALIBIAN-BARRERA, M. and ZAMAR, R.H. (2002): Bootstrapping robust estimates of regression. The Annals of Statistics 30, 556-582.
SCHWARZ, G. (1978): Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
SOMMER, S. and STAUDTE, R.G. (1995): Robust variable selection in regression in the presence of outliers and leverage points. Australian Journal of Statistics 37, 323-336.
TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological 58, 267-288.
WEISBERG, S. (1985): Applied linear regression. Wiley, New York.

Author information

Authors and Affiliations

Dipartimento di Statistica, Ca’ Foscari University, Venice, Italy
Claudio Agostinelli
Department of Statistics, The University of British Columbia, Vancouver, BC, Canada
Matias Salibian-Barrera

Corresponding author

Correspondence to Claudio Agostinelli.

Editor information

Editors and Affiliations

Centre de Recherche INRIA Paris-Rocquenc, Domaine de Voluceau, Le Chesnay cedex, 78153, France
Yves Lechevallier
Chaire de statistique appliquée, CNAM, rue Saint Martin 292, Paris, 75003, France
Gilbert Saporta

About this paper

Cite this paper

Agostinelli, C., Salibian-Barrera, M. (2010). Robust Model Selection with LARS Based on S-estimators. In: Lechevallier, Y., Saporta, G. (eds) Proceedings of COMPSTAT'2010. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2604-3_6

DOI: https://doi.org/10.1007/978-3-7908-2604-3_6
Published: 30 September 2010
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2603-6
Online ISBN: 978-3-7908-2604-3
eBook Packages: Mathematics and Statistics (R0)
