Abstract
We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned with the case where data quality may be unreliable (e.g. there may be outliers among the observations). When the number of available covariates is moderately large, fitting all possible subsets is not feasible. Sequential methods such as forward or backward selection are generally “greedy” and may fail to include important predictors when these are correlated. To avoid this problem, Efron et al. (2004) proposed the Least Angle Regression (LARS) algorithm, which produces an ordered list of the available covariates (sequencing) according to their relevance. We introduce outlier-robust versions of the LARS algorithm based on S-estimators for regression (Rousseeuw and Yohai, 1984). The proposed algorithm is computationally efficient and suitable even when the number of variables exceeds the sample size. Simulation studies show that it is also robust to the presence of outliers in the data and compares favourably with previous proposals in the literature.
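For readers unfamiliar with S-estimators, the sketch below illustrates the robust residual scale (M-scale) that an S-estimator of regression minimizes over candidate coefficient vectors. It is a minimal illustration and not the authors' implementation: the function names, the Tukey bisquare loss, the tuning constant 1.547, the right-hand side 0.5, and the fixed-point iteration are all assumptions chosen for the example.

```python
import statistics

# Assumed tuning: bisquare constant c = 1.547 paired with b = 0.5.
TUKEY_C = 1.547
B = 0.5


def rho_bisquare(u, c=TUKEY_C):
    """Tukey bisquare loss, normalized so that its maximum is 1."""
    t = (u / c) ** 2
    return 1.0 if t >= 1.0 else 1.0 - (1.0 - t) ** 3


def m_scale(residuals, b=B, c=TUKEY_C, tol=1e-10, max_iter=1000):
    """Solve mean(rho(r_i / s)) = b for s by fixed-point iteration."""
    n = len(residuals)
    # Start from the normalized median absolute residual.
    s = statistics.median([abs(r) for r in residuals]) / 0.6745
    if s == 0.0:
        return 0.0  # at least half of the residuals are exactly zero
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(r / s, c) for r in residuals) / n
        s_new = s * (mean_rho / b) ** 0.5
        if abs(s_new / s - 1.0) < tol:
            return s_new
        s = s_new
    return s


# Hypothetical residuals; the second list replaces one value by a gross outlier.
clean = [-1.2, 0.4, 0.1, -0.7, 0.9, 1.5, -0.3, 0.6, -1.1, 0.2]
contaminated = [-1.2, 0.4, 0.1, -0.7, 0.9, 150.0, -0.3, 0.6, -1.1, 0.2]

print("M-scale, clean:        ", m_scale(clean))
print("M-scale, contaminated: ", m_scale(contaminated))
print("Std. dev., contaminated:", statistics.pstdev(contaminated))
```

In this toy example the M-scale barely moves when one residual is replaced by a gross outlier, whereas the standard deviation is dominated by it. That bounded influence is what makes a scale of this kind a plausible building block for an outlier-robust selection criterion.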
References
AGOSTINELLI, C. (2002a): Robust model selection in regression via weighted likelihood methodology. Statistics and Probability Letters 56, 289-300.
AGOSTINELLI, C. (2002b): Robust stepwise regression. Journal of Applied Statistics 29(6), 825-840.
AGOSTINELLI, C. and MARKATOU, M. (2005): Robust model selection by cross-validation via weighted likelihood. Unpublished manuscript.
AKAIKE, H. (1970): Statistical predictor identification. Annals of the Institute of Statistical Mathematics 22, 203-217.
EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least angle regression. The Annals of Statistics 32(2), 407-499.
HAMPEL, F.R. (1983): Some aspects of model choice in robust statistics. In: Proceedings of the 44th Session of the ISI, volume 2, 767-771. Madrid.
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2001): The Elements of Statistical Learning. Springer-Verlag, New York.
KHAN, J.A., VAN AELST, S., and ZAMAR, R.H. (2007a): Building a robust linear model with forward selection and stepwise procedures. Computational Statistics and Data Analysis 52, 239-248.
KHAN, J.A., VAN AELST, S., and ZAMAR, R.H. (2007b): Robust Linear Model Selection Based on Least Angle Regression. Journal of the American Statistical Association 102, 1289-1299.
MALLOWS, C.L. (1973): Some comments on C_p. Technometrics 15, 661-675.
MARONNA, R.A., MARTIN, D.R. and YOHAI, V.J. (2006): Robust Statistics: Theory and Methods. Wiley, New York.
McCANN, L. and WELSCH, R.E. (2007): Robust variable selection using least angle regression and elemental set sampling. Computational Statistics and Data Analysis 52, 249-257.
MILLER, A.J. (2002): Subset selection in regression. Chapman-Hall, New York.
MORGENTHALER, S., WELSCH, R.E. and ZENIDE, A. (2003): Algorithms for robust model selection in linear regression. In: M. Hubert, G. Pison, A. Struyf and S. Van Aelst (Eds.): Theory and Applications of Recent Robust Methods. Birkhäuser-Verlag, Basel, 195-206.
MÜLLER, S. and WELSH, A.H. (2005): Outlier robust model selection in linear regression. Journal of the American Statistical Association 100, 1297-1310.
QIAN, G. and KÜNSCH, H.R. (1998): On model selection via stochastic complexity in robust linear regression. Journal of Statistical Planning and Inference 75, 91-116.
RONCHETTI, E. (1985): Robust model selection in regression. Statistics and Probability Letters 3, 21-23.
RONCHETTI, E. (1997): Robustness aspects of model choice. Statistica Sinica 7, 327-338.
RONCHETTI, E. and STAUDTE, R.G. (1994): A robust version of Mallows’ C_p. Journal of the American Statistical Association 89, 550-559.
RONCHETTI, E., FIELD, C. and BLANCHARD, W. (1997): Robust linear model selection by cross-validation. Journal of the American Statistical Association 92, 1017-1023.
ROUSSEEUW, P.J. and YOHAI, V.J. (1984): Robust regression by means of S-estimators. In: J. Franke, W. Härdle and D. Martin (Eds.): Robust and Nonlinear Time Series, Lecture Notes in Statistics 26. Springer-Verlag, Berlin, 256-272.
SALIBIAN-BARRERA, M. and VAN AELST, S. (2008): Robust model selection using fast and robust bootstrap. Computational Statistics and Data Analysis 52, 5121-5135.
SALIBIAN-BARRERA, M. and ZAMAR, R.H. (2002): Bootstrapping robust estimates of regression. The Annals of Statistics 30, 556-582.
SCHWARZ, G. (1978): Estimating the dimension of a model. The Annals of Statistics 6, 461-464.
SOMMER, S. and STAUDTE, R.G. (1995): Robust variable selection in regression in the presence of outliers and leverage points. Australian Journal of Statistics 37, 323-336.
TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B: Methodological 58, 267-288.
WEISBERG, S. (1985): Applied linear regression. Wiley, New York.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agostinelli, C., Salibian-Barrera, M. (2010). Robust Model Selection with LARS Based on S-estimators. In: Lechevallier, Y., Saporta, G. (eds) Proceedings of COMPSTAT'2010. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2604-3_6
DOI: https://doi.org/10.1007/978-3-7908-2604-3_6
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2603-6
Online ISBN: 978-3-7908-2604-3
eBook Packages: Mathematics and Statistics (R0)