Abstract
A procedure for detecting outliers in regression problems, based on information provided by boosting trees, is proposed. Boosting is designed to handle observations that are hard to predict by giving them extra weight. In the present paper, such observations are treated as possible outliers, and the boosting results are used to diagnose which observations could be outliers. The key idea is to select the observation most frequently resampled along the boosting iterations, remove it, and rerun boosting. Many well-known benchmark data sets are considered, and a comparative study against two classical competitors shows the value of the method.
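The key idea can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a resampling-based boosting scheme in the style of Drucker (1997), uses a regression stump (a depth-1 tree) on one-dimensional inputs as the base learner, resets the weights to uniform when the weighted average loss reaches 0.5, and removes a fixed number of candidates instead of applying a stopping rule. All function names and parameters are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Least-squares regression stump (depth-1 tree) on 1-D inputs."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best_score, thr, left_mean, right_mean = float("inf"), None, total / n, total / n
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        lm = left_sum / i
        rm = (total - left_sum) / (n - i)
        # Minimising the squared error is equivalent to minimising this score.
        score = -(i * lm * lm + (n - i) * rm * rm)
        if score < best_score:
            best_score = score
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            left_mean, right_mean = lm, rm

    def predict(x):
        if thr is None:  # all x identical: constant prediction
            return left_mean
        return left_mean if x <= thr else right_mean

    return predict


def resampling_counts(xs, ys, n_rounds, rng):
    """Run boosting by resampling; count how often each observation is drawn."""
    n = len(xs)
    uniform = [1.0 / n] * n
    weights = list(uniform)
    counts = [0] * n
    for _ in range(n_rounds):
        drawn = rng.choices(range(n), weights=weights, k=n)
        for i in drawn:
            counts[i] += 1
        predict = fit_stump([xs[i] for i in drawn], [ys[i] for i in drawn])
        errors = [abs(y - predict(x)) for x, y in zip(xs, ys)]
        max_err = max(errors)
        if max_err == 0:
            weights = list(uniform)
            continue
        losses = [e / max_err for e in errors]  # linear loss in [0, 1]
        avg_loss = sum(w * l for w, l in zip(weights, losses))
        if not 0 < avg_loss < 0.5:
            # Assumption of this sketch: restart from uniform weights.
            weights = list(uniform)
            continue
        beta = avg_loss / (1 - avg_loss)
        # Well-predicted points (small loss) lose weight; hard points keep it.
        weights = [w * beta ** (1 - l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return counts


def iterated_boosting_outliers(xs, ys, n_remove=3, n_rounds=50, seed=0):
    """Repeatedly remove the most frequently resampled observation."""
    rng = random.Random(seed)
    remaining = list(range(len(xs)))
    flagged = []
    for _ in range(n_remove):
        counts = resampling_counts(
            [xs[i] for i in remaining], [ys[i] for i in remaining], n_rounds, rng
        )
        pos = max(range(len(remaining)), key=counts.__getitem__)
        flagged.append(remaining.pop(pos))  # original index of the candidate
    return flagged


# Toy data: y = 2x with one gross outlier injected at index 15.
xs = [float(i) for i in range(30)]
ys = [2.0 * x for x in xs]
ys[15] += 500.0
flagged = iterated_boosting_outliers(xs, ys)
print(flagged)
```

On this toy data the injected point is by far the hardest to predict, so it would be expected to appear among the flagged indices. Deciding how many flagged candidates are genuine outliers requires a separate criterion, which this sketch does not attempt.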
References
BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984): Classification And Regression Trees. Chapman & Hall.
CHEZE, N. and POGGI, J-M. (2005): Outlier Detection by Boosting Regression Trees. Preprint 2005–17, Orsay. www.math.u-psud.fr/biblio/ppo/2005/
CHEZE, N., POGGI, J-M. and PORTIER, B. (2003): Partial and Recombined Estimators for Nonlinear Additive Models. Stat. Inf. Stoch. Proc., 6, 155–197.
DRUCKER, H. (1997): Improving Regressors using Boosting Techniques. In: Proc. of the 14th Int. Conf. on Machine Learning. Morgan Kaufmann, 107–115.
FREUND, Y. and SCHAPIRE, R. E. (1997): A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119–139.
GEY, S. and POGGI, J-M. (2006): Boosting and Instability for Regression Trees. Computational Statistics & Data Analysis, 50(2), 533–550.
ROUSSEEUW, P.J. and LEROY, A. (1987): Robust regression and outlier detection. Wiley.
VERBOVEN, S. and HUBERT, M. (2005): LIBRA: a MATLAB library for robust analysis. Chemometrics and Intelligent Laboratory Systems, 75, 127–136.
Copyright information
© 2006 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Cheze, N., Poggi, JM. (2006). Iterated Boosting for Outlier Detection. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_23
DOI: https://doi.org/10.1007/3-540-34416-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics (R0)