Abstract
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, in which penalty costs for discarding outliers are incorporated into the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Because of the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally practical only for small samples. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of the ε-insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formulation gains the sparseness property. The good performance of the ε-insensitive PTS (IPTS) estimator allows identification of multiple outliers while avoiding masking and swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments.
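To make the idea of a deletion penalty concrete, the following is a minimal Python sketch and not the QMIP formulation of the paper, which the abstract does not spell out. It assumes a single predictor, a fixed penalty per deleted observation, and brute-force enumeration of deletion sets in place of a mixed integer solver. The names fit_line, penalized_trimming, penalty and max_deleted are hypothetical. The function eps_insensitive is the standard ε-insensitive loss from support vector regression; how it enters the IPTS estimator is not shown here.

```python
from itertools import combinations


def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def eps_insensitive(residual, eps):
    """Standard epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(residual) - eps)


def penalized_trimming(points, penalty, max_deleted):
    """Try every deletion set of size <= max_deleted (assumed toy scheme).

    Cost = sum of squared residuals of the kept points
           + penalty * number of deleted points.
    Returns (cost, deleted_indices, (a, b)) for the cheapest choice.
    """
    best = None
    for k in range(max_deleted + 1):
        for deleted in combinations(range(len(points)), k):
            kept = [p for i, p in enumerate(points) if i not in deleted]
            a, b = fit_line(kept)
            cost = sum((y - a - b * x) ** 2 for x, y in kept) + penalty * k
            if best is None or cost < best[0]:
                best = (cost, set(deleted), (a, b))
    return best


# Illustrative data: y = 1 + 2x, with the last response replaced by a gross error.
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 30)]
cost, deleted, (a, b) = penalized_trimming(data, penalty=4.0, max_deleted=2)
print(deleted, a, b)
```

On this toy data the cheapest choice is to pay one penalty and delete the contaminated observation, since keeping it inflates the squared residuals far more than the penalty costs. The enumeration grows combinatorially with the sample size, which illustrates why the exact approach is practical only for small samples.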
Additional information
This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
About this article
Cite this article
Zioutas, G., Pitsoulis, L. & Avramidis, A. Quadratic mixed integer programming and support vectors for deleting outliers in robust regression. Ann Oper Res 166, 339–353 (2009). https://doi.org/10.1007/s10479-008-0412-4
DOI: https://doi.org/10.1007/s10479-008-0412-4