Abstract
Computation of typical statistical sample estimates, such as the median or the least squares fit, usually requires the solution of an unconstrained optimization problem with a convex objective function, which can be solved efficiently by various methods. The presence of outliers in the data dictates the computation of a robust estimate, which can be defined as the optimum statistical estimate for a subset that contains at least half of the observations. The resulting problem is a combinatorial optimization problem that is often computationally intractable. Classical statistical methods for estimating the multivariate location \(\varvec{\mu }\) and scatter matrix \(\varvec{\varSigma }\) are based on the sample mean vector and covariance matrix, which are very sensitive to outlying observations. We propose a new two-stage method for robust location and scatter estimation. In the first stage, an unbiased multivariate \(L_{1}\)-median center for all the observations is attained by a novel procedure called the least trimmed Euclidean deviations estimator. This robust median defines a coverage set of observations, which is used in the second stage to iteratively compute the set of outliers that violate the correlational structure of the data set. Extensive computational experiments indicate that the proposed method outperforms existing methods in accuracy, robustness and computational time.
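To make the idea of a trimmed \(L_{1}\)-median concrete, the following is a minimal sketch and not the procedure proposed in the paper. It assumes a generic heuristic that alternates between selecting the \(h\) observations closest to the current center and recomputing their \(L_{1}\)-median with Weiszfeld's iteration. The function names `l1_median` and `trimmed_l1_median`, the default \(h = \lfloor n/2 \rfloor + 1\), the coordinatewise-median starting point and the tolerances are all illustrative assumptions.

```python
import numpy as np


def l1_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration: approximate the point minimizing the sum of
    Euclidean distances to the rows of X."""
    mu = X.mean(axis=0)
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def trimmed_l1_median(X, h=None, max_iter=100):
    """Heuristic (assumed, not the paper's algorithm): alternate between
    keeping the h closest observations and recomputing their L1-median."""
    n = X.shape[0]
    if h is None:
        h = n // 2 + 1  # a subset containing more than half of the observations
    mu = np.median(X, axis=0)  # coordinatewise median as a robust start
    kept = None
    for _ in range(max_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        idx = np.sort(np.argsort(dist)[:h])
        if kept is not None and np.array_equal(idx, kept):
            break  # the coverage set no longer changes
        kept = idx
        mu = l1_median(X[kept])
    return mu, kept


# Illustrative data: 80 clean points around the origin, 20 outliers near (10, 10).
rng = np.random.default_rng(0)
clean = rng.normal(size=(80, 2))
outliers = 10.0 + 0.1 * rng.normal(size=(20, 2))
X = np.vstack([clean, outliers])

center, coverage = trimmed_l1_median(X)
```

In this sketch, `coverage` holds the indices of the retained observations, playing the role of a coverage set in the loose sense used above. Like other concentration-type heuristics, it only reaches a local solution of the underlying combinatorial problem.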




Acknowledgments
The authors would like to thank Peter Filzmoser for his helpful comments and suggestions on various aspects of the paper. The work of the second author was conducted at the National Research University Higher School of Economics and supported by RSF Grant 14-41-00039.
Cite this article
Chatzinakos, C., Pitsoulis, L. & Zioutas, G. Optimization techniques for robust multivariate location and scatter estimation. J Comb Optim 31, 1443–1460 (2016). https://doi.org/10.1007/s10878-015-9833-6
DOI: https://doi.org/10.1007/s10878-015-9833-6