Optimization techniques for robust multivariate location and scatter estimation

Journal of Combinatorial Optimization

Abstract

Computing typical statistical sample estimates, such as the median or the least squares fit, usually requires solving an unconstrained optimization problem with a convex objective function, which can be done efficiently by various methods. The presence of outliers in the data calls for a robust estimate, which can be defined as the optimum statistical estimate for a subset that contains at least half of the observations. The resulting problem is a combinatorial optimization problem that is often computationally intractable. Classical statistical methods for estimating the multivariate location \(\boldsymbol{\mu}\) and scatter matrix \(\boldsymbol{\Sigma}\) are based on the sample mean vector and covariance matrix, which are very sensitive to outlying observations. We propose a new two-stage method for robust location and scatter estimation. In the first stage, an unbiased multivariate \(L_{1}\)-median center for all the observations is obtained by a novel procedure called the least trimmed Euclidean deviations estimator. This robust median defines a coverage set of observations, which is used in the second stage to iteratively compute the set of outliers that violate the correlational structure of the data set. Extensive computational experiments indicate that the proposed method outperforms existing methods in accuracy, robustness and computational time.
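
The idea of a trimmed \(L_{1}\)-median can be illustrated with a short Python sketch. It is not the authors' least trimmed Euclidean deviations procedure, whose details are not given in the abstract. It is a simple alternating heuristic under stated assumptions: NumPy is available, the \(L_{1}\)-median is approximated by a Weiszfeld-type iteration, and the coverage set holds h = floor(n/2) + 1 observations. The function names `l1_median` and `trimmed_l1_median` are hypothetical and chosen only for illustration.

```python
import numpy as np


def l1_median(X, tol=1e-8, max_iter=500):
    """Approximate the L1-median (point minimising the sum of Euclidean
    distances to the rows of X) with a Weiszfeld-type iteration."""
    m = X.mean(axis=0)  # start from the sample mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)  # crude guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def trimmed_l1_median(X, h=None, max_iter=100):
    """Illustrative heuristic (an assumption, not the paper's algorithm):
    alternate between (a) picking the h observations closest to the current
    center and (b) recomputing the L1-median of that subset."""
    n = X.shape[0]
    if h is None:
        h = n // 2 + 1  # assumed coverage: just over half of the data
    m = l1_median(X)
    subset = None
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        new_subset = np.sort(np.argsort(d)[:h])
        if subset is not None and np.array_equal(new_subset, subset):
            break  # coverage set stopped changing
        subset = new_subset
        m = l1_median(X[subset])
    return m, subset


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(80, 2))      # bulk of the data
    outliers = rng.normal(10.0, 0.5, size=(20, 2))  # contaminating cluster
    X = np.vstack([clean, outliers])
    center, coverage = trimmed_l1_median(X)
    print("trimmed L1-median:", center)
    print("coverage set size:", coverage.size)
```

The returned index set plays the role of the coverage set described above: observations outside it are candidates for closer inspection in a second stage. A heuristic of this kind can stop at a local optimum, which is one reason the underlying problem is treated as a combinatorial optimization problem.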

Acknowledgments

The authors would like to thank Peter Filzmoser for his helpful comments and suggestions on various aspects of the paper. The work of the second author was conducted at the National Research University Higher School of Economics and supported by RSF Grant 14-41-00039.

Author information

Corresponding author

Correspondence to L. Pitsoulis.

About this article

Cite this article

Chatzinakos, C., Pitsoulis, L. & Zioutas, G. Optimization techniques for robust multivariate location and scatter estimation. J Comb Optim 31, 1443–1460 (2016). https://doi.org/10.1007/s10878-015-9833-6

  • DOI: https://doi.org/10.1007/s10878-015-9833-6
