Controlling the size of multivariate outlier tests with the MCD estimator of scatter

Statistics and Computing

Abstract

Multivariate outlier detection requires computation of robust distances to be compared with appropriate cut-off points. In this paper we propose a new calibration method for obtaining reliable cut-off points of distances derived from the MCD estimator of scatter. These cut-off points are based on a more accurate estimate of the extreme tail of the distribution of robust distances. We show that our procedure gives reliable tests of outlyingness in almost all situations of practical interest, provided that the sample size is not much smaller than 50. Therefore, it is a considerable improvement over all the available MCD procedures, which are unable to provide good control over the size of multiple outlier tests for the data structures considered in this paper.
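To make the idea of controlling the size of multiple outlier tests concrete, the sketch below converts a simultaneous size into a per-observation cut-off. It does not reproduce the calibration proposed in the paper. It assumes the classical asymptotic chi-squared reference distribution for squared distances, a Šidák-type adjustment that treats the n tests as independent, and p = 2 variables so that the quantile has a closed form. The function names `sidak_level` and `chi2_cutoff_2d` are hypothetical and chosen for illustration only.

```python
import math


def sidak_level(alpha: float, n: int) -> float:
    """Per-test level that gives simultaneous size alpha over n tests.

    Assumes the n tests are independent (an assumption of this sketch).
    Computes 1 - (1 - alpha)**(1/n) via log1p/expm1 for numerical accuracy.
    """
    return -math.expm1(math.log1p(-alpha) / n)


def chi2_cutoff_2d(level: float) -> float:
    """Upper-tail chi-squared cut-off with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so the
    (1 - level) quantile has the closed form -2 * log(level).
    """
    return -2.0 * math.log(level)


if __name__ == "__main__":
    alpha, n = 0.01, 50  # illustrative simultaneous size and sample size
    per_test = sidak_level(alpha, n)
    print("per-test level:", per_test)
    print("individual cut-off:", chi2_cutoff_2d(alpha))
    print("simultaneous cut-off:", chi2_cutoff_2d(per_test))
```

A squared robust distance would be flagged when it exceeds the simultaneous cut-off, which is larger than the individual one. The abstract indicates that the extreme tail of MCD-based distances needs a more accurate approximation than earlier procedures provide, so these asymptotic values should be read only as an illustration of the multiplicity adjustment.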

Author information

Corresponding author

Correspondence to Andrea Cerioli.

About this article

Cite this article

Cerioli, A., Riani, M. & Atkinson, A.C. Controlling the size of multivariate outlier tests with the MCD estimator of scatter. Stat Comput 19, 341–353 (2009). https://doi.org/10.1007/s11222-008-9096-5

  • DOI: https://doi.org/10.1007/s11222-008-9096-5
