
The power of monitoring: how to make the most of a contaminated multivariate sample

  • Original Paper
  • Statistical Methods & Applications

The Original Paper to this article was published on 15 March 2018

Abstract

Diagnostic tools must rely on robust high-breakdown methodologies to avoid distortion in the presence of contamination by outliers. However, a disadvantage of having a single, even if robust, summary of the data is that important choices concerning parameters of the robust method, such as breakdown point, have to be made prior to the analysis. The effect of such choices may be difficult to evaluate. We argue that an effective solution is to look at several pictures, and possibly at a whole movie, of the available data. This can be achieved by monitoring, over a range of parameter values, the results computed through the robust methodology of choice. We show the information gain that monitoring provides in the study of complex data structures through the analysis of multivariate datasets using different high-breakdown techniques. Our findings support the claim that the principle of monitoring is very flexible and that it can lead to robust estimators that are as efficient as possible. We also address through simulation some of the tricky inferential issues that arise from monitoring.
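The idea of monitoring over a range of breakdown points can be illustrated with a minimal sketch. It is not one of the estimators analysed in the paper: the function names (`trimmed_distances`, `monitor`, `make_sample`), the simulated bivariate sample and the crude trimmed fit (repeatedly keeping the observations closest to the current fit and re-estimating) are all assumptions made for illustration, and no consistency factor or small-sample correction is applied, so distances at different breakdown points are not on a common scale.

```python
import random
import statistics


def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_distances(points, center, cov):
    """Squared Mahalanobis distances, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    result = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result


def trimmed_distances(points, bdp, n_steps=20):
    """Illustrative trimmed fit (an assumption, not the paper's method).

    Keeps the h = n - floor(bdp * n) observations closest to the current
    fit, re-estimates mean and covariance from them, and repeats.
    """
    n = len(points)
    h = n - int(bdp * n)
    # Start from coordinatewise medians and an identity scatter matrix.
    center = (statistics.median([x for x, _ in points]),
              statistics.median([y for _, y in points]))
    cov = (1.0, 0.0, 1.0)
    for _ in range(n_steps):
        d2 = squared_distances(points, center, cov)
        keep = sorted(range(n), key=d2.__getitem__)[:h]
        center, cov = mean_and_cov([points[i] for i in keep])
    return squared_distances(points, center, cov)


def monitor(points, bdp_grid):
    """Recompute the distances for every breakdown point in the grid."""
    return {bdp: trimmed_distances(points, bdp) for bdp in bdp_grid}


def make_sample(seed=0, n_clean=40, n_out=10):
    """Simulated data: a standard normal bulk plus a cluster of outliers."""
    rng = random.Random(seed)
    clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_clean)]
    outliers = [(rng.gauss(6, 0.5), rng.gauss(6, 0.5)) for _ in range(n_out)]
    return clean + outliers


sample = make_sample()
paths = monitor(sample, [0.5, 0.4, 0.3, 0.2, 0.1, 0.0])
for bdp, d2 in paths.items():
    print(f"bdp={bdp:.1f}  largest squared distance={max(d2):.1f}")
```

Reading the printed values from the highest breakdown point down to zero gives a rough analogue of the "movie": with this simulated sample, the distances of the contaminated units would be expected to drop sharply once the trimming level falls below the contamination rate and the outliers start to influence the fit.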




Acknowledgements

We are very grateful to the Editor, Tommaso Proietti, for inviting this paper and for organizing its discussion. We also thank Alessio Farcomeni, Luca Greco, Domenico Perrotta and two anonymous reviewers for helpful comments on a previous draft. MR and ACA gratefully acknowledge support from the CRoNoS project, reference CRoNoS COST Action IC1408.

Author information


Corresponding author

Correspondence to Andrea Cerioli.


About this article


Cite this article

Cerioli, A., Riani, M., Atkinson, A.C. et al. The power of monitoring: how to make the most of a contaminated multivariate sample. Stat Methods Appl 27, 559–587 (2018). https://doi.org/10.1007/s10260-017-0409-8


  • DOI: https://doi.org/10.1007/s10260-017-0409-8
