RDELA—a Delaunay-triangulation-based, location and covariance estimator with high breakdown point

Statistics and Computing

Abstract

We propose an approach that utilizes the Delaunay triangulation to identify a robust/outlier-free subsample. Given that the data structure of the non-outlying points is convex (e.g. of elliptical shape), this subsample can then be used to give a robust estimation of location and scatter (by applying the classical mean and covariance). The estimators derived from our approach are shown to have a high breakdown point. In addition, we provide a diagnostic plot to expand the initial subset in a data-driven way, further increasing the estimators’ efficiency.
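The idea summarised above can be illustrated with a deliberately simplified sketch. It is not the RDELA procedure itself, whose details (Sects. 2.2 and 2.3) are not reproduced here. The sketch assumes that simplices with small circumradius mark the dense, outlier-free region, and it greedily collects points from the smallest simplices until a subset of size ⌊(n+d+1)/2⌋ is reached. The helper names `circumradius` and `small_radius_subsample`, the greedy rule and the toy data are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay


def circumradius(vertices):
    """Circumradius of a d-simplex given its (d+1) x d vertex array."""
    p0 = vertices[0]
    # The circumcenter c is equidistant from all vertices, which gives the
    # linear system 2 (p_i - p_0) . c = |p_i|^2 - |p_0|^2 for i = 1..d.
    A = 2.0 * (vertices[1:] - p0)
    b = np.sum(vertices[1:] ** 2, axis=1) - np.sum(p0 ** 2)
    center = np.linalg.solve(A, b)
    return float(np.linalg.norm(center - p0))


def small_radius_subsample(X, h=None):
    """Hypothetical helper: indices of h points taken from the smallest simplices."""
    n, d = X.shape
    h = (n + d + 1) // 2 if h is None else h
    tri = Delaunay(X)
    radii = np.array([circumradius(X[s]) for s in tri.simplices])
    chosen = []
    for k in np.argsort(radii):  # smallest circumradius first
        for i in tri.simplices[k]:
            if int(i) not in chosen:
                chosen.append(int(i))
        if len(chosen) >= h:
            break
    return np.array(chosen[:h])


# Toy data (assumed): 42 Gaussian inliers and 8 widely scattered outliers.
rng = np.random.default_rng(1)
inliers = rng.normal(size=(42, 2))
angles = 2 * np.pi * np.arange(8) / 8
out_radii = 100.0 + 5.0 * np.arange(8)
outliers = np.column_stack([out_radii * np.cos(angles), out_radii * np.sin(angles)])
X = np.vstack([inliers, outliers])

idx = small_radius_subsample(X)
location = X[idx].mean(axis=0)          # classical mean on the subsample
scatter = np.cov(X[idx], rowvar=False)  # classical covariance on the subsample
```

With scattered outliers as in this toy example, every simplex touching an outlier has a large circumradius, so the greedy rule stays inside the inlier cloud. Tightly clustered outliers would defeat this naive rule, which is one reason the sketch should not be mistaken for the published algorithm.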

References

  • Allard, D., Fraley, C.: Nonparametric maximum likelihood estimation of features in spatial point processes using Voronoi tessellation. J. Am. Stat. Assoc. 92(440), 1485–1493 (1997)

  • Alqallaf, F., Van Aelst, S., Yohai, V., Zamar, R.: Propagation of outliers in multivariate data. Ann. Stat. 37(1), 311–331 (2009)

  • Amenta, N., Attali, D., Devillers, O.: Complexity of Delaunay triangulation for points on lower-dimensional polyhedra. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete algorithms, SODA’07, pp. 1106–1113. Society for Industrial and Applied Mathematics, Philadelphia (2007)

  • Amenta, N., Attali, D., Devillers, O.: A tight bound for the Delaunay triangulation of points on a polyhedron. Rapport de recherche RR-6522, INRIA (2008)

  • Atkinson, A., Riani, M., Cerioli, A.: Exploring Multivariate Data with the Forward Search. Springer, Berlin (2004)

  • Attali, D., Boissonnat, J.-D., Lieutier, A.: Complexity of the Delaunay triangulation of points on surfaces: the smooth case. In: Proceedings of the Nineteenth Annual Symposium on Computational Geometry, SCG’03, pp. 201–210. ACM, New York (2003)

  • Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)

  • Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)

  • Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: a simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)

  • Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)

  • Cignoni, P., Montani, C., Scopigno, R.: DeWall: a fast divide and conquer Delaunay triangulation algorithm in E^d. Comput. Aided Des. 30(5), 333–341 (1998)

  • Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)

  • Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)

  • Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)

  • Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)

  • Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)

  • De Berg, M., Cheong, O., Van Kreveld, M., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer, New York (2008)

  • Delaunay, B.: Sur la sphere vide. Izv. Akad. Nauk SSSR, Ser. VII, Otd. Mat. Est. Nauk 7, 793–800 (1934)

  • Donoho, D.: Breakdown properties of multivariate location estimators. Ph.D. thesis (1982)

  • Donoho, D., Huber, P.: The notion of breakdown point. In: A Festschrift for Erich Lehmann, pp. 157–184 (1983)

  • Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)

  • Gnanadesikan, R., Kettenring, J.R.: Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28(1), 81–124 (1972)

  • Gower, J.: Euclidean distance geometry. Math. Sci. 7, 1–14 (1982)

  • Gower, J.C.: Algorithm AS 78: the mediancentre. J. R. Stat. Soc., Ser. C, Appl. Stat. 23(3), 466–470 (1974)

  • Gower, J.C.: Properties of Euclidean and non-Euclidean distance matrices. In: Linear Algebra and its Applications, vol. 67, pp. 81–97 (1985)

  • Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (2005)

  • Hubert, M., Rousseeuw, P., Van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)

  • Kirschstein, T., Liebscher, S., Becker, C.: Robust Estimation of Location and Scatter by Pruning the Minimum Spanning Tree (2012, submitted for publication)

  • Leach, G.: Improving worst-case optimal Delaunay triangulation algorithms. In: 4th Canadian Conference on Computational Geometry, p. 15 (1992)

  • Liebscher, S., Kirschstein, T., Becker, C.: The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator. Stat. Comput. 22, 325–336 (2012). doi:10.1007/s11222-011-9250-3

  • Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)

  • Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)

  • Maronna, R., Martin, D., Yohai, V.: Robust Statistics: Theory and Methods. John Wiley and Sons, Chichester (2006)

  • Maronna, R.A., Zamar, R.H.: Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44(4), 307–317 (2002)

  • McMullen, P.: The maximum numbers of faces of a convex polytope. Mathematika 17(02), 179–184 (1970)

  • Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. Ph.D. thesis (2002)

  • Pison, G., van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)

  • Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)

  • Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)

  • Rocke, D., Woodruff, D.: Computation of robust estimators of multivariate location and shape. Stat. Neerl. 47(1), 27–42 (1993)

  • Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)

  • Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. John Wiley and Sons, New York (1987)

  • Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)

  • Su, P., Drysdale, R.L.S.: A comparison of sequential Delaunay triangulation algorithms. Comput. Geom. 7(5–6), 361–385. 11th ACM Symposium on Computational Geometry (1997)

  • Tyler, D.E.: A distribution-free M-estimator of multivariate scatter. Ann. Stat. 15(1), 234–251 (1987)

Author information

Corresponding author

Correspondence to Steffen Liebscher.

Appendix: Breakdown points of RDELA based estimators

Theorem 1

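Let \(\mathbf{X}\) be a collection of \(n\ge d+1\) points \(\mathbf{x}_{1},\ldots,\mathbf{x}_{n}\) in dimension d, \(\mathbf{x}_{i}\in\mathbb{R}^{d}\), \(i=1,\ldots,n\), in general position, and let \(\mathbf{T}(\mathbf{X})=\{\mathbf{t}_{k},\ k=1,\ldots,l\}\) be the corresponding Delaunay triangulation (where \(\mathbf{t}_{k}\) denotes a d-simplex with \(\mathbf{t}_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\}\subset \mathbf{X}\)), and let \(\boldsymbol{\mu}_{n}\) and \(\boldsymbol{\Sigma}_{n}\) be the RDELA estimators of location and covariance (as described in Sects. 2.2 and 2.3). Then the breakdown point \(\varepsilon\) (defined as the smallest fraction of outliers that can take the estimate over all bounds) of these estimators is \(\varepsilon(\boldsymbol{\mu}_{n},\mathbf{X})=\lfloor(n+1)/2\rfloor/n\) and \(\varepsilon(\boldsymbol{\Sigma}_{n},\mathbf{X})\ge\lfloor(n-d+1)/2\rfloor/n\), respectively.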
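To get a feeling for the two quantities in Theorem 1, the following small sketch evaluates them for a few sample sizes. It reads the covariance bound as ⌊(n−d+1)/2⌋/n and requires n ≥ d+1. The function name `rdela_breakdown_bounds` and the chosen (n, d) pairs are illustrative assumptions.

```python
def rdela_breakdown_bounds(n, d):
    """Return (location breakdown point, lower bound for the covariance estimator)."""
    if n < d + 1:
        raise ValueError("a Delaunay triangulation in dimension d needs n >= d + 1 points")
    location = ((n + 1) // 2) / n              # floor((n + 1) / 2) / n
    covariance_lower = ((n - d + 1) // 2) / n  # floor((n - d + 1) / 2) / n
    return location, covariance_lower


for n, d in [(100, 2), (100, 10), (21, 3)]:
    loc, cov = rdela_breakdown_bounds(n, d)
    print(f"n={n}, d={d}: location {loc:.3f}, covariance >= {cov:.3f}")
```

For n = 100 and d = 2 this gives 0.50 for the location estimator and a lower bound of 0.49 for the covariance estimator. The gap between the two grows with the dimension d.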

Proof

We first show that \(\varepsilon(\boldsymbol{\mu}_{n},\mathbf{X})\) is at least \(\lfloor(n+1)/2\rfloor/n\).

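For the location estimate to break down, at least one point in the chosen subset has to grow to infinity. If for some \(\mathbf{x}\in\mathbf{X}\) we have \(\|\mathbf{x}\|\to\infty\), then \(\|\mathbf{x}-\mathbf{y}\|\to\infty\) for all \(\mathbf{y}\in\mathbf{X}\) that remain bounded. For a subset of \(\mathbf{X}\) consisting of d+1 points and its corresponding triangulation object \(\mathbf{t}_{k}\) with \(\mathbf{x},\mathbf{y}\in\mathbf{t}_{k}\), it follows (due to the circumcircle property) that the radius of \(\mathbf{t}_{k}\) also grows to infinity, \(r(\mathbf{t}_{k})\to\infty\).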
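The step from a large distance to a large radius can be made explicit. As a short derivation, writing \(\mathbf{c}_{k}\) for the circumcenter of \(\mathbf{t}_{k}\) (a symbol introduced here for illustration only), both \(\mathbf{x}\) and \(\mathbf{y}\) lie on the circumsphere, so the triangle inequality gives

$$\|\mathbf{x}-\mathbf{y}\| \leq \|\mathbf{x}-\mathbf{c}_{k}\| + \|\mathbf{c}_{k}-\mathbf{y}\| = 2\,r(\mathbf{t}_{k}) \quad\Longrightarrow\quad r(\mathbf{t}_{k}) \geq \tfrac{1}{2}\|\mathbf{x}-\mathbf{y}\|. $$

Hence any simplex containing two points at distance at least 2⋅α has a circumradius of at least α, which is the form in which the property is used in the remainder of the proof.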

Suppose that \(\mathbf{X}=\mathbf{X}^{\prime}\cup\mathbf{X}^{\prime\prime}\), where \(|\mathbf{X}^{\prime}|>|\mathbf{X}^{\prime\prime}|=m=\lfloor(n-1)/2\rfloor\), and \(\mathbf{X}^{\prime\prime}\) will be altered in an arbitrary way to cause the procedure’s breakdown. Denote by \(\mathbf{Y}\subset\mathbf{X}\) the subset chosen by the RDELA procedure with \(|\mathbf{Y}|=|\mathbf{X}^{\prime}|=n-m\). Furthermore, let \(\widetilde{\mathbf {T}}_{\mathbf{X}^{\prime}}= \{\mathbf {t}'_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\},\mathbf{x}_{i_{j}}\in \mathbf{X}^{\prime} \}\) as well as \(\widetilde{\mathbf {T}}_{\mathbf {Y}}= \{\mathbf {t}^{y}_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\},\mathbf{x}_{i_{j}}\in \mathbf {Y}\}\) be the sets of the corresponding triangulation objects with \(\widetilde{\mathbf {T}}_{\mathbf{X}^{\prime}},\widetilde{\mathbf {T}}_{\mathbf {Y}}\subseteq \mathbf {T}(\mathbf{X})\). Then the radii of all triangulation objects \(\mathbf {t}'_{k}\) are bounded, i.e. \(\exists\,\alpha\in\mathbb{R}\) with \(r(\mathbf {t}'_{k})<\alpha\).

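Suppose now that there exists \(\mathbf{Y}^{\prime}\subset\mathbf{Y}\) such that \(\|\mathbf{x}-\mathbf{y}\|\ge 2\cdot\alpha\) where \(\mathbf{x}\in\mathbf{Y}^{\prime}\) and \(\mathbf{y}\in\mathbf{Y}\setminus\mathbf{Y}^{\prime}\). Then it follows from the circumcircle property that for the triangulation objects \(\mathbf {t}^{y'}_{k} \in\widetilde{\mathbf {T}}_{\mathbf {Y}}\) with \(\mathbf{x},\mathbf {y}\in \mathbf {t}^{y'}_{k}\):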

$$r\bigl(\mathbf {t}^{y'}_k\bigr) \geq\alpha. $$

On the other hand, if \(\mathbf{Y}=\mathbf{X}^{\prime}\), then \(r(\mathbf {t}^{y}_{k}) < \alpha\) holds for all triangulation objects \(\mathbf {t}^{y}_{k} \in\widetilde{\mathbf {T}}_{\mathbf {Y}}\), as stated above. Hence, the algorithm described in Sect. 2.3 always terminates in \(\mathbf{X}^{\prime}\). This proves that the breakdown point of the RDELA location estimator is at least \(\lfloor(n+1)/2\rfloor/n\).

The Delaunay triangulation is invariant under orthogonal transformations because its calculation is based purely on Euclidean distances, which are invariant under this type of transformation. Consequently, this is also true for the estimators derived from the Delaunay triangulation.
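The invariance statement can be checked numerically. The sketch below assumes SciPy's `scipy.spatial.Delaunay` as the triangulation routine (the paper does not prescribe an implementation) and compares the simplices of a random planar point set before and after a rotation. The helper `simplex_set`, the random data and the rotation angle are illustrative choices.

```python
import numpy as np
from scipy.spatial import Delaunay


def simplex_set(points):
    """Delaunay simplices as a set of sorted vertex-index tuples."""
    tri = Delaunay(points)
    return {tuple(sorted(int(i) for i in s)) for s in tri.simplices}


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))  # random points, in general position almost surely

theta = 0.7                   # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal matrix

# Rotating every point should leave the combinatorial structure unchanged.
same = simplex_set(X) == simplex_set(X @ Q.T)
print(same)
```

Only the vertex indices of the simplices are compared, since the coordinates themselves change under the rotation.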

Since the maximum breakdown point of orthogonal equivariant location estimators has been proved to be \(\lfloor(n+1)/2\rfloor/n\) (Lopuhaä and Rousseeuw, 1991), the lower bound shown above is attained, which completes the proof for the location estimator.

Now, we show that \(\varepsilon(\boldsymbol{\Sigma}_{n},\mathbf{X})\ge\lfloor(n-d+1)/2\rfloor/n\).

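Denote by \(0<\lambda_{1}\le\cdots\le\lambda_{d}<\infty\) the eigenvalues of \(\boldsymbol{\Sigma}_{n}(\mathbf{Y})\). The covariance estimator may break down by explosion (meaning that \(\lambda_{d}\to\infty\)) or by implosion (if \(\lambda_{1}\to 0\)). For \(\boldsymbol{\Sigma}_{n}\) to explode, at least one observation \(\mathbf{y}\) from the chosen subset \(\mathbf{Y}\) has to grow arbitrarily, \(\|\mathbf{y}\|\to\infty\). This case is covered by the location estimator’s proof.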

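For the case of implosion, note that each chosen subset \(\mathbf{Y}\), which is of size \(\lfloor(n+d+1)/2\rfloor\) by construction, contains at least d+1 unaltered points \(\mathbf{x}_{i}\in\mathbf{X}^{\prime}\). Because \(\mathbf{X}^{\prime}\) is in general position, each subset of \(\mathbf{X}^{\prime}\) consisting of d+1 points is also in general position. Hence \(\boldsymbol{\Sigma}_{n}\) is positive definite (\(\lambda_{i}>0\ \forall i\)). This proves that \(\varepsilon(\boldsymbol{\Sigma}_{n},\mathbf{X})\) is at least \(\lfloor(n-d+1)/2\rfloor/n\). □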
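The count of unaltered points in the implosion argument can be verified directly. Write h for the subset size ⌊(n+d+1)/2⌋ and suppose that m points are altered, with m at most ⌊(n−d+1)/2⌋−1, i.e. one fewer than the number corresponding to the claimed bound (h and this reading of m are notation assumed here for illustration). Then the number of unaltered points in the chosen subset is at least

$$h - m \;\geq\; \bigl\lfloor (n+d+1)/2 \bigr\rfloor - \bigl\lfloor (n-d+1)/2 \bigr\rfloor + 1 \;=\; d + 1, $$

because n+d+1 and n−d+1 differ by the even number 2d, so the two floor terms differ by exactly d.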

It can be assumed that the breakdown point of the covariance estimator is at most \(\lfloor(n-d+1)/2\rfloor/n\). However, this has not been proved yet. This is primarily due to the fact that a triangulation is not defined for degenerate point configurations, which in turn are required to let the covariance estimate implode.

About this article

Cite this article

Liebscher, S., Kirschstein, T. & Becker, C. RDELA—a Delaunay-triangulation-based, location and covariance estimator with high breakdown point. Stat Comput 23, 677–688 (2013). https://doi.org/10.1007/s11222-012-9337-5

  • DOI: https://doi.org/10.1007/s11222-012-9337-5
