RDELA—a Delaunay-triangulation-based, location and covariance estimator with high breakdown point

Liebscher, Steffen; Kirschstein, Thomas; Becker, Claudia

doi:10.1007/s11222-012-9337-5

Published: 22 June 2012

Volume 23, pages 677–688, (2013)

Statistics and Computing

Abstract

We propose an approach that utilizes the Delaunay triangulation to identify a robust/outlier-free subsample. Given that the data structure of the non-outlying points is convex (e.g. of elliptical shape), this subsample can then be used to give a robust estimation of location and scatter (by applying the classical mean and covariance). The estimators derived from our approach are shown to have a high breakdown point. In addition, we provide a diagnostic plot to expand the initial subset in a data-driven way, further increasing the estimators’ efficiency.

References

Allard, D., Fraley, C.: Nonparametric maximum likelihood estimation of features in spatial point processes using Voronoi tessellation. J. Am. Stat. Assoc. 92(440), 1485–1493 (1997)
Alqallaf, F., Van Aelst, S., Yohai, V., Zamar, R.: Propagation of outliers in multivariate data. Ann. Stat. 37(1), 311–331 (2009)
Amenta, N., Attali, D., Devillers, O.: Complexity of Delaunay triangulation for points on lower-dimensional polyhedra. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’07, pp. 1106–1113. Society for Industrial and Applied Mathematics, Philadelphia (2007)
Amenta, N., Attali, D., Devillers, O.: A tight bound for the Delaunay triangulation of points on a polyhedron. Rapport de recherche RR-6522, INRIA (2008)
Atkinson, A., Riani, M., Cerioli, A.: Exploring Multivariate Data with the Forward Search. Springer, Berlin (2004)
Attali, D., Boissonnat, J.-D., Lieutier, A.: Complexity of the Delaunay triangulation of points on surfaces: the smooth case. In: Proceedings of the Nineteenth Annual Symposium on Computational Geometry, SCG’03, pp. 201–210. ACM, New York (2003)
Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)
Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)
Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: a simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)
Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)
Cignoni, P., Montani, C., Scopigno, R.: DeWall: a fast divide and conquer Delaunay triangulation algorithm in E^d. Comput. Aided Des. 30(5), 333–341 (1998)
Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)
Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)
Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)
Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)
Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)
De Berg, M., Cheong, O., Van Kreveld, M., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer, New York (2008)
Delaunay, B.: Sur la sphère vide. Izv. Akad. Nauk SSSR, Ser. VII, Otd. Mat. Est. Nauk 7, 793–800 (1934)
Donoho, D.: Breakdown properties of multivariate location estimators. Ph.D. thesis (1982)
Donoho, D., Huber, P.: The notion of breakdown point. In: A Festschrift for Erich Lehmann, pp. 157–184 (1983)
Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)
Gnanadesikan, R., Kettenring, J.R.: Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28(1), 81–124 (1972)
Gower, J.: Euclidean distance geometry. Math. Sci. 7, 1–14 (1982)
Gower, J.C.: Algorithm AS 78: the mediancentre. J. R. Stat. Soc., Ser. C, Appl. Stat. 23(3), 466–470 (1974)
Gower, J.C.: Properties of Euclidean and non-Euclidean distance matrices. In: Linear Algebra and its Applications, vol. 67, pp. 81–97 (1985)
Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (2005)
Hubert, M., Rousseeuw, P., Van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)
Kirschstein, T., Liebscher, S., Becker, C.: Robust Estimation of Location and Scatter by Pruning the Minimum Spanning Tree (2012, submitted for publication)
Leach, G.: Improving worst-case optimal Delaunay triangulation algorithms. In: 4th Canadian Conference on Computational Geometry, p. 15 (1992)
Liebscher, S., Kirschstein, T., Becker, C.: The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator. Stat. Comput. 22, 325–336 (2012). doi:10.1007/s11222-011-9250-3
Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)
Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)
Maronna, R., Martin, D., Yohai, V.: Robust Statistics: Theory and Methods. John Wiley and Sons, Chichester (2006)
Maronna, R.A., Zamar, R.H.: Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44(4), 307–317 (2002)
McMullen, P.: The maximum numbers of faces of a convex polytope. Mathematika 17(2), 179–184 (1970)
Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. Ph.D. thesis (2002)
Pison, G., Van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)
Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)
Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)
Rocke, D., Woodruff, D.: Computation of robust estimators of multivariate location and shape. Stat. Neerl. 47(1), 27–42 (1993)
Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)
Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. John Wiley and Sons, New York (1987)
Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
Su, P., Drysdale, R.L.S.: A comparison of sequential Delaunay triangulation algorithms. Comput. Geom. 7(5–6), 361–385 (1997). 11th ACM Symposium on Computational Geometry
Tyler, D.E.: A distribution-free M-estimator of multivariate scatter. Ann. Stat. 15(1), 234–251 (1987)

Author information

Authors and Affiliations

Martin-Luther-University, Gr. Steinstrasse 73, 06099, Halle (Saale), Germany
Steffen Liebscher, Thomas Kirschstein & Claudia Becker

Corresponding author

Correspondence to Steffen Liebscher.

Appendix: Breakdown points of RDELA based estimators

Theorem 1

Let $\mathbf{X}$ be a collection of $n\geq d+1$ points $\mathbf{x}_1,\ldots,\mathbf{x}_n$ in dimension $d$, $\mathbf{x}_i\in\mathbb{R}^d$, $i=1,\ldots,n$, in general position, and let $\mathbf{T}(\mathbf{X})=\{\mathbf{t}_k,\ k=1,\ldots,l\}$ be the corresponding Delaunay triangulation, where each $\mathbf{t}_k$ denotes a $d$-simplex with $\mathbf{t}_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\}\subset \mathbf{X}$. Let $\boldsymbol{\mu}_n$ and $\boldsymbol{\Sigma}_n$ be the RDELA estimators of location and covariance (as described in Sects. 2.2 and 2.3). Then the breakdown points $\varepsilon^{*}$ of these estimators (defined as the smallest fraction of outliers that can carry the estimate beyond all bounds) are $\varepsilon^{*}(\boldsymbol{\mu}_n,\mathbf{X})=\lfloor (n+1)/2\rfloor/n$ and $\varepsilon^{*}(\boldsymbol{\Sigma}_n,\mathbf{X})\geq\lfloor (n-d+1)/2\rfloor/n$, respectively.
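For concreteness, the two quantities in Theorem 1 can be evaluated for given n and d. The following is only an illustrative sketch; the function name `rdela_breakdown_points` is hypothetical and not part of the paper.

```python
def rdela_breakdown_points(n, d):
    """Evaluate the two expressions from Theorem 1.

    Returns (location breakdown point, lower bound for the covariance
    breakdown point). The function name is an assumption made for this
    illustration only.
    """
    if n < d + 1:
        raise ValueError("Theorem 1 assumes n >= d + 1")
    location = ((n + 1) // 2) / n             # floor((n+1)/2) / n
    covariance_lower = ((n - d + 1) // 2) / n  # floor((n-d+1)/2) / n
    return location, covariance_lower


if __name__ == "__main__":
    for n, d in [(100, 2), (101, 2), (50, 5)]:
        loc, cov = rdela_breakdown_points(n, d)
        print(f"n={n}, d={d}: location {loc:.4f}, covariance >= {cov:.4f}")
```

For n = 100 and d = 2 this gives 0.5 for the location estimator and a lower bound of 0.49 for the covariance estimator.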

Proof

We first show that $\varepsilon^{*}(\boldsymbol{\mu}_n,\mathbf{X})$ is at least $\lfloor (n+1)/2\rfloor/n$.

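For the location estimate to break down, at least one point in the chosen subset has to move off to infinity. If $\|\mathbf{x}\|\to\infty$ for some $\mathbf{x}\in\mathbf{X}$, then $\|\mathbf{x}-\mathbf{y}\|\to\infty$ for every point $\mathbf{y}\in\mathbf{X}$ that remains bounded. For a subset of $\mathbf{X}$ consisting of $d+1$ points and its corresponding triangulation object $\mathbf{t}_k$ with $\mathbf{x},\mathbf{y}\in\mathbf{t}_k$, it follows from the circumcircle property that the radius of $\mathbf{t}_k$ also grows to infinity, $r(\mathbf{t}_k)\to\infty$.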
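The growth of the radius can be checked numerically in the planar case (d = 2), where the triangulation objects are triangles. The sketch below uses the standard formula R = abc / (4 · area); the helper `circumradius` and the sample coordinates are assumptions made for illustration.

```python
import math


def circumradius(a, b, c):
    """Circumradius of the triangle with 2D vertices a, b, c."""
    ab = math.hypot(b[0] - a[0], b[1] - a[1])
    bc = math.hypot(c[0] - b[0], c[1] - b[1])
    ca = math.hypot(a[0] - c[0], a[1] - c[1])
    # Twice the signed area, from the cross product of two edge vectors.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = abs(cross) / 2.0
    if area == 0.0:
        raise ValueError("points are collinear; no circumcircle exists")
    return ab * bc * ca / (4.0 * area)


if __name__ == "__main__":
    # Two fixed vertices and one vertex that moves away from them.
    for t in (1.0, 10.0, 100.0, 1000.0):
        print(t, circumradius((0.0, 0.0), (1.0, 0.0), (t, t)))
```

Because every edge of a triangle is a chord of its circumcircle, the radius is at least half the longest edge, so it cannot stay bounded while one vertex moves away from the others.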

Suppose that $\mathbf{X}=\mathbf{X}'\cup\mathbf{X}''$, where $|\mathbf{X}'|>|\mathbf{X}''|=m=\lfloor (n-1)/2\rfloor$, and that $\mathbf{X}''$ is altered in an arbitrary way to cause the procedure’s breakdown. Denote by $\mathbf{Y}\subset\mathbf{X}$ the subset chosen by the RDELA procedure, with $|\mathbf{Y}|=|\mathbf{X}'|=n-m$. Furthermore, let $\widetilde{\mathbf {T}}_{\mathbf{X}^{\prime}}= \{\mathbf {t}'_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\},\mathbf{x}_{i_{j}}\in \mathbf{X}^{\prime} \}$ and $\widetilde{\mathbf {T}}_{\mathbf {Y}}= \{\mathbf {t}^{y}_{k}=\{\mathbf{x}_{i_{1}},\ldots,\mathbf{x}_{i_{d+1}}\},\mathbf{x}_{i_{j}}\in \mathbf {Y}\}$ be the sets of the corresponding triangulation objects, with $\widetilde{\mathbf {T}}_{\mathbf{X}^{\prime}},\widetilde{\mathbf {T}}_{\mathbf {Y}}\subseteq \mathbf {T}(\mathbf{X})$. Then the radii of all triangulation objects $\mathbf {t}'_{k}$ are bounded, i.e. there exists $\alpha\in\mathbb{R}$ with $r(\mathbf {t}'_{k})<\alpha$ for all $k$.

Suppose now that there exists $\mathbf{Y}'\subset\mathbf{Y}$ such that $\|\mathbf{x}-\mathbf{y}\|\geq 2\alpha$ for $\mathbf{x}\in\mathbf{Y}'$ and $\mathbf{y}\in\mathbf{Y}\setminus\mathbf{Y}'$. Then it follows from the circumcircle property that, for the triangulation objects $\mathbf {t}^{y'}_{k} \in\widetilde{\mathbf {T}}_{\mathbf {Y}}$ with $\mathbf{x},\mathbf {y}\in \mathbf {t}^{y'}_{k}$:

$$r\bigl(\mathbf {t}^{y'}_k\bigr) \geq\alpha. $$

On the other hand, if $\mathbf{Y}=\mathbf{X}'$, then $r(\mathbf {t}^{y}_{k}) < \alpha$ holds for all triangulation objects $\mathbf {t}^{y}_{k} \in\widetilde{\mathbf {T}}_{\mathbf {Y}}$, as stated above. Hence, the algorithm described in Sect. 2.3 always terminates in $\mathbf{X}'$. This proves that the breakdown point of the RDELA location estimator is at least $\lfloor (n+1)/2\rfloor/n$.
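The step from $\|\mathbf{x}-\mathbf{y}\|\geq 2\alpha$ to a radius of at least $\alpha$ can be spelled out as follows. This is a sketch in which $\mathbf{c}$ (a symbol not used in the original) denotes the circumcentre of a triangulation object $\mathbf{t}$ containing both $\mathbf{x}$ and $\mathbf{y}$. Both points lie on the circumsphere of $\mathbf{t}$, so the triangle inequality gives

$$\|\mathbf{x}-\mathbf{y}\| \leq \|\mathbf{x}-\mathbf{c}\| + \|\mathbf{c}-\mathbf{y}\| = 2\,r(\mathbf{t}), \quad\text{hence}\quad r(\mathbf{t}) \geq \tfrac{1}{2}\,\|\mathbf{x}-\mathbf{y}\| \geq \alpha. $$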

The Delaunay triangulation is invariant under orthogonal transformations because its calculation is based purely on Euclidean distances, which are themselves invariant under this type of transformation. Consequently, the same holds for the estimators derived from the Delaunay triangulation.
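The distance invariance that this argument relies on is easy to verify numerically. The sketch below rotates a small, arbitrarily chosen planar point set and compares all pairwise Euclidean distances before and after; the helper names, the sample points and the rotation angle are assumptions made for illustration, and only a rotation (one kind of orthogonal transformation) is checked.

```python
import math
from itertools import combinations


def rotate(point, theta):
    """Rotate a 2D point about the origin by the angle theta (radians)."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [math.hypot(p[0] - q[0], p[1] - q[1])
            for p, q in combinations(points, 2)]


# Arbitrary sample data, not taken from the paper.
points = [(0.0, 0.0), (2.0, 0.5), (1.0, 3.0), (-1.5, 1.0)]
rotated = [rotate(p, 0.7) for p in points]

before = pairwise_distances(points)
after = pairwise_distances(rotated)
max_diff = max(abs(a - b) for a, b in zip(before, after))

if __name__ == "__main__":
    print("largest change in any pairwise distance:", max_diff)
```

The largest difference is at the level of floating-point rounding, so any construction that depends on the points only through their mutual distances is unaffected by the rotation.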

Since the maximum breakdown point of orthogonal equivariant location estimators is proved to be $\lfloor (n+1)/2\rfloor/n$ (Lopuhaä and Rousseeuw, 1991), the lower bound derived above is attained, which completes the proof for the location estimator.

Now, we show that $\varepsilon^{*}(\boldsymbol{\Sigma}_n,\mathbf{X})\geq\lfloor (n-d+1)/2\rfloor/n$.

Denote by $0<\lambda_1\leq\cdots\leq\lambda_d<\infty$ the eigenvalues of $\boldsymbol{\Sigma}_n(\mathbf{Y})$. The covariance estimator may break down by explosion (meaning that $\lambda_d\to\infty$) or by implosion (if $\lambda_1\to 0$). For $\boldsymbol{\Sigma}_n$ to explode, at least one observation $\mathbf{y}$ from the chosen subset $\mathbf{Y}$ has to grow arbitrarily, $\|\mathbf{y}\|\to\infty$. This case is covered by the location estimator’s proof.

For the case of implosion, note that each chosen subset $\mathbf{Y}$, which is of size $\lfloor (n+d+1)/2\rfloor$ by construction, contains at least $d+1$ unaltered points $\mathbf{x}_i\in\mathbf{X}'$. Because $\mathbf{X}'$ is in general position, each subset of $\mathbf{X}'$ consisting of $d+1$ points is also in general position. Hence $\boldsymbol{\Sigma}_n$ is positive definite (i.e. $\lambda_i>0$ for all $i$). This proves that $\varepsilon^{*}(\boldsymbol{\Sigma}_n,\mathbf{X})$ is at least $\lfloor (n-d+1)/2\rfloor/n$. □
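Both ingredients of the implosion argument can be illustrated with a short sketch. It assumes the reading that fewer than ⌊(n−d+1)/2⌋ points are altered, and it checks the positive-definiteness claim only in the planar case via the determinant of the sample covariance matrix. The function names and sample points are hypothetical.

```python
def min_clean_points(n, d):
    """Fewest unaltered points in a subset of size floor((n+d+1)/2)
    when at most floor((n-d+1)/2) - 1 points have been altered."""
    h = (n + d + 1) // 2          # subset size used in the proof
    m_max = (n - d + 1) // 2 - 1  # largest contamination below the bound
    return h - m_max


def covariance_det_2d(points):
    """Determinant of the 2x2 sample covariance matrix of 2D points."""
    k = len(points)
    mx = sum(p[0] for p in points) / k
    my = sum(p[1] for p in points) / k
    sxx = sum((p[0] - mx) ** 2 for p in points) / (k - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (k - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (k - 1)
    return sxx * syy - sxy ** 2


if __name__ == "__main__":
    # Counting step: at least d + 1 clean points always remain.
    for d in range(1, 6):
        for n in range(d + 1, 60):
            assert min_clean_points(n, d) >= d + 1

    # d + 1 = 3 points in general position: strictly positive determinant.
    print(covariance_det_2d([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
    # Collinear (degenerate) points: the determinant vanishes.
    print(covariance_det_2d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]))
```

The collinear case shows the kind of degenerate configuration that would be needed for the covariance estimate to implode.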

It can be assumed that the breakdown point of the covariance estimator is at most $\lfloor (n-d+1)/2\rfloor/n$. However, this has not been proved yet, primarily because a triangulation is not defined for degenerate point configurations, which in turn are required to let the covariance estimate implode.

About this article

Cite this article

Liebscher, S., Kirschstein, T. & Becker, C. RDELA—a Delaunay-triangulation-based, location and covariance estimator with high breakdown point. Stat Comput 23, 677–688 (2013). https://doi.org/10.1007/s11222-012-9337-5

Received: 05 September 2011
Accepted: 28 May 2012
Published: 22 June 2012
Issue Date: November 2013
DOI: https://doi.org/10.1007/s11222-012-9337-5
