
Snipping for robust k-means clustering under component-wise contamination

Statistics and Computing

Abstract

We introduce the concept of snipping, complementing that of trimming, in robust cluster analysis. An observation is snipped when some of its dimensions are discarded while the remaining ones are still used for clustering and estimation. Snipped k-means is performed through a probabilistic optimization algorithm that is guaranteed to converge to the global optimum. We show global robustness properties of our snipped k-means procedure. Simulations and a real data application to optical recognition of handwritten digits are used to illustrate and compare the approach.
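
To make the idea of snipping concrete, the following is a minimal Python sketch of a single k-means assignment and update pass in which discarded entries are ignored. It is an illustration only, not the paper's algorithm: the abstract describes a probabilistic optimization that also decides which entries to snip, whereas here the mask is assumed to be given. The function name `snipped_kmeans_step`, the boolean mask `keep`, and the toy data are all assumptions introduced for this example.

```python
import numpy as np


def snipped_kmeans_step(X, keep, centroids):
    """One assignment/update pass that ignores snipped entries.

    X         : (n, p) data matrix
    keep      : (n, p) boolean mask; False marks a snipped entry (assumed given)
    centroids : (k, p) current centroids
    """
    X = np.asarray(X, dtype=float)
    keep = np.asarray(keep, dtype=bool)
    centroids = np.asarray(centroids, dtype=float)

    # Squared distances computed over the retained entries of each row only.
    diff = X[:, None, :] - centroids[None, :, :]      # shape (n, k, p)
    sq = np.where(keep[:, None, :], diff ** 2, 0.0)   # snipped entries contribute 0
    dist = sq.sum(axis=2)                             # shape (n, k)
    labels = dist.argmin(axis=1)

    # Update each centroid coordinate from the retained entries of its members.
    new_centroids = centroids.copy()
    for j in range(centroids.shape[0]):
        members = keep & (labels == j)[:, None]       # shape (n, p)
        counts = members.sum(axis=0)
        sums = np.where(members, X, 0.0).sum(axis=0)
        has_data = counts > 0                         # leave coordinate unchanged otherwise
        new_centroids[j, has_data] = sums[has_data] / counts[has_data]

    # Objective evaluated at the input centroids, before the update.
    objective = dist[np.arange(len(X)), labels].sum()
    return labels, new_centroids, objective


# Toy data: the third row has a contaminated second coordinate, which is snipped.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1000.0], [10.0, 10.0], [11.0, 11.0]])
keep = np.ones_like(X, dtype=bool)
keep[2, 1] = False
init = np.array([[0.0, 0.0], [10.0, 10.0]])

labels, new_centroids, objective = snipped_kmeans_step(X, keep, init)
```

In this toy example the third observation still contributes its clean first coordinate to its cluster, while its contaminated second coordinate affects neither the assignment nor the centroid. Trimming, by contrast, would discard the whole observation.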

Acknowledgements

The author is grateful to an AE and two anonymous referees for very kind suggestions.

Author information

Corresponding author

Correspondence to Alessio Farcomeni.


About this article

Cite this article

Farcomeni, A. Snipping for robust k-means clustering under component-wise contamination. Stat Comput 24, 907–919 (2014). https://doi.org/10.1007/s11222-013-9410-8


  • DOI: https://doi.org/10.1007/s11222-013-9410-8
