The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator

Statistics and Computing

Abstract

Self-organizing maps (SOMs), introduced by Kohonen (Biol. Cybern. 43(1):59–69, 1982), are well known in the field of artificial neural networks. The way SOMs operate is intuitive, which has led to great popularity and numerous applications (in statistics: classification and clustering). The result of the unsupervised learning process performed by a SOM is a non-linear, low-dimensional projection of the high-dimensional input data that preserves certain features of the underlying data, e.g. the topology and probability distribution (Lee and Verleysen in Nonlinear Dimensionality Reduction, Springer, 2007; Kohonen in Self-organizing Maps, 3rd edn., Springer, 2001).

With the U-matrix, Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307–313, Springer, 1993) introduced a powerful visual representation of SOM results. We propose an approach that uses the U-matrix to identify outlying data points. The revised subsample (i.e. the initial sample minus the outlying points) is then used to obtain a robust estimate of location and scatter.
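The pipeline outlined above (train a SOM, compute its U-matrix, drop suspicious points, estimate location and scatter from the rest) can be illustrated with a minimal sketch. This is not the flood algorithm itself, whose details are not given in this abstract. It substitutes a much simpler rule: a data point is discarded when the U-height of its best-matching unit exceeds a quantile of all U-heights. The function names (`train_som`, `u_matrix`, `robust_estimate`), the 6×6 grid, the decay schedules and the 0.75 quantile are all illustrative assumptions.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=3000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with the classic online update rule."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every neuron, shape (rows * cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook with randomly drawn data points.
    weights = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights.reshape(rows, cols, -1)


def u_matrix(weights):
    """U-height of a neuron: mean distance of its weight vector to its 4-neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [
                np.linalg.norm(weights[r, c] - weights[rr, cc])
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            u[r, c] = np.mean(dists)
    return u


def robust_estimate(data, weights, u, quantile=0.75):
    """Drop points mapped to high U-height neurons (assumed rule, not the
    paper's flooding procedure), then return mean and covariance of the rest."""
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)                   # best-matching unit of every point
    keep = u.reshape(-1)[bmu] <= np.quantile(u, quantile)
    subsample = data[keep]                    # the "revised subsample"
    return subsample.mean(axis=0), np.cov(subsample, rowvar=False), keep


# Usage on synthetic data: 200 standard-normal points plus 20 shifted points.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(size=(200, 2)),
                  rng.normal(loc=8.0, scale=0.3, size=(20, 2))])
som = train_som(data)
location, scatter, keep = robust_estimate(data, som, u_matrix(som))
print(location, scatter, int(keep.sum()), sep="\n")
```

A plain threshold on U-heights only catches points that sit on the "walls" of the U-matrix. Neurons in the interior of a tight outlier cluster can have low U-heights, so this rule may retain clustered outliers. It should be read as an illustration of the data flow, not as a substitute for the authors' procedure.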

References

  • Atkinson, A.C.: Fast very robust methods for the detection of multiple outliers. J. Am. Stat. Assoc. 89(428), 1329–1339 (1994)

  • Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)

  • Bartkowiak, A., Szustalewicz, A.: The grand tour as a method for detecting multivariate outliers. Mach. Graph. Vis. 6, 487–505 (1997)

  • Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)

  • Becker, C., Gather, U.: The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Comput. Stat. Data Anal. 36(1), 119–127 (2001)

  • Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: A simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)

  • Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)

  • Cottrell, M., Fort, J.C., Pagès, G.: Theoretical aspects of the SOM algorithm. Neurocomputing 21(1–3), 119–138 (1998)

  • Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)

  • Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)

  • Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)

  • Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)

  • Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)

  • Erwin, E., Obermayer, K., Schulten, K.: Convergence properties of self-organizing maps. Artif. Neural Netw. 1, 409–414 (1991)

  • Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cybern. 67(1), 47–55 (1992a)

  • Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992b)

  • Fort, J.: SOM’s mathematics. Neural Netw. 19(6–7), 812–816 (2006). Advances in Self Organising Maps—WSOM’05

  • Fung, W.-K.: Unmasking outliers and leverage points: a confirmation. J. Am. Stat. Assoc. 88(422), 515–519 (1993)

  • Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)

  • Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach based on Influence Functions. Wiley, New York (2005)

  • Hardin, J., Rocke, D.: The distribution of robust distances. J. Comput. Graph. Stat. 14(4), 928–946 (2005)

  • Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, New York (2009)

  • Hertz, J., Palmer, R.G., Krogh, A.S.: Introduction to the Theory of Neural Computation. Perseus Publishing, New York (1991)

  • Hubert, M., Rousseeuw, P., van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)

  • Kaski, S., Kangas, J., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Comput. Surv. 1(3&4), 1–176 (1998)

  • Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)

  • Kohonen, T.: Self-organizing Maps, 3rd edn. Springer, Berlin (2001)

  • Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: SOM_PAK: the self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)

  • Koshevoy, G., Mosler, K.: Zonoid trimming for multivariate distributions. Ann. Stat. 25(5), 1998–2017 (1997)

  • Koshevoy, G., Mosler, K.: Lift zonoids, random convex hulls, and the variability of random vectors. Bernoulli 4(3), 377–399 (1998)

  • Koshevoy, G., Möttönen, J., Oja, H.: A scatter matrix estimate based on the zonotope. Ann. Stat. 31(5), 1439–1459 (2003)

  • Lee, J., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, Berlin (2007)

  • Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)

  • Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)

  • Nag, A., Mitra, A., Mitra, S.: Multiple outlier detection in multivariate data using self-organizing maps. Comput. Stat. 20(2), 245–264 (2005)

  • Oja, M., Kaski, S., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Comput. Surv. 1, 1–176 (2002)

  • Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. PhD thesis (2002)

  • Pison, G., van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)

  • Pöllä, M., Honkela, T., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 2002–2005 addendum. Technical Report TKK-ICS-R23, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland (2009)

  • Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)

  • Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)

  • Rocke, D.M., Woodruff, D.L.: Identification of outliers in multivariate data. J. Am. Stat. Assoc. 91(435), 1047–1061 (1996)

  • Rojas, R.: Theorie der neuronalen Netze: eine systematische Einführung. Springer, Berlin (1993)

  • Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)

  • Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. Wiley, New York (1987)

  • Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)

  • Rousseeuw, P., van Zomeren, B.: Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)

  • Ultsch, A.: Self-organizing neural networks for visualization and classification. In: Opitz, O., Lausen, B., Klar, R. (eds.) Information and Classification: Concepts, Methods and Applications, pp. 307–313. Springer, Berlin (1993)

Author information

Corresponding author

Correspondence to Steffen Liebscher.

About this article

Cite this article

Liebscher, S., Kirschstein, T. & Becker, C. The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator. Stat Comput 22, 325–336 (2012). https://doi.org/10.1007/s11222-011-9250-3

  • DOI: https://doi.org/10.1007/s11222-011-9250-3
