Abstract
Self-organizing maps (SOMs), introduced by Kohonen (Biol. Cybern. 43(1):59–69, 1982), are well known in the field of artificial neural networks. Their mode of operation is very intuitive, which has led to great popularity and numerous applications, including the statistical tasks of classification and clustering. The result of the unsupervised learning process performed by SOMs is a non-linear, low-dimensional projection of the high-dimensional input data that preserves certain features of the underlying data, e.g. its topology and probability distribution (Lee and Verleysen in Nonlinear Dimensionality Reduction, Springer, 2007; Kohonen in Self-organizing Maps, 3rd edn., Springer, 2001).
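The unsupervised learning process described above can be illustrated with Kohonen's online update rule: for each randomly drawn input x, find the best-matching unit c and move every codebook vector towards x by w_j ← w_j + α(t)·h_{j,c}(t)·(x − w_j). The following is a minimal sketch, not the implementation used in the paper. It assumes a rectangular grid, a Gaussian neighborhood, linearly decaying learning rate and radius, and initialization from randomly chosen data points; the function names and parameters (`train_som`, `n_iter`, `lr0`, `sigma0`) are hypothetical.

```python
import math
import random

def best_matching_unit(codebook, x):
    """Grid position of the unit whose codebook vector is nearest to x."""
    return min(codebook, key=lambda u: sum((a - b) ** 2 for a, b in zip(codebook[u], x)))

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM on a rows x cols rectangular grid (illustrative sketch).

    data: list of equal-length numeric sequences.
    Returns the codebook as a dict {(row, col): weight vector}.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Assumption: each unit starts as a copy of a randomly chosen data point.
    codebook = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = rng.choice(data)
        winner = best_matching_unit(codebook, x)
        for (r, c), w in codebook.items():
            grid_d2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])   # pull unit towards the input
    return codebook

# Usage: two well-separated clusters should map to different grid units.
rng = random.Random(1)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)])
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, (0, 0)), best_matching_unit(som, (10, 10)))
```

Because each update is a convex combination of the current codebook vector and a data point (the step size α·h never exceeds 0.5 here), the codebook vectors stay inside the convex hull of the data.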
With the U-matrix, Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307–313, Springer, 1993) introduced a powerful visual representation of SOM results. We propose an approach that uses the U-matrix to identify outlying data points. The revised subsample (i.e. the initial sample minus the outlying points) is then used to obtain a robust estimate of location and scatter.
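The general pipeline (U-matrix, then outlier flagging, then estimation on the revised subsample) can be sketched as follows, given an already trained codebook. This is a hedged illustration only: the U-height here is the common simplified form (mean distance to the 4-connected grid neighbors), and the flagging rule, a fixed `cutoff` on the U-height of a point's best-matching unit, is a hypothetical stand-in swapped in for illustration; it is not the flood algorithm proposed in the paper. The estimates are the plain sample mean and covariance of the retained points, without any consistency correction. All function names are hypothetical.

```python
import math

def u_heights(codebook, rows, cols):
    """Simplified U-matrix: for each grid unit, the mean Euclidean distance
    between its codebook vector and those of its 4-connected neighbors.
    Assumes a rectangular grid with rows * cols > 1."""
    heights = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            dists = [math.dist(codebook[(r, c)], codebook[n]) for n in nbrs]
            heights[(r, c)] = sum(dists) / len(dists)
    return heights

def bmu(codebook, x):
    """Grid position of the unit whose codebook vector is closest to x."""
    return min(codebook, key=lambda u: math.dist(codebook[u], x))

def split_by_umatrix(data, codebook, rows, cols, cutoff):
    """HYPOTHETICAL rule for illustration only (not the flood algorithm):
    flag a point if the U-height of its best-matching unit exceeds `cutoff`."""
    h = u_heights(codebook, rows, cols)
    kept, flagged = [], []
    for x in data:
        (flagged if h[bmu(codebook, x)] > cutoff else kept).append(x)
    return kept, flagged

def mean_and_cov(points):
    """Classical sample mean and covariance (n - 1 denominator) of the
    retained points; no correction for the trimming is applied."""
    n, p = len(points), len(points[0])
    mean = [sum(x[j] for x in points) / n for j in range(p)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in points) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov

# Toy 3 x 3 codebook: unit (r, c) sits at (r, c), except one isolated unit.
codebook = {(r, c): (float(r), float(c)) for r in range(3) for c in range(3)}
codebook[(2, 2)] = (10.0, 10.0)
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (10.0, 10.0), (9.0, 11.0)]
kept, flagged = split_by_umatrix(data, codebook, 3, 3, cutoff=8.0)
location, scatter = mean_and_cov(kept)
```

In this toy codebook the isolated unit has a U-height of about 12.0, but its two grid neighbors also rise to about 4.7 because the mean includes their distance to it. A naive threshold therefore has to be chosen with care, which is one reason a more careful rule than this sketch is needed in practice.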
References
Atkinson, A.C.: Fast very robust methods for the detection of multiple outliers. J. Am. Stat. Assoc. 89(428), 1329–1339 (1994)
Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)
Bartkowiak, A., Szustalewicz, A.: The grand tour as a method for detecting multivariate outliers. Mach. Graph. Vis. 6, 487–505 (1997)
Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)
Becker, C., Gather, U.: The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Comput. Stat. Data Anal. 36(1), 119–127 (2001)
Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: A simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)
Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)
Cottrell, M., Fort, J.C., Pagès, G.: Theoretical aspects of the SOM algorithm. Neurocomputing 21(1–3), 119–138 (1998)
Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)
Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)
Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)
Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)
Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)
Erwin, E., Obermayer, K., Schulten, K.: Convergence properties of self-organizing maps. Artif. Neural Netw. 1, 409–414 (1991)
Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cybern. 67(1), 47–55 (1992a)
Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992b)
Fort, J.: SOM’s mathematics. Neural Netw. 19(6–7), 812–816 (2006). Advances in Self Organising Maps—WSOM’05
Fung, W.-K.: Unmasking outliers and leverage points: a confirmation. J. Am. Stat. Assoc. 88(422), 515–519 (1993)
Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)
Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach based on Influence Functions. Wiley, New York (2005)
Hardin, J., Rocke, D.: The distribution of robust distances. J. Comput. Graph. Stat. 14(4), 928–946 (2005)
Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, New York (2009)
Hertz, J., Palmer, R.G., Krogh, A.S.: Introduction to the Theory of Neural Computation. Perseus Publishing, New York (1991)
Hubert, M., Rousseeuw, P., Van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)
Kaski, S., Kangas, J., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Comput. Surv. 1(3&4), 1–176 (1998)
Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
Kohonen, T.: Self-organizing Maps, 3rd edn. Springer, Berlin (2001)
Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: SOM_PAK: the self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)
Koshevoy, G., Mosler, K.: Zonoid trimming for multivariate distributions. Ann. Stat. 25(5), 1998–2017 (1997)
Koshevoy, G., Mosler, K.: Lift zonoids, random convex hulls, and the variability of random vectors. Bernoulli 4(3), 377–399 (1998)
Koshevoy, G., Möttönen, J., Oja, H.: A scatter matrix estimate based on the zonotope. Ann. Stat. 31(5), 1439–1459 (2003)
Lee, J., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, Berlin (2007)
Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)
Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)
Nag, A., Mitra, A., Mitra, S.: Multiple outlier detection in multivariate data using self-organizing maps. Comput. Stat. 20(2), 245–264 (2005)
Oja, M., Kaski, S., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Comput. Surv. 1, 1–176 (2002)
Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. PhD thesis (2002)
Pison, G., van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)
Pöllä, M., Honkela, T., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 2002–2005 addendum. Technical Report TKK-ICS-R23, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland (2009)
Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)
Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)
Rocke, D.M., Woodruff, D.L.: Identification of outliers in multivariate data. J. Am. Stat. Assoc. 91(435), 1047–1061 (1996)
Rojas, R.: Theorie der neuronalen Netze: eine systematische Einführung. Springer, Berlin (1993)
Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)
Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
Rousseeuw, P., van Zomeren, B.: Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)
Ultsch, A.: Self-organizing neural networks for visualization and classification. In: Opitz, O., Lausen, B., Klar, R. (eds.) Information and Classification: Concepts, Methods and Applications, pp. 307–313. Springer, Berlin (1993)
Cite this article
Liebscher, S., Kirschstein, T. & Becker, C. The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator. Stat Comput 22, 325–336 (2012). https://doi.org/10.1007/s11222-011-9250-3