
Multiple outlier detection in multivariate data using self-organizing maps

Computational Statistics

Summary

Detecting multidimensional outliers is a fundamental problem in applied statistics. The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has led to the development of alternative techniques that have been known in the statistical community for well over a decade. The literature on this subject is vast and growing. In this paper, we propose using the self-organizing map (SOM), an artificial intelligence technique, to detect multiple outliers in multidimensional datasets. SOM produces a topology-preserving mapping of the multidimensional data cloud onto a lower-dimensional plane that can be visualized, and so provides an easy way to detect multidimensional outliers in the data at their respective levels of leverage. The proposed SOM-based method not only identifies the multidimensional outliers but also provides information about the entire outlier neighbourhood. Being an artificial intelligence technique, SOM-based outlier detection is non-parametric and can be used to detect outliers in very large multidimensional datasets. The method is applied to several types of simulated multivariate datasets, a benchmark dataset, and a real-life cheque processing dataset. The results show that SOM can be used effectively for multidimensional outlier detection.
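The idea in the summary can be illustrated with a minimal sketch. It is not the paper's procedure, which the summary does not spell out: the grid size, decay schedules, synthetic data, and the function names `train_som`, `map_points`, and `u_matrix` are all assumptions made for illustration. The sketch trains a small SOM with NumPy, maps each point to its best-matching unit (BMU), and computes a U-matrix-style value per unit (mean distance to neighbouring units' weights), which is one common way to see isolated regions of a map.

```python
import numpy as np

def train_som(X, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM online; returns weights of shape (rows, cols, p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Initialise unit weights from randomly chosen data points.
    W = X[rng.integers(0, n, size=rows * cols)].reshape(rows, cols, p).astype(float)
    # Grid coordinates of every unit, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = X[rng.integers(0, n)]
        # Best-matching unit: the unit whose weight vector is closest to x.
        d = np.linalg.norm(W - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        # Gaussian neighbourhood on the grid, centred on the BMU.
        g2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-g2 / (2.0 * sigma ** 2))
        W += lr * h[..., None] * (x - W)
    return W

def map_points(X, W):
    """Return each point's BMU grid index and its distance to that BMU."""
    rows, cols, p = W.shape
    flat = W.reshape(-1, p)
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)  # (n, rows*cols)
    idx = np.argmin(d, axis=1)
    bmu = np.column_stack(np.unravel_index(idx, (rows, cols)))
    return bmu, d[np.arange(len(X)), idx]

def u_matrix(W):
    """Mean distance from each unit's weight to its 4-connected grid neighbours."""
    rows, cols, _ = W.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            nb = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= i + di < rows and 0 <= j + dj < cols]
            U[i, j] = np.mean([np.linalg.norm(W[i, j] - W[a, b]) for a, b in nb])
    return U

# Synthetic illustration only: a Gaussian cloud plus a few planted distant points.
rng = np.random.default_rng(1)
inliers = rng.normal(0.0, 1.0, size=(200, 3))
outliers = rng.normal(8.0, 0.5, size=(5, 3))
X = np.vstack([inliers, outliers])

W = train_som(X)
bmu, qerr = map_points(X, W)       # map position and quantization error per point
U = u_matrix(W)
score = U[bmu[:, 0], bmu[:, 1]]    # U-matrix value at each point's BMU
```

Points whose BMU sits in a high-valued region of `U`, or whose `qerr` is large, are candidates for closer inspection. How well this separates outliers depends on the map size and training schedule, so the scores are best read alongside a plot of the map.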




Notes

  1. Let \(X\) be an \(n \times p\) matrix representing a sample of \(n\) points in \(\mathbb{R}^{p}\), and let \(S=n^{-1}(X-\overline{X})^{T}(X-\overline{X})\) denote the sample covariance matrix. Then the shape of the sample \(X\) is given by \(S/|S|^{1/p}\).
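The definition in the note can be checked numerically. The sketch below reads the note's expression as the covariance matrix divided by the p-th root of its determinant, and assumes that \(\overline{X}\) stands for the matrix whose rows all equal the vector of column means. The function name `shape_matrix` and the random test data are hypothetical.

```python
import numpy as np

def shape_matrix(X):
    """Return S / |S|^(1/p), where S is the covariance matrix with divisor n."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)        # centre each column at its mean
    S = Xc.T @ Xc / n              # sample covariance, n^{-1} scaling as in the note
    det = np.linalg.det(S)         # positive when S is non-singular
    return S / det ** (1.0 / p)

# Illustrative data only: 50 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
V = shape_matrix(X)
```

By construction the result has determinant one, so it keeps the orientation and relative spread of the scatter while discarding its overall scale.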



About this article

Cite this article

Nag, A.K., Mitra, A. & Mitra, S. Multiple outlier detection in multivariate data using self-organizing maps. Computational Statistics 20, 245–264 (2005). https://doi.org/10.1007/BF02789702


  • DOI: https://doi.org/10.1007/BF02789702
