Data Clustering Based on Maximization of Outlier Factor

Journal of Global Optimization

Abstract

Many data clustering algorithms exist, but they cannot adequately handle the number of clusters or the cluster shapes, and their performance depends mainly on the choice of algorithm parameters. Our approach to data clustering requires no such parameter choice; it can be treated as a natural adaptation to the existing structure of distances between data points. The outlier factor introduced by the author specifies the degree to which each data point is an outlier. The notion is based on the difference between the frequency distribution of interpoint distances in a given dataset and the corresponding distribution for uniformly distributed points. Data clusters can then be determined by maximizing the outlier factor function: the data points in the dataset are divided into clusters according to the attraction regions of its local optima. An experimental evaluation shows that the proposed method can identify complex cluster shapes. The key advantages of the approach are good clustering properties for datasets with a comparatively large amount of noise (additional data points) and the absence of important parameters whose adequate choice determines the quality of the results.
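The two ingredients named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the outlier factor formula, so nothing below reproduces the author's definition or algorithm. `distance_frequencies` only compares the interpoint-distance frequencies of a dataset with those of a uniform sample drawn in the data's bounding box (the bounding box, the bin count and the seed are assumptions). `attraction_regions` groups points by the local maximum reached when hill climbing on any per-point score supplied by the caller; its neighborhood size `k` is a parameter of this sketch only, not of the published method.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_frequencies(points, n_bins=20, seed=0):
    """Relative frequencies of interpoint distances for the data and for an
    equally sized uniform sample in the data's bounding box (an assumption)."""
    rng = np.random.default_rng(seed)
    uniform = rng.uniform(points.min(axis=0), points.max(axis=0),
                          size=points.shape)
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    d_data = pairwise_distances(points)[upper]
    d_unif = pairwise_distances(uniform)[upper]
    edges = np.linspace(0.0, max(d_data.max(), d_unif.max()), n_bins + 1)
    f_data = np.histogram(d_data, bins=edges)[0] / d_data.size
    f_unif = np.histogram(d_unif, bins=edges)[0] / d_unif.size
    return f_data, f_unif, edges


def attraction_regions(points, score, k=5):
    """Label each point by the local maximum of `score` that it reaches by
    hill climbing over its k nearest neighbors (k is hypothetical)."""
    n = len(points)
    dist = pairwise_distances(points)
    np.fill_diagonal(dist, -1.0)  # guarantees each point sorts first in its own row
    neighbors = np.argsort(dist, axis=1)[:, :k + 1]
    # One step: move to the best-scoring point among itself and its neighbors.
    # On ties argmax keeps the point itself, so every move strictly improves.
    step = neighbors[np.arange(n), np.argmax(score[neighbors], axis=1)]
    target = step.copy()
    while True:  # follow the steps until every point sits on a local maximum
        nxt = step[target]
        if np.array_equal(nxt, target):
            break
        target = nxt
    _, labels = np.unique(target, return_inverse=True)
    return labels
```

A typical call would be `labels = attraction_regions(points, score)`, where `score` is a one-dimensional array with one value per row of `points`. The element-wise difference `f_data - f_unif` shows in which distance ranges the dataset departs from a uniform spread.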




Correspondence to Vydunas Saltenis.


Saltenis, V. Data Clustering Based on Maximization of Outlier Factor. J Glob Optim 35, 625–635 (2006). https://doi.org/10.1007/s10898-005-5372-5
