Data Clustering Based on Maximization of Outlier Factor

Saltenis, Vydunas

doi:10.1007/s10898-005-5372-5

Published: August 2006

Volume 35, pages 625–635, (2006)

Journal of Global Optimization

Vydunas Saltenis¹

Abstract

Many data clustering algorithms exist, but they cannot adequately handle the number of clusters or the cluster shapes, and their performance depends mainly on the choice of algorithm parameters. Our approach to data clustering, and the resulting algorithm, does not require such a parameter choice; it can be treated as a natural adaptation to the existing structure of distances between data points. The outlier factor introduced by the author specifies, for each data point, the degree to which it is an outlier. The notion of the outlier factor is based on the difference between the frequency distribution of interpoint distances in a given dataset and the corresponding distribution for uniformly distributed points. Data clusters can then be determined by maximizing the outlier factor function: the data points in the dataset are divided into clusters according to the attractor regions of its local optima. An experimental evaluation shows that the proposed method can identify complex cluster shapes. The key advantages of the approach are good clustering properties for datasets with a comparatively large amount of noise (additional data points) and the absence of important parameters whose adequate choice determines the quality of the results.
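
The assignment of points to attractor regions can be illustrated with a minimal Python sketch. It is not the author's algorithm: the abstract does not give the outlier factor formula, so `stand_in_score` is a hypothetical placeholder that compares, at a single radius, the share of data points near a point with the share of uniformly spaced reference points near it. The `radius`, `steps` and `k` arguments, the grid of reference points and the nearest-neighbour hill climb are all assumptions made for illustration; the method described above is stated to need no such parameters.

```python
import math


def uniform_grid(points, steps):
    """Evenly spaced 2-D reference points covering the data's bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [
        (x0 + (x1 - x0) * a / (steps - 1), y0 + (y1 - y0) * b / (steps - 1))
        for a in range(steps)
        for b in range(steps)
    ]


def stand_in_score(points, reference, radius):
    """Placeholder score (an assumption, NOT the paper's outlier factor).

    For each point: share of the other data points within `radius`
    minus the share of uniform reference points within `radius`.
    """
    scores = []
    for i, p in enumerate(points):
        near_data = sum(
            1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius
        )
        near_ref = sum(1 for u in reference if math.dist(p, u) <= radius)
        scores.append(near_data / (len(points) - 1) - near_ref / len(reference))
    return scores


def k_nearest(points, k):
    """Indices of the k nearest other points, for every point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(others[:k])
    return result


def attractor_labels(scores, neighbours):
    """Climb from each point to a local maximum of `scores`; label by that maximum."""
    def climb(i):
        while True:
            best = max(neighbours[i], key=lambda j: scores[j])
            if scores[best] <= scores[i]:
                return i  # no strictly better neighbour: local maximum reached
            i = best

    peaks = [climb(i) for i in range(len(scores))]
    ids = {}
    # Number the clusters in order of first appearance of their peak.
    return [ids.setdefault(p, len(ids)) for p in peaks]


# Toy data: two tight groups of five points each.
points = [(0, 0), (0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1),
          (5, 5), (5.1, 5), (4.9, 5), (5, 5.1), (5, 4.9)]
scores = stand_in_score(points, uniform_grid(points, steps=21), radius=0.15)
labels = attractor_labels(scores, k_nearest(points, k=3))
print(labels)
```

On this toy dataset the central point of each group has the highest score, so every point climbs to the centre of its own group and the sketch reports two clusters. The number of clusters is not supplied in advance; it falls out of the number of local maxima, which is the property the abstract emphasises.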

References

Brin, S. (1995), Near Neighbor Search in Large Metric Spaces. In: Proceedings of the 21st International Conference on Very Large Databases (VLDB-1995), Zurich, Switzerland, Morgan Kaufmann, pp. 574–584.
Draper, N.R. and Smith, H. (1966), Applied Regression Analysis, Wiley, New York.
Ertoz, L., Steinbach, M. and Kumar, V. (2002), A new shared nearest neighbor clustering algorithm and its applications, AHPCRC, Technical Report 134.
Fisher, R.A. (1936), The use of multiple measurements in taxonomic problems, Annals of Eugenics 7, 179–188.
Hawkins, D.M., Bradu, D. and Kass, G.V. (1984), Location of several outliers in multiple regression data using elemental sets, Technometrics 26, 197–208. doi:10.2307/1267545
Hinneburg, A. and Keim, D. (1998), An efficient approach to clustering large multimedia databases with noise. In: Proceedings of the 4th ACM SIGKDD, New York, NY, pp. 58–65.
Jain, A.K. and Dubes, R.C. (1988), Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ.
Jain, A., Murty, M.N. and Flynn, P. (1999), Data clustering: a review, ACM Computing Surveys 31(3), 264–323. doi:10.1145/331499.331504
MacQueen, J. (1967), Some methods for classification and analysis of multivariate observations. In: Le Cam, L.M. and Neyman, J. (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume I: Statistics, University of California Press, Berkeley and Los Angeles, CA, pp. 281–297.
Saltenis, V. (2004), Outlier detection based on the distribution of distances between data points, Informatica 15(3), 399–410.
Steinbach, M., Ertoz, L. and Kumar, V. (2003), Challenges of Clustering High Dimensional Data. New Vistas in Statistical Physics. Applications in Econophysics, Bioinformatics, and Pattern Recognition, Springer-Verlag, Berlin.

Author information

Authors and Affiliations

Institute of Mathematics and Informatics, Akademijos 4, LT-08663, Vilnius, Lithuania
Vydunas Saltenis

Corresponding author

Correspondence to Vydunas Saltenis.

About this article

Cite this article

Saltenis, V. Data Clustering Based on Maximization of Outlier Factor. J Glob Optim 35, 625–635 (2006). https://doi.org/10.1007/s10898-005-5372-5

Received: 17 November 2005
Accepted: 21 November 2005
Issue Date: August 2006
DOI: https://doi.org/10.1007/s10898-005-5372-5
