Abstract
K-means is one of the most important data mining techniques for scientists who want to analyze their data. However, k-means cannot handle noisy data points. This paper proposes a technique that can be applied to a k-means clustering result to exclude noise data points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with the various strategies for initializing k-means and for determining the number of clusters. Moreover, it is completely parameter-free. It has been tested on artificial and real data sets to demonstrate its performance in comparison with other noise-excluding techniques for k-means.
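The paper's KMN criterion is not reproduced on this page, so the following is only a generic illustration of the post-processing idea the abstract describes: taking a finished k-means result and flagging points that lie unusually far from their assigned centroid (here, beyond the cluster's mean distance plus two standard deviations, which, unlike KMN, does introduce an implicit threshold choice).

```python
import math

def flag_noise(points, centroids, labels):
    """Flag points unusually far from their assigned centroid.

    NOT the KMN method from the paper -- a generic distance-based
    sketch: a point is flagged as noise if its distance to its
    centroid exceeds the cluster's mean distance plus two standard
    deviations.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    flagged = [False] * len(points)
    for c, centroid in enumerate(centroids):
        idx = [i for i, l in enumerate(labels) if l == c]
        ds = [dist(points[i], centroid) for i in idx]
        if len(ds) < 2:
            continue  # singleton clusters carry no distance statistics
        mean = sum(ds) / len(ds)
        std = math.sqrt(sum((d - mean) ** 2 for d in ds) / len(ds))
        for i, d in zip(idx, ds):
            if d > mean + 2 * std:
                flagged[i] = True
    return flagged

# Example: a tight cluster around (0, 0) plus one distant point,
# all assigned to a single cluster by a hypothetical k-means run.
points = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.1, 0.1),
          (-0.1, -0.1), (0.0, 0.1), (8.0, 8.0)]
labels = [0] * len(points)
centroids = [(1.14, 1.16)]  # mean of all seven points
noise = flag_noise(points, centroids, labels)
```

Only the distant point `(8.0, 8.0)` is flagged; the six points near the origin survive. The paper's contribution is precisely that KMN achieves such exclusion without a tunable cutoff like the factor of two used here.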
Notes
- 1.
For additional information on the concept of Voronoi cells, we refer to [13] or the Wikipedia article.
References
Ahmed, M., Naser, A.: A novel approach for outlier detection and clustering improvement. In: ICIEA (2013)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA (2007)
Avis, D., Fukuda, K.: A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discret. Comput. Geom. 8, 295–313 (1992)
Campos, G., et al.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30, 891–927 (2016)
Chawla, S., Gionis, A.: k-means–: a unified approach to clustering and outlier detection. In: ICDM (2013)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Gan, G., Kwok-Po Ng, M.: k-means clustering with outlier removal. Pattern Recognit. Lett. 90, 8–14 (2017)
Johnson, N., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions. Houghton Mifflin, Boston (1994)
Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Berkeley Symposium on Mathematical Statistics and Probability (1967)
Mendez, J., Lorenzo, J.: Computing Voronoi adjacencies in high dimensional spaces by using linear programming. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds.) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol. 30, pp. 33–49. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5076-4_3
Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: ICML (2000)
Preparata, F., Shamos, M.: Computational Geometry: An Introduction. Springer, New York (1985). https://doi.org/10.1007/978-1-4612-1098-6
Vinh, N.X., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. JMLR 11, 2837–2854 (2011)
Whang, J.J., Dhillon, I., Gleich, D.: Non-exhaustive, overlapping k-means. In: SDM (2015)
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Schelling, B., Plant, C. (2018). KMN - Removing Noise from K-Means Clustering Results. In: Ordonez, C., Bellatreche, L. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2018. Lecture Notes in Computer Science(), vol 11031. Springer, Cham. https://doi.org/10.1007/978-3-319-98539-8_11
Print ISBN: 978-3-319-98538-1
Online ISBN: 978-3-319-98539-8
eBook Packages: Computer Science, Computer Science (R0)