Abstract
K-means is one of the most important data mining techniques for scientists who want to analyze their data. However, k-means cannot handle noisy data points. This paper proposes a technique that can be applied to a k-means clustering result to exclude noise data points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with the various strategies for initializing k-means and for determining the number of clusters. Moreover, it is completely parameter-free. It has been tested on artificial and real data sets to demonstrate its performance in comparison with other noise-excluding techniques for k-means.
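The paper's KMN criterion is not reproduced on this page, so the following is only a generic illustration of the post-processing idea the abstract describes: taking a finished k-means result and flagging points that lie unusually far from their assigned centroid (here, beyond the cluster's mean distance plus two standard deviations, which, unlike KMN, does introduce an implicit threshold choice).

```python
import math

def flag_noise(points, centroids, labels):
    """Flag points unusually far from their assigned centroid.

    NOT the KMN method from the paper -- a generic distance-based
    sketch: a point is flagged as noise if its distance to its
    centroid exceeds the cluster's mean distance plus two standard
    deviations.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    flagged = [False] * len(points)
    for c, centroid in enumerate(centroids):
        idx = [i for i, l in enumerate(labels) if l == c]
        ds = [dist(points[i], centroid) for i in idx]
        if len(ds) < 2:
            continue  # singleton clusters carry no distance statistics
        mean = sum(ds) / len(ds)
        std = math.sqrt(sum((d - mean) ** 2 for d in ds) / len(ds))
        for i, d in zip(idx, ds):
            if d > mean + 2 * std:
                flagged[i] = True
    return flagged

# Example: a tight cluster around (0, 0) plus one distant point,
# all assigned to a single cluster by a hypothetical k-means run.
points = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.1, 0.1),
          (-0.1, -0.1), (0.0, 0.1), (8.0, 8.0)]
labels = [0] * len(points)
centroids = [(1.14, 1.16)]  # mean of all seven points
noise = flag_noise(points, centroids, labels)
```

Only the distant point `(8.0, 8.0)` is flagged; the six points near the origin survive. The paper's contribution is precisely that KMN achieves such exclusion without a tunable cutoff like the factor of two used here.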
Notes
- 1.
For additional information on the concept of Voronoi cells, we refer to [13] or the Wikipedia article.
References
Ahmed, M., Naser, A.: A novel approach for outlier detection and clustering improvement. In: ICIEA (2013)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA (2007)
Avis, D., Fukuda, K.: A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discret. Comput. Geom. 8, 295–313 (1992)
Campos, G., et al.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30, 891–927 (2016)
Chawla, S., Gionis, A.: k-means–: a unified approach to clustering and outlier detection. In: ICDM (2013)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)
Gan, G., Kwok-Po Ng, M.: k-means clustering with outlier removal. Pattern Recognit. Lett. 90, 8–14 (2017)
Johnson, N., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions. Houghton Mifflin, Boston (1994)
Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Berkeley Symposium on Mathematical Statistics and Probability (1967)
Mendez, J., Lorenzo, J.: Computing Voronoi adjacencies in high dimensional spaces by using linear programming. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds.) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol. 30, pp. 33–49. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5076-4_3
Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: ICML (2000)
Preparata, F., Shamos, M.: Computational Geometry: An Introduction. Springer, New York (1985). https://doi.org/10.1007/978-1-4612-1098-6
Vinh, N.X., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. JMLR 11, 2837–2854 (2011)
Whang, J.J., Dhillon, I., Gleich, D.: Non-exhaustive, overlapping k-means. In: SDM (2015)
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Schelling, B., Plant, C. (2018). KMN - Removing Noise from K-Means Clustering Results. In: Ordonez, C., Bellatreche, L. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2018. Lecture Notes in Computer Science(), vol 11031. Springer, Cham. https://doi.org/10.1007/978-3-319-98539-8_11
Print ISBN: 978-3-319-98538-1
Online ISBN: 978-3-319-98539-8
eBook Packages: Computer Science, Computer Science (R0)