KMN - Removing Noise from K-Means Clustering Results

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11031)

Abstract

K-Means is one of the most important data mining techniques for scientists who want to analyze their data. However, K-Means cannot handle noise data points. This paper proposes a technique that can be applied to a k-means clustering result to exclude noise data points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with different strategies for initializing k-means and for determining the number of clusters. Moreover, it is completely parameter-free. The technique has been tested on artificial and real data sets to demonstrate its performance in comparison with other noise-excluding techniques for k-means.
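To make the stated limitation concrete, the following is a minimal sketch of plain k-means (Lloyd iterations), not of KMN itself, whose details are not given in this abstract. It illustrates that every point, including an obvious noise point, is assigned to some cluster and pulls that cluster's centroid toward it. The function name `kmeans`, the synthetic data, and the fixed initial centroids are assumptions chosen purely for illustration.

```python
import numpy as np


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations (illustrative helper, not the KMN method).

    Every point is always assigned to its nearest centroid, so there is
    no way for a point to be labeled as noise.
    """
    centroids = centroids.astype(float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Synthetic data (assumption): two tight clusters plus one far-away noise point.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=(5.0, 0.0), scale=0.1, size=(50, 2))
noise_point = np.array([[5.0, 40.0]])
data = np.vstack([cluster_a, cluster_b, noise_point])

# Fixed initial centroids (assumption) so the run is deterministic.
init = np.array([[0.0, 0.0], [5.0, 0.0]])
labels, centroids = kmeans(data, init)

print("label of the noise point:", labels[-1])
print("mean of cluster_b without noise:", cluster_b.mean(axis=0))
print("centroid of that cluster with noise:", centroids[1])
```

In this sketch the noise point ends up in the second cluster, and that cluster's centroid is shifted noticeably along the second coordinate compared with the mean of the clean cluster points. A post-processing step of the kind the abstract describes would aim to exclude such points from the final result.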

Notes

  1. For additional information on the concept of Voronoi cells, we refer to [13] or the Wikipedia article.

References

  1. Ahmed, M., Naser, A.: A novel approach for outlier detection and clustering improvement. In: ICIEA (2013)

  2. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA (2007)

  3. Avis, D., Fukuda, K.: A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discret. Comput. Geom. 8, 295–313 (1992)

  4. Campos, G., et al.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30, 891–927 (2016)

  5. Chawla, S., Gionis, A.: k-means--: a unified approach to clustering and outlier detection. In: ICDM (2013)

  6. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD (1996)

  7. Gan, G., Kwok-Po Ng, M.: k-means clustering with outlier removal. Pattern Recognit. Lett. 90, 8–14 (2017)

  8. Johnson, N., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions. Houghton Mifflin, Boston (1994)

  9. Lichman, M.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2013)

  10. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Berkeley Symposium on Mathematical Statistics and Probability (1967)

  11. Mendez, J., Lorenzo, J.: Computing Voronoi adjacencies in high dimensional spaces by using linear programming. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds.) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol. 30, pp. 33–49. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5076-4_3

  12. Pelleg, D., Moore, A.W.: X-means: extending K-means with efficient estimation of the number of clusters. In: ICML (2000)

  13. Preparata, F., Shamos, M.: Computational Geometry: An Introduction. Springer, New York (1985). https://doi.org/10.1007/978-1-4612-1098-6

  14. Vinh, N.X., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. JMLR 11, 2837–2854 (2011)

  15. Whang, J.J., Dhillon, I., Gleich, D.: Non-exhaustive, overlapping k-means. In: SDM (2015)

Author information

Correspondence to Benjamin Schelling.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Schelling, B., Plant, C. (2018). KMN - Removing Noise from K-Means Clustering Results. In: Ordonez, C., Bellatreche, L. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2018. Lecture Notes in Computer Science, vol 11031. Springer, Cham. https://doi.org/10.1007/978-3-319-98539-8_11

  • DOI: https://doi.org/10.1007/978-3-319-98539-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98538-1

  • Online ISBN: 978-3-319-98539-8

  • eBook Packages: Computer Science (R0)
