A Variant of the K-Means Clustering Algorithm for Continuous-Nominal Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

The core idea of the proposed algorithm is to embed the dataset under consideration into a metric space. Two spaces are considered for embedding the nominal part, which carries the Hamming metric: Euclidean space (the classical approach) and the standard unit sphere \(\mathbb S\) (our new approach). We prove that the distortion of the embedding into the unit sphere is at least 75 % better than that of the classical approach. In our model, combinations of continuous and nominal data are interpreted as points of a cylinder \(\mathbb R^p\times \mathbb S\), where \(p\) is the dimension of the continuous data. We use a version of the gradient algorithm to compute centroids of finite sets on the cylinder. Experimental results show certain advantages of the new algorithm: specifically, it produces better clusters in tests with predefined groups.
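
The abstract does not give the embedding or the gradient scheme in detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single nominal attribute whose categories are mapped to standard basis vectors (which lie on the unit sphere), and it computes a centroid on \(\mathbb R^p\times \mathbb S\) by taking the arithmetic mean of the continuous part and running projected gradient descent on the sphere for the mean squared chordal distance. The function names `one_hot_on_sphere` and `cylinder_centroid`, the step size, and the choice of chordal distance are all hypothetical.

```python
import numpy as np


def one_hot_on_sphere(labels, n_categories):
    """Map category indices to standard basis vectors (assumed embedding).

    Each row has exactly one entry equal to 1, so it has unit norm and
    therefore lies on the unit sphere.
    """
    points = np.zeros((len(labels), n_categories))
    points[np.arange(len(labels)), labels] = 1.0
    return points


def cylinder_centroid(continuous, spherical, steps=200, lr=0.1):
    """Centroid of a finite set on R^p x S (illustrative sketch only).

    The R^p component is the ordinary mean. The spherical component is
    found by projected gradient descent on (1/n) * sum ||x - y_i||^2
    subject to ||x|| = 1.
    """
    center_r = continuous.mean(axis=0)
    mean_s = spherical.mean(axis=0)

    # Start from one of the data points, which is already on the sphere.
    x = spherical[0].astype(float).copy()
    for _ in range(steps):
        # Euclidean gradient of the mean squared chordal distance.
        grad = 2.0 * (x - mean_s)
        # Remove the radial component so the step is tangent to the sphere.
        grad -= np.dot(grad, x) * x
        x = x - lr * grad
        # Project back onto the unit sphere.
        x /= np.linalg.norm(x)
    return center_r, x
```

For this particular objective the minimizer is also available in closed form as the normalized Euclidean mean of the spherical points, which makes it a convenient check on the iteration; a geodesic distance would require a different gradient.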

Author information

Corresponding author

Correspondence to Aleksander Denisiuk.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Denisiuk, A., Grabowski, M. (2016). A Variant of the K-Means Clustering Algorithm for Continuous-Nominal Data. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_2

  • DOI: https://doi.org/10.1007/978-3-319-26227-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26225-3

  • Online ISBN: 978-3-319-26227-7

  • eBook Packages: Engineering, Engineering (R0)
