Abstract
The core idea of the proposed algorithm is to embed the considered dataset into a metric space. Two spaces are considered for embedding the nominal part, which carries the Hamming metric: Euclidean space (the classical approach) and the standard unit sphere \(\mathbb S\) (our new approach). We prove that the distortion of the embedding into the unit sphere is at least 75 % better than that of the classical approach. In our model, combinations of continuous and nominal data are interpreted as points of a cylinder \(\mathbb R^p\times \mathbb S\), where \(p\) is the dimension of the continuous data. We use a version of the gradient algorithm to compute centroids of finite sets on the cylinder. Experimental results show certain advantages of the new algorithm: specifically, it produces better clusters in tests with predefined groups.
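The abstract does not spell out the metric on the cylinder, the map from nominal values to the sphere, or the exact gradient scheme, so the following is only a minimal sketch under stated assumptions: the cylinder carries the product of the Euclidean metric on \(\mathbb R^p\) and the geodesic (great-circle) metric on \(\mathbb S\), and the spherical part of a centroid is found by Riemannian gradient descent on the mean squared geodesic distance. All names (`cylinder_distance`, `cylinder_centroid`, `log_map`, `exp_map`) are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# Assumed representation: (continuous part in R^p, unit vector on the sphere).
CylinderPoint = Tuple[Vector, Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def log_map(c: Sequence[float], u: Sequence[float]) -> Vector:
    """Tangent vector at c pointing toward u; its length is the geodesic distance.

    Antipodal pairs (u == -c) have no unique direction and yield the zero vector.
    """
    cos_t = max(-1.0, min(1.0, dot(c, u)))
    theta = math.acos(cos_t)
    v = [ui - cos_t * ci for ci, ui in zip(c, u)]
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        return [0.0] * len(c)
    return [theta * x / n for x in v]


def exp_map(c: Sequence[float], v: Sequence[float]) -> Vector:
    """Move from c along the great circle with initial velocity v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        return list(c)
    return normalize([math.cos(n) * ci + math.sin(n) * vi / n for ci, vi in zip(c, v)])


def cylinder_distance(a: CylinderPoint, b: CylinderPoint) -> float:
    """Assumed product metric: Euclidean on R^p combined with geodesic on the sphere."""
    (xa, ua), (xb, ub) = a, b
    eucl_sq = sum((s - t) ** 2 for s, t in zip(xa, xb))
    geo = math.acos(max(-1.0, min(1.0, dot(ua, ub))))
    return math.sqrt(eucl_sq + geo ** 2)


def cylinder_centroid(points: Sequence[CylinderPoint],
                      step: float = 1.0, iters: int = 100) -> CylinderPoint:
    """Centroid of a finite set on the cylinder (sketch, not the paper's scheme)."""
    n = len(points)
    p = len(points[0][0])
    # The continuous part is an ordinary arithmetic mean.
    mean_x = [sum(pt[0][j] for pt in points) / n for j in range(p)]
    # The spherical part starts at the first point and follows the negative
    # Riemannian gradient of the mean squared geodesic distance.
    c = list(points[0][1])
    for _ in range(iters):
        logs = [log_map(c, u) for _, u in points]
        grad = [-sum(l[j] for l in logs) / n for j in range(len(c))]
        c = exp_map(c, [-step * g for g in grad])
    return mean_x, c


if __name__ == "__main__":
    a = 0.3
    sample = [
        ([0.0, 2.0], [math.sin(a), 0.0, math.cos(a)]),
        ([2.0, 0.0], [-math.sin(a), 0.0, math.cos(a)]),
    ]
    print(cylinder_centroid(sample))
```

In a k-means loop under these assumptions, `cylinder_distance` would drive the assignment step and `cylinder_centroid` the update step. The fixed step size and iteration count are illustrative choices, suitable only when the spherical parts of a cluster lie well within a hemisphere.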
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Denisiuk, A., Grabowski, M. (2016). A Variant of the K-Means Clustering Algorithm for Continuous-Nominal Data. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_2
DOI: https://doi.org/10.1007/978-3-319-26227-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering, Engineering (R0)