A K-Means Clustering Algorithm: Using the Chi-Square as a Distance

Ariosto Serna, Luis; Alejandro Hernández, Kevin; Navarro González, Piedad

doi:10.1007/978-3-030-15127-0_46

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11354)

Included in the following conference series:

International Conference on Human Centered Computing

Abstract

The recurrent use of databases with categorical variables in different fields of science demands new approaches when applying cluster analysis techniques to this type of database. For this reason, in this article we compare Matlab's kmeans() function with a K-Means function implemented by us, which adds a similarity measure that the Matlab function does not offer: the chi-square distance. Both algorithms were tested on databases with quantitative and categorical variables. The experimental results showed a higher classification success rate for the function implemented by us, which supports the correct functioning of the implemented algorithm and indicates that the chi-square distance is an appropriate similarity measure for categorical databases.
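
The preview gives neither the authors' formula nor their Matlab code, so the following is only a minimal Python sketch of the general idea: a K-Means loop in which the usual Euclidean distance is swapped for a chi-square distance. It assumes the categorical variables have been one-hot encoded into non-negative vectors, and it uses one common symmetric form, d(x, c) = 0.5 * sum_i (x_i - c_i)^2 / (x_i + c_i), which may differ from the definition used in the paper. The names chi_square_distance and kmeans_chi2, and the init parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np


def chi_square_distance(x, c, eps=1e-12):
    """Pairwise symmetric chi-square distance (assumed form, not the paper's).

    x: (n, d) non-negative array, c: (k, d) non-negative array.
    Returns an (n, k) array of distances.
    """
    diff = x[:, None, :] - c[None, :, :]
    total = x[:, None, :] + c[None, :, :]
    # eps avoids division by zero where both entries are 0 (term becomes 0).
    return 0.5 * np.sum(diff ** 2 / (total + eps), axis=2)


def kmeans_chi2(data, k, n_iter=100, seed=0, init=None):
    """K-Means loop that assigns points by chi-square distance.

    Centroids are updated with the arithmetic mean, as in standard K-Means.
    This is an assumption: the mean is not guaranteed to minimise the
    chi-square distance within a cluster.
    """
    data = np.asarray(data, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = data[rng.choice(len(data), size=k, replace=False)]
    else:
        centroids = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        labels = np.argmin(chi_square_distance(data, centroids), axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment against the final centroids.
    labels = np.argmin(chi_square_distance(data, centroids), axis=1)
    return labels, centroids


# Toy data: three binary categorical variables, one-hot encoded (2 columns each).
toy = np.array([
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
])
toy_labels, toy_centroids = kmeans_chi2(toy, k=2, init=toy[[0, 3]])
print(toy_labels)
```

Passing explicit initial centroids through init keeps the toy run deterministic; with random initialisation the result depends on the seed, as with any K-Means variant.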



Acknowledgments

We would like to thank the Corporación Instituto de Administración y Finanzas (CIAF) and its Organizations and Innovation research group, which supported us in the development and financing of this article.

Author information

Authors and Affiliations

Universidad Tecnologica de Pereira, Pereira, Colombia
Luis Ariosto Serna & Kevin Alejandro Hernández
Corporación Instituto de Administración y Finanzas (CIAF), Pereira, Colombia
Piedad Navarro González

Corresponding author

Correspondence to Luis Ariosto Serna.

Editor information

Editors and Affiliations

South China Normal University, Guangzhou, China
Yong Tang
Wuhan University of Technology, Wuhan City, China
Qiaohong Zu
CINVESTAV, Mexico City, Mexico
José G. Rodríguez García

Cite this paper

Ariosto Serna, L., Alejandro Hernández, K., Navarro González, P. (2019). A K-Means Clustering Algorithm: Using the Chi-Square as a Distance. In: Tang, Y., Zu, Q., Rodríguez García, J. (eds) Human Centered Computing. HCC 2018. Lecture Notes in Computer Science, vol 11354. Springer, Cham. https://doi.org/10.1007/978-3-030-15127-0_46

DOI: https://doi.org/10.1007/978-3-030-15127-0_46
Published: 22 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15126-3
Online ISBN: 978-3-030-15127-0
eBook Packages: Computer Science, Computer Science (R0)
