A K-Means Clustering Algorithm: Using the Chi-Square as a Distance

  • Conference paper
Human Centered Computing (HCC 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11354)

Abstract

Databases with categorical variables recur across many fields of science, which demands new approaches when applying cluster analysis techniques to this type of data. For this reason, in this article we compare Matlab's kmeans() function with a K-Means function implemented by us, which integrates a distance measure that the Matlab function does not offer: the chi-square distance. Both algorithms were tested on databases with quantitative and categorical variables. The experimental results showed a higher level of classification success for the function implemented by us, supporting the correct functioning of the implemented algorithm and demonstrating that the chi-square distance is an appropriate similarity measure for categorical databases.
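The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation, which is not shown in this preview. It assumes a common symmetric form of the chi-square distance, d(x, c) = 0.5 · Σ (x_i − c_i)² / (x_i + c_i), applied to non-negative vectors such as one-hot encoded categorical records, and it keeps the usual mean update for the centroids. The function names `chi_square_distance` and `kmeans_chi2`, the `seed` parameter, and the toy data are hypothetical.

```python
import random


def chi_square_distance(x, c, eps=1e-12):
    """Symmetric chi-square distance between two non-negative vectors.

    Assumed form: 0.5 * sum((x_i - c_i)^2 / (x_i + c_i)).
    """
    total = 0.0
    for xi, ci in zip(x, c):
        denom = xi + ci
        if denom > eps:  # skip coordinates where both entries are zero
            total += (xi - ci) ** 2 / denom
    return 0.5 * total


def kmeans_chi2(data, k, max_iter=100, seed=0):
    """Lloyd-style K-Means that assigns points by chi-square distance."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen records.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chi-square distance.
        new_labels = [
            min(range(k), key=lambda j: chi_square_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: component-wise mean of each cluster's members
        # (a heuristic choice here, not a claim about the paper's method).
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy one-hot encoded records: two categorical variables with two levels each.
data = [
    [1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0],
    [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1],
]
labels, centroids = kmeans_chi2(data, k=2)
print(labels)
```

In this sketch only the assignment step differs from standard K-Means. Quantitative variables would need to be non-negative (for example, rescaled to [0, 1]) before this form of the distance applies.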

Acknowledgments

We would like to thank the Corporacion Instituto de Administracion y Finanzas (CIAF) and its Organizations and Innovation research group, which supported us in the development and financing of this article.

Corresponding author

Correspondence to Luis Ariosto Serna.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ariosto Serna, L., Alejandro Hernández, K., Navarro González, P. (2019). A K-Means Clustering Algorithm: Using the Chi-Square as a Distance. In: Tang, Y., Zu, Q., Rodríguez García, J. (eds) Human Centered Computing. HCC 2018. Lecture Notes in Computer Science, vol 11354. Springer, Cham. https://doi.org/10.1007/978-3-030-15127-0_46

  • DOI: https://doi.org/10.1007/978-3-030-15127-0_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15126-3

  • Online ISBN: 978-3-030-15127-0

  • eBook Packages: Computer Science, Computer Science (R0)
