New Internal Clustering Evaluation Index Based on Line Segments

Thomas, Juan Carlos Rojas; Peñas, Matilde Santos

doi:10.1007/978-3-030-33607-3_57


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11871)

Included in the conference series: International Conference on Intelligent Data Engineering and Automated Learning


Abstract

This work proposes a new internal clustering evaluation index that uses line segments as the central elements of the clusters. The dispersion of the data is calculated as the average distance from the points of each cluster to its line segment. The work also defines a new distance measure based on a line segment that connects the centroids of the clusters, from which an approximation of the edges of their geometries is obtained. The proposed index is validated in a series of experiments on 10 artificial data sets, generated with different cluster characteristics such as size, shape, noise and dimensionality, and on 8 real data sets. In these experiments, the new index is compared with 12 representative indices from the literature and outperforms all of them. These results support the effectiveness of the proposal and show that it is appropriate to include geometric properties in the definition of internal indices.
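The dispersion term described in the abstract can be illustrated with a minimal Python sketch. It assumes that the endpoints of a cluster's line segment are already known and that distances are Euclidean; the abstract does not state how the segment is fitted or how dispersion enters the final index, so the function names `point_to_segment_distance` and `segment_dispersion` are hypothetical and this is not the authors' implementation.

```python
import math
from typing import Sequence

Point = Sequence[float]


def point_to_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment with endpoints a and b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_sq = sum(c * c for c in ab)
    if ab_sq == 0.0:
        # Degenerate segment: both endpoints coincide, so it acts as a point.
        return math.dist(p, a)
    # Parameter of the orthogonal projection of p onto the line through a and b,
    # clamped to [0, 1] so the closest point stays on the segment.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_sq
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def segment_dispersion(points: Sequence[Point], a: Point, b: Point) -> float:
    """Average distance from the points of one cluster to its line segment.

    Assumption: the segment endpoints a and b are supplied by the caller.
    """
    if not points:
        raise ValueError("cluster must contain at least one point")
    return sum(point_to_segment_distance(p, a, b) for p in points) / len(points)


# Toy cluster scattered around the segment from (0, 0) to (2, 0).
cluster = [(0.0, 1.0), (1.0, -1.0), (2.0, 1.0), (3.0, 0.0)]
dispersion = segment_dispersion(cluster, (0.0, 0.0), (2.0, 0.0))
print(dispersion)
```

In this toy example every point lies at distance 1 from the segment, including (3, 0), whose nearest point is the endpoint (2, 0) rather than its projection onto the infinite line. Clamping the projection parameter is what distinguishes a segment prototype from a full line.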



Author information

Authors and Affiliations

Departamento de Informática y Automática, UNED, Madrid, Spain
Juan Carlos Rojas Thomas
Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain
Matilde Santos Peñas


Corresponding author

Correspondence to Matilde Santos Peñas.

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
University of Exeter, Exeter, UK
Ronaldo Menezes
University of Manchester, Manchester, UK
Richard Allmendinger


About this paper

Cite this paper

Thomas, J.C.R., Peñas, M.S. (2019). New Internal Clustering Evaluation Index Based on Line Segments. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science, vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_57


DOI: https://doi.org/10.1007/978-3-030-33607-3_57
Published: 18 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3
eBook Packages: Computer Science, Computer Science (R0)
