A Method to Determine the Number of Clusters Based on Multi-validity Index

Sun, Ning; Yu, Hong

doi:10.1007/978-3-319-99368-3_33


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11103)

Included in the following conference series:

International Joint Conference on Rough Sets


Abstract

Cluster analysis is an unsupervised learning technique that plays an increasingly important role in data mining. However, a basic and difficult question in clustering is how to determine the number of clusters automatically. The traditional solution is to use a single validity index, which may fail because any one index is biased toward specific conditions. Moreover, most existing clustering algorithms are based on hard partitioning, which cannot reflect the uncertainty of the data during the clustering process. To address these drawbacks, this paper proposes a method, based on three-way decisions and multiple validity indexes, that determines the number of clusters automatically. The method has three parts: (1) a k-means clustering algorithm is devised to obtain three-way clustering results; (2) multiple validity indexes are employed to evaluate the results, and each evaluation is weighted according to the mean similarity between the corresponding clustering result and the others, following the idea of the median partition in clustering ensembles; and (3) the comprehensive evaluation results are sorted and the best-ranked k value is selected as the optimal number of clusters. The experimental results show that the proposed method determines the number of clusters automatically better than any of the single validity indexes used in the fusion.
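The weighting and ranking steps (2) and (3) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: partitions are hard label lists rather than three-way results, similarity between partitions is the Rand index, each index is min-max normalised across the candidate k values and oriented so that higher is better, and the normalised indexes are averaged before weighting. The names `rand_index`, `min_max`, `select_k`, `index_a` and `index_b` are hypothetical, and the scores are made-up numbers.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_k(partitions, index_scores):
    """partitions: {k: labels}; index_scores: {index name: {k: score}},
    with every score oriented so that higher is better (assumption)."""
    ks = sorted(partitions)

    # Weight of each candidate k: mean similarity of its partition to the
    # partitions of the other candidates (median-partition idea).
    weights = {}
    for k in ks:
        sims = [rand_index(partitions[k], partitions[j]) for j in ks if j != k]
        weights[k] = sum(sims) / len(sims)

    # Fuse the validity indexes: normalise each across k, then average.
    fused = {k: 0.0 for k in ks}
    for scores in index_scores.values():
        for k, s in zip(ks, min_max([scores[k] for k in ks])):
            fused[k] += s / len(index_scores)

    # Weighted comprehensive score, sorted from best to worst.
    combined = {k: weights[k] * fused[k] for k in ks}
    ranking = sorted(ks, key=lambda k: combined[k], reverse=True)
    return ranking, combined


# Toy input: three candidate partitions of six points and two made-up indexes.
partitions = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 1, 2, 3],
}
index_scores = {
    "index_a": {2: 0.40, 3: 0.70, 4: 0.55},
    "index_b": {2: 0.90, 3: 0.80, 4: 0.30},
}
ranking, combined = select_k(partitions, index_scores)
print(ranking[0], combined)
```

With these toy numbers the partition for k = 3 agrees most with the others and also has the highest fused score, so it ranks first. Swapping in a different similarity measure or fusion rule only changes `rand_index` or the averaging loop.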



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61533020, 61751312 and 61379114.

Author information

Authors and Affiliations

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People’s Republic of China
Ning Sun & Hong Yu


Corresponding author

Correspondence to Hong Yu.

Editor information

Editors and Affiliations

University of Warsaw, Warsaw, Poland
Hung Son Nguyen
Faculty of Information Technology, Vietnam National University, Hanoi, Vietnam
Quang-Thuy Ha
School of Information Science, Southwest Jiaotong University, Chengdu, China
Tianrui Li
Institute of Computer Science, University of Silesia, Sosnowiec, Poland
Małgorzata Przybyła-Kasperek


Cite this paper

Sun, N., Yu, H. (2018). A Method to Determine the Number of Clusters Based on Multi-validity Index. In: Nguyen, H., Ha, QT., Li, T., Przybyła-Kasperek, M. (eds) Rough Sets. IJCRS 2018. Lecture Notes in Computer Science, vol 11103. Springer, Cham. https://doi.org/10.1007/978-3-319-99368-3_33


DOI: https://doi.org/10.1007/978-3-319-99368-3_33
Published: 15 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99367-6
Online ISBN: 978-3-319-99368-3
eBook Packages: Computer Science, Computer Science (R0)
