Abstract
In this paper, we present a comparative analysis of internal and external clustering quality criteria for clustering various types of datasets using the density-based DBSCAN algorithm implemented within the framework of Inductive Methods of Objective Clustering (IMOC). At its first step, the IMOC technique divides the initial dataset into two similar subsets that contain the same number of pairwise similar objects. The data in both subsets are then clustered concurrently across the range of the algorithm's parameter values, and clustering quality criteria of both types, internal (IQC) and external (EQC), are estimated at each step. The optimal algorithm parameters are determined by the maximum value of the complex balance criterion (CBC), which contains both the IQC and the EQC as components. The simulation results allowed us to evaluate how effectively the internal and external clustering quality criteria identify the optimal parameters of the clustering algorithm for various types of data. In our view, the obtained results can help increase the accuracy of the clustering procedure and decrease the reproducibility error.
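The first IMOC step described above, dividing the data into two similar subsets of equal size, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how similarity is measured or how pairs are formed, so the Euclidean distance, the greedy closest-pair matching, and the function name `split_into_similar_subsets` are all hypothetical choices, not the authors' implementation.

```python
from itertools import combinations
from math import dist


def split_into_similar_subsets(points):
    """Pair up the closest objects and send one of each pair to each subset.

    Assumptions (not taken from the abstract): Euclidean distance and
    greedy closest-pair matching. With an odd number of objects, the
    last unpaired object is left out of both subsets.
    """
    # Enumerate all index pairs, ordered by increasing distance.
    pairs = sorted(
        combinations(range(len(points)), 2),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )
    used = set()
    subset_a, subset_b = [], []
    for i, j in pairs:
        # Skip pairs that reuse an object already assigned to a subset.
        if i in used or j in used:
            continue
        used.update((i, j))
        subset_a.append(points[i])
        subset_b.append(points[j])
    return subset_a, subset_b


# Toy data: three tight pairs of two-dimensional objects.
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 1.0), (9.3, 1.0)]
subset_a, subset_b = split_into_similar_subsets(data)
```

Each subset receives one member of every close pair, so the two subsets have the same size and a similar structure. Both could then be clustered with the same DBSCAN parameters so that the results on the two subsets can be compared.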
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Babichev, S., Spivakovskiy, A., Škvor, J. (2020). Comparison Analysis of Clustering Quality Criteria Using Inductive Methods of Objective Clustering. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds) Data Stream Mining & Processing. DSMP 2020. Communications in Computer and Information Science, vol 1158. Springer, Cham. https://doi.org/10.1007/978-3-030-61656-4_10
DOI: https://doi.org/10.1007/978-3-030-61656-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61655-7
Online ISBN: 978-3-030-61656-4
eBook Packages: Computer Science, Computer Science (R0)