Comparison Analysis of Clustering Quality Criteria Using Inductive Methods of Objective Clustering

  • Conference paper
Data Stream Mining & Processing (DSMP 2020)

Abstract

In this paper, we present a comparative analysis of internal and external clustering quality criteria applied to various types of datasets, using the density-based DBSCAN clustering algorithm implemented within the framework of Inductive Methods of Objective Clustering (IMOC). The first step of the IMOC technique divides the initial dataset into two similar subsets containing the same number of pairwise similar objects. We then cluster the two subsets concurrently over the range of variation of the relevant algorithm parameters, estimating internal (IQC) and external (EQC) clustering quality criteria at each step. The optimal algorithm parameters are selected at the maximum value of the complex balance criterion (CBC), which contains both the IQC and the EQC as components. The analysis of the simulation results has allowed us to evaluate the effectiveness of the internal and external clustering quality criteria for determining the optimal parameters of the clustering algorithm on various types of data. In our view, the obtained results can help increase the accuracy of the clustering procedure and decrease the reproducibility error.
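
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the splitting rule, the individual criteria, or the form of the CBC, so every concrete choice here is an assumption made for illustration. The split pairs the closest objects greedily, the internal criterion is a centroid-based simplified silhouette rescaled to [0, 1] (noise points score -1), the external criterion is the Rand index between the two subset clusterings, and the CBC is their geometric mean. All function names (`split_into_pairs`, `internal_quality`, `rand_index`, `select_eps`) and the toy data are hypothetical.

```python
import math
from itertools import combinations

def split_into_pairs(points):
    """Greedy split (assumed rule): each closest unused pair sends one point to A, one to B."""
    order = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    used, a, b = set(), [], []
    for i, j in order:
        if i not in used and j not in used:
            used.update((i, j))
            a.append(points[i])
            b.append(points[j])
    return a, b

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 marks noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels, cluster = [None] * n, -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1                      # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:        # only core points expand the cluster
                queue.extend(neigh[j])
    return labels

def internal_quality(points, labels):
    """Assumed IQC: simplified (centroid-based) silhouette mapped to [0, 1]."""
    groups = {}
    for p, l in zip(points, labels):
        if l != -1:
            groups.setdefault(l, []).append(p)
    if len(groups) < 2:
        return 0.0                              # trivial partitions get the lowest score
    cents = {l: tuple(sum(c) / len(g) for c in zip(*g)) for l, g in groups.items()}
    scores = []
    for p, l in zip(points, labels):
        if l == -1:
            scores.append(-1.0)                 # penalise discarded points
            continue
        own = math.dist(p, cents[l])
        other = min(math.dist(p, c) for k, c in cents.items() if k != l)
        scores.append((other - own) / (max(own, other) or 1.0))
    return (sum(scores) / len(scores) + 1.0) / 2.0

def rand_index(la, lb):
    """Assumed EQC: Rand index between the clusterings of the paired subsets."""
    pairs = list(combinations(range(len(la)), 2))
    agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
    return agree / len(pairs)

def select_eps(points, eps_grid, min_pts=3):
    """Return (eps, CBC) with the largest CBC; CBC here is a geometric mean (assumption)."""
    a, b = split_into_pairs(points)
    best = None
    for eps in eps_grid:
        la, lb = dbscan(a, eps, min_pts), dbscan(b, eps, min_pts)
        iqc = (internal_quality(a, la) + internal_quality(b, lb)) / 2.0
        cbc = math.sqrt(iqc * rand_index(la, lb))
        if best is None or cbc > best[1]:
            best = (eps, cbc)
    return best

# Toy data: two well-separated 4x4 grids of points.
blob = [(0.1 * x, 0.1 * y) for x in range(4) for y in range(4)]
demo_points = blob + [(x + 5.0, y + 5.0) for x, y in blob]
best_eps, best_cbc = select_eps(demo_points, [0.05, 0.5, 10.0])
print(best_eps, round(best_cbc, 3))
```

On this toy data the smallest radius marks every object as noise and the largest merges everything into one cluster, so both receive a CBC of zero and the intermediate radius is selected. A combination based on a desirability function could be substituted for the geometric mean without changing the structure of the search.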

Author information

Corresponding author

Correspondence to Sergii Babichev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Babichev, S., Spivakovskiy, A., Škvor, J. (2020). Comparison Analysis of Clustering Quality Criteria Using Inductive Methods of Objective Clustering. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds) Data Stream Mining & Processing. DSMP 2020. Communications in Computer and Information Science, vol 1158. Springer, Cham. https://doi.org/10.1007/978-3-030-61656-4_10

  • DOI: https://doi.org/10.1007/978-3-030-61656-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61655-7

  • Online ISBN: 978-3-030-61656-4

  • eBook Packages: Computer Science, Computer Science (R0)
