TS3FCM: trusted safe semi-supervised fuzzy clustering method for data partition with high confidence

Multimedia Tools and Applications

Abstract

Data partition with high confidence has been a main focus of Soft Computing research for many years. Because of the data gathering process, a dataset may contain data with low confidence (wrong values, incorrect attribute types, irrelevant domain ranges, etc.). The resulting noise and outliers degrade the performance of the final clustering. Safe semi-supervised fuzzy clustering has been used extensively in recent years to tackle this problem by adding a local graph between labeled and unlabeled data, so that wrongly labeled data have only a small impact on the final clusters. However, this process often takes much computational time and sometimes produces unreasonable results. In this research, we propose a new algorithm for the data partition with confidence problem, named the Trusted Safe Semi-Supervised Fuzzy Clustering Method (TS3FCM). The key motivation behind TS3FCM is to address the main drawback of the related safe semi-supervised fuzzy clustering algorithms, namely their high computational time. The novelty of TS3FCM compared with the other safe semi-supervised fuzzy clustering algorithms lies in separating the process of finding trusted labeled data from the process of performing semi-supervised fuzzy clustering. The key contributions of the paper are summarized as follows. First, a new objective function is proposed. This function incorporates a new weight for each labeled data point so that the system can check whether the label of that point is correct. The function is then optimized to find the cluster centers and the membership matrix. Labeled data that have a small impact after clustering are either assigned very low membership values or removed from the set of labeled data. Furthermore, a new semi-supervised fuzzy clustering model is defined to partition the whole dataset, with additional information given as a mixture of the prior membership degrees (\( \overline{\mathrm{U}} \)) and labeled data. TS3FCM as a whole works through three main phases, with the aim of reducing computational time while achieving reasonable clustering quality compared to the related algorithms. TS3FCM is implemented and experimentally compared against related methods such as the standard Fuzzy C-Means (FCM), the Semi-Supervised Fuzzy Clustering method (SSFCM), and the Confidence-weighted Safe Semi-Supervised Clustering (CS3FCM) algorithm, in terms of both computational time and the quality of the clustering results. The experimental results on the benchmark UCI Machine Learning datasets show that TS3FCM runs faster than the other algorithms while maintaining reasonable clustering quality. We also analyze the results statistically using ANOVA.
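For orientation, the sketch below illustrates the standard Fuzzy C-Means baseline that the abstract names as a comparison method. It is not TS3FCM: the abstract does not state the TS3FCM objective function, label weights, or phases, so none of them are reproduced here. The function name `fuzzy_c_means`, its parameters, the default fuzzifier m = 2, and the toy data are assumptions made for illustration only.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates until stable."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)),
        # written here in terms of squared Euclidean distances.
        largest_change = 0.0
        for i, p in enumerate(points):
            sq_dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)), 1e-12)
                for c in centers
            ]
            inv = [d2 ** (-1.0 / (m - 1.0)) for d2 in sq_dists]
            total = sum(inv)
            new_row = [value / total for value in inv]
            largest_change = max(
                largest_change,
                max(abs(a - b) for a, b in zip(new_row, u[i])),
            )
            u[i] = new_row

        # Stop once no membership degree moved by more than tol.
        if largest_change < tol:
            break

    return centers, u


# Toy data (hypothetical): two well-separated groups of three points each.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, u = fuzzy_c_means(points, n_clusters=2)
# Hard labels taken from the largest membership degree in each row.
labels = [row.index(max(row)) for row in u]
```

Each row of `u` holds the membership degrees of one data point across the clusters and sums to 1. Semi-supervised variants such as those compared in the paper extend this kind of objective with terms that use labeled data or prior memberships; their exact forms are given in the cited works, not in this sketch.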


Acknowledgments

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2020.11.

Author information

Corresponding author

Correspondence to Tran Manh Tuan.

About this article

Cite this article

Huan, P.T., Thong, P.H., Tuan, T.M. et al. TS3FCM: trusted safe semi-supervised fuzzy clustering method for data partition with high confidence. Multimed Tools Appl 81, 12567–12598 (2022). https://doi.org/10.1007/s11042-022-12133-6

  • DOI: https://doi.org/10.1007/s11042-022-12133-6
