An automatic three-way clustering method based on sample similarity

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Three-way clustering extends traditional clustering by adding the concept of a fringe region, which can effectively address the inaccurate decision-making caused by inaccurate information or insufficient data in traditional two-way clustering methods. Existing three-way clustering methods often select the number of clusters and the thresholds for the three-way partition by subjective tuning. However, fixing the number of clusters and the partition thresholds in this way cannot automatically select the optimal values for data sets with different sizes and densities. To address this problem, this paper proposes an improved three-way clustering method. First, we define the roughness degree by introducing sample similarity to measure the uncertainty of the fringe region. Based on the roughness degree, we define a novel partitioning validity index to evaluate clustering partitions and propose an automatic threshold selection method. Second, based on the concept of sample similarity, we introduce the intra-class similarity and the inter-class similarity to describe the quantitative change in the relationship between a sample and the clusters, and we integrate these two similarities into a novel clustering validity index that measures clustering performance under different numbers of clusters. Furthermore, we propose an automatic cluster number selection method. Finally, we give an automatic three-way clustering approach by combining the proposed threshold selection method and the cluster number selection method. Comparison experiments demonstrate the effectiveness of our proposal.
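
To make the idea of core and fringe regions concrete, the following is a minimal sketch of a generic three-way partition. It is not the method proposed in the paper: the Gaussian similarity, the parameter `sigma`, the fixed thresholds `alpha` and `beta`, and the function names are all assumptions chosen for illustration, whereas the paper selects the thresholds and the number of clusters automatically using definitions not given in this abstract.

```python
import math


def similarity(x, c, sigma=1.5):
    """Gaussian similarity in (0, 1].

    Assumed stand-in for the paper's sample similarity, which is not
    defined in the abstract.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def three_way_partition(samples, centers, alpha=0.6, beta=0.3, sigma=1.5):
    """Split samples into a core set and a fringe set for each cluster.

    A sample joins the core of a cluster when its similarity to the
    cluster center is at least alpha, and the fringe when the similarity
    lies in [beta, alpha). Samples below beta are left out of the cluster.
    The thresholds here are hand-picked (hypothetical values).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("expected 0 <= beta < alpha <= 1")
    core = [set() for _ in centers]
    fringe = [set() for _ in centers]
    for i, x in enumerate(samples):
        for k, c in enumerate(centers):
            s = similarity(x, c, sigma)
            if s >= alpha:
                core[k].add(i)
            elif s >= beta:
                fringe[k].add(i)
    return core, fringe


# Toy data: two tight groups and one sample halfway between them.
samples = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (1.5, 1.5)]
centers = [(0.0, 0.0), (3.0, 3.0)]
core, fringe = three_way_partition(samples, centers)
print("core:", core)
print("fringe:", fringe)
```

In this toy run the midway sample falls into the fringe of both clusters rather than being forced into one of them, which is the kind of deferred decision that distinguishes a three-way partition from a two-way one.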

Notes

  1. https://archive.ics.uci.edu/ml/datasets.php.

Author information

Corresponding author

Correspondence to Weiwei Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 61906090, 61876027 and 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20191287), the Natural Science Foundation of Anhui Province of China (Grant No. 1808085MF178), the Fundamental Research Funds for the Central Universities (Grant No. 30920021131), and the China Postdoctoral Science Foundation (Grant No. 2018M632304).

About this article

Cite this article

Jia, X., Rao, Y., Li, W. et al. An automatic three-way clustering method based on sample similarity. Int. J. Mach. Learn. & Cyber. 12, 1545–1556 (2021). https://doi.org/10.1007/s13042-020-01255-8

  • DOI: https://doi.org/10.1007/s13042-020-01255-8
