Abstract
Three-way clustering extends traditional clustering by adding the concept of a fringe region, which can effectively address the inaccurate decisions that traditional two-way clustering makes when information is inaccurate or data are insufficient. Existing three-way clustering methods often select the number of clusters and the thresholds for the three-way partition by subjective tuning. However, fixing the number of clusters and the partition thresholds in this way cannot automatically yield optimal values for data sets of different sizes and densities. To address this problem, this paper proposes an improved three-way clustering method. First, we define the roughness degree by introducing sample similarity to measure the uncertainty of the fringe region. Based on the roughness degree, we then define a novel partitioning validity index to evaluate clustering partitions and propose an automatic threshold selection method. Second, based on the concept of sample similarity, we introduce intra-class similarity and inter-class similarity to describe the quantitative change in the relationship between a sample and the clusters, and we integrate these two similarities into a novel clustering validity index that measures clustering performance under different numbers of clusters. On this basis, we propose an automatic cluster number selection method. Finally, we give an automatic three-way clustering approach by combining the proposed threshold selection method and the cluster number selection method. Comparison experiments demonstrate the effectiveness of our proposal.
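The core and fringe regions mentioned above can be illustrated with a minimal sketch. It is not the method proposed in this paper: the similarity function `1 / (1 + distance)`, the fixed thresholds `alpha` and `beta`, and the function name `three_way_partition` are all assumptions chosen only to show how a pair of thresholds splits samples into a core region, a fringe region, and the remainder for each cluster.

```python
import math


def three_way_partition(samples, centers, alpha, beta):
    """Split sample indices into core and fringe regions per cluster.

    Illustrative assumption: similarity(x, c) = 1 / (1 + Euclidean distance).
    A sample joins the core of a cluster when similarity >= alpha, the
    fringe when beta <= similarity < alpha, and neither region otherwise.
    """
    if not 0 <= beta < alpha <= 1:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")

    core = [set() for _ in centers]
    fringe = [set() for _ in centers]
    for i, x in enumerate(samples):
        for k, c in enumerate(centers):
            sim = 1.0 / (1.0 + math.dist(x, c))
            if sim >= alpha:
                core[k].add(i)      # sample definitely belongs to cluster k
            elif sim >= beta:
                fringe[k].add(i)    # sample possibly belongs to cluster k
    return core, fringe


# Hypothetical toy data: two cluster centers on a line, four samples.
samples = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
core, fringe = three_way_partition(samples, centers, alpha=0.6, beta=0.3)
print(core)
print(fringe)
```

In this toy run the sample midway between the two centers falls into the fringe of both clusters, which is the kind of uncertain assignment that a two-way (hard) partition would have to force into one cluster. Here the thresholds are fixed by hand, which is exactly the subjective tuning the proposed method aims to replace with automatic selection.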


This work is supported by the National Natural Science Foundation of China (Grant Nos. 61773208, 61906090, 61876027 and 71671086), the Natural Science Foundation of Jiangsu Province (Grant No. BK20191287), the Natural Science Foundation of Anhui Province of China (Grant No. 1808085MF178), the Fundamental Research Funds for the Central Universities (Grant No. 30920021131), and the China Postdoctoral Science Foundation (Grant No. 2018M632304).
Cite this article
Jia, X., Rao, Y., Li, W. et al. An automatic three-way clustering method based on sample similarity. Int. J. Mach. Learn. & Cyber. 12, 1545–1556 (2021). https://doi.org/10.1007/s13042-020-01255-8