Abstract
Most clustering algorithms cannot handle samples that carry uncertain information. Drawing on fuzzy set theory and probability theory, we first define the fuzzy-probability binary measure space and triangular fuzzy normal random variables. Building on the advantages of the k-means algorithm (a simple principle, few parameters, fast convergence, good clustering quality, and good scalability), we then propose a clustering algorithm for samples containing multiple triangular fuzzy normal random variables, which we call the TFNRV-k-means algorithm. The algorithm uses our proposed Euclidean random comprehensive absolute distance (ERCAD) as its distance measure. Under the fuzzy measure, the lower bound, the principal value, and the upper bound of the triangular fuzzy normal random variables are each updated by averaging, and the cluster centers are updated until they no longer change. We analyze the time complexity of the proposed algorithm and test it on different sample sets through random simulation experiments. The highest clustering accuracy obtained is 99.00% and the maximum Kappa coefficient is 0.9850, from which we conclude that the TFNRV-k-means algorithm clusters well. Finally, we summarize the article, list the advantages and disadvantages of the TFNRV-k-means algorithm, and propose corresponding improvements that provide directions for further research on TFNRV-k-means.
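The update loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each attribute of a sample is summarized by a hypothetical (lower, principal, upper) triple and ignores the normal-distribution part of the variables. Because the abstract does not define ERCAD, the sketch substitutes a plain Euclidean distance over the triple components. The function names `stand_in_distance`, `mean_center`, and `cluster_triples` are illustrative only.

```python
import math
import random

# Assumption: a sample is a list of attributes, and each attribute is a
# (lower, principal, upper) triple. This layout is hypothetical.


def stand_in_distance(x, y):
    """Plain Euclidean distance over all triple components.

    This is a stand-in, NOT the paper's ERCAD, which the abstract does not define.
    """
    return math.sqrt(sum((a - b) ** 2
                         for tx, ty in zip(x, y)
                         for a, b in zip(tx, ty)))


def mean_center(members):
    """Average the lower, principal, and upper values separately per attribute."""
    n = len(members)
    n_attrs = len(members[0])
    return [tuple(sum(m[j][k] for m in members) / n for k in range(3))
            for j in range(n_attrs)]


def cluster_triples(samples, k, max_iter=100, seed=0):
    """k-means-style loop: assign to nearest center, then re-average centers."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # k distinct samples as initial centers
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center under the stand-in distance.
        labels = [min(range(k), key=lambda c: stand_in_distance(s, centers[c]))
                  for s in samples]
        # Update step: component-wise means; an empty cluster keeps its center.
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            new_centers.append(mean_center(members) if members else centers[c])
        if new_centers == centers:  # centers stable and unchanged: stop
            break
        centers = new_centers
    return labels, centers
```

For two well-separated groups of single-attribute samples, for example triples near 1 and triples near 10, calling `cluster_triples(samples, k=2)` places each group in its own cluster. Swapping `stand_in_distance` for the actual ERCAD would be the first step toward the algorithm the article describes.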
Cite this article
Li, Y., Chen, Y. & Li, Q. A Clustering Algorithm for Triangular Fuzzy Normal Random Variables. Int. J. Fuzzy Syst. 22, 2083–2100 (2020). https://doi.org/10.1007/s40815-020-00933-7
DOI: https://doi.org/10.1007/s40815-020-00933-7