Abstract
This paper proposes a new clustering technique for categorical data called Parametric Soft Set (PSS). It is based on a statistical distribution, namely the multivariate multinomial distribution. The probability of a categorical attribute with binary values can be calculated with the binomial distribution; its generalization, the multinomial distribution, handles categorical attributes with more than two values. First, the data are represented as a multi soft set in which every object in each soft set has a probability. The probability of each object is calculated from the cluster joint distribution function, which follows the multivariate multinomial distribution. Each object is then assigned to the cluster for which its probability is highest. The first experiment estimates the parameters of data drawn from random multivariate mixture distributions, while the second evaluates processing time, purity, and Rand index on benchmark datasets. The experimental results show that the proposed approach improves processing time by up to 92.96%. It also performs better in terms of purity, Rand index, and mean error of the estimated parameters.
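The assignment rule described in the abstract (each object goes to the cluster with the highest joint probability) can be illustrated with a minimal sketch. This is not the authors' PSS implementation: it assumes the cluster parameters are already known, that attributes are independent within a cluster, and that categories are encoded as integer indices. The names `cluster_log_joint`, `assign_clusters`, `purity`, `theta`, and `weights` are hypothetical and chosen only for this example.

```python
import numpy as np

def cluster_log_joint(X, theta, weights):
    """Log joint probability of every object under every cluster.

    X:       (n, d) integer array; X[i, j] is the category index of
             attribute j for object i.
    theta:   list of d arrays; theta[j] has shape (k, m_j) and row c holds
             the category probabilities of attribute j in cluster c
             (assumed known here, not estimated).
    weights: (k,) array of cluster mixing proportions.
    """
    n, d = X.shape
    log_p = np.tile(np.log(weights), (n, 1))  # shape (n, k)
    for j in range(d):
        # theta[j][:, X[:, j]] has shape (k, n); transpose to (n, k).
        log_p += np.log(theta[j][:, X[:, j]]).T
    return log_p

def assign_clusters(X, theta, weights):
    """Assign each object to the cluster with the highest joint probability."""
    return np.argmax(cluster_log_joint(X, theta, weights), axis=1)

def purity(labels, truth):
    """Fraction of objects that fall in the majority true class of their cluster."""
    total = 0
    for c in np.unique(labels):
        members = truth[labels == c]
        total += np.bincount(members).max()
    return total / len(labels)

# Toy data: 4 objects, 2 attributes (2 and 3 categories), 2 clusters.
X = np.array([[0, 0], [0, 1], [1, 2], [1, 2]])
theta = [
    np.array([[0.9, 0.1], [0.2, 0.8]]),
    np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]),
]
weights = np.array([0.5, 0.5])

labels = assign_clusters(X, theta, weights)
print(labels)                                   # [0 0 1 1]
print(purity(labels, np.array([1, 1, 0, 0])))   # 1.0
```

Working in log space avoids underflow when many attributes are multiplied together. Purity is invariant to how cluster indices are numbered, which is why the swapped ground-truth labels in the last line still score 1.0.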
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yanto, I.T.R., Deris, M.M., Senan, N. (2022). PSS: New Parametric Based Clustering for Data Category. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_2
DOI: https://doi.org/10.1007/978-3-031-00828-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00827-6
Online ISBN: 978-3-031-00828-3
eBook Packages: Intelligent Technologies and Robotics (R0)