PSS: New Parametric Based Clustering for Data Category

  • Conference paper
Recent Advances in Soft Computing and Data Mining (SCDM 2022)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 457))

Abstract

This paper proposes a new clustering technique for categorical data, called Parametric Soft Set (PSS). It is based on a statistical distribution, namely the multivariate multinomial distribution. The probability of categorical data with binary values can be calculated with the binomial distribution; its generalization, the multinomial distribution, handles categorical data with multiple values. First, the data are represented as a multi soft set in which every object in each soft set has a probability. The probability of each object is calculated from the cluster joint distribution function, which follows the multivariate multinomial distribution. Each object is then assigned to the cluster for which its probability is highest. The first experiment estimates the parameters of data drawn from random multivariate mixture distributions. The second experiment evaluates processing time, purity, and Rand index on benchmark datasets. The experimental results show that the proposed approach improves processing time by up to 92.96%. It also performs better in terms of purity, Rand index, and mean error of the estimated parameters.
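The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PSS implementation: it assumes integer-coded categorical attributes, treats attributes as independent within a cluster (so the joint probability is a product of per-attribute categorical probabilities), and uses additive smoothing. The function names `fit_cluster_params` and `assign_clusters`, the smoothing parameter `alpha`, and the use of initial labels to estimate parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_cluster_params(X, labels, n_clusters, n_values, alpha=1.0):
    """Estimate cluster weights and per-attribute category probabilities.

    X        : (n_objects, n_attrs) integer array; X[i, j] is in 0..n_values[j]-1
    labels   : (n_objects,) integer array with a current cluster for each object
    alpha    : additive smoothing constant (an assumption, not from the paper)
    """
    n_objects, n_attrs = X.shape
    weights = np.zeros(n_clusters)
    # theta[k][j] is the probability vector over values of attribute j in cluster k
    theta = [[None] * n_attrs for _ in range(n_clusters)]
    for k in range(n_clusters):
        members = X[labels == k]
        weights[k] = (len(members) + alpha) / (n_objects + alpha * n_clusters)
        for j in range(n_attrs):
            counts = np.bincount(members[:, j], minlength=n_values[j]) + alpha
            theta[k][j] = counts / counts.sum()
    return weights, theta


def assign_clusters(X, weights, theta):
    """Assign each object to the cluster with the highest joint probability."""
    n_objects, n_attrs = X.shape
    # Work in log space to avoid underflow when many attributes are multiplied.
    log_prob = np.tile(np.log(weights), (n_objects, 1))
    for k in range(len(weights)):
        for j in range(n_attrs):
            log_prob[:, k] += np.log(theta[k][j][X[:, j]])
    return log_prob.argmax(axis=1)
```

With estimated parameters in hand, calling `assign_clusters(X, weights, theta)` returns one cluster index per object. Alternating the two functions until the labels stop changing would give a simple hard-assignment loop, which is again an illustrative design choice rather than the procedure reported in the paper.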

Correspondence to Iwan Tri Riyadi Yanto.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Yanto, I.T.R., Deris, M.M., Senan, N. (2022). PSS: New Parametric Based Clustering for Data Category. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_2
