Balance-driven automatic clustering for probability density functions using metaheuristic optimization

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

For the problem of clustering probability density functions (CDF) with a given number of clusters, metaheuristic optimization (MO) algorithms have been widely studied because of their ability to search for the global optimum. However, existing approaches cannot be directly extended to the automatic CDF problem, in which the number of clusters k must also be determined. Moreover, balance-driven clustering, an essential research direction recently developed for discrete-element clustering, has not yet been considered in the field of CDF. This paper pioneers a technique that applies an MO algorithm to the balance-driven automatic CDF problem. The proposed method can not only determine the number of clusters automatically but also approximate the globally optimal solution, taking into account both clustering compactness and the similarity of cluster sizes. Experiments on one-dimensional and multidimensional probability density functions demonstrate that the new method yields higher-quality clustering solutions than conventional techniques. The proposed method is also applied to analyzing the difficulty levels of entrance exam questions.
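To make the idea of trading off compactness against cluster-size similarity concrete, the following is a minimal sketch of one possible fitness function that a metaheuristic could minimize. It is not the objective used in the paper, which this preview does not show. The names `l1_distance` and `balance_driven_objective`, the use of the L1 distance to a cluster's mean density, the squared-deviation balance penalty, and the weight `lam` are all assumptions made for illustration.

```python
from collections import defaultdict


def l1_distance(f, g, dx):
    """Approximate the L1 distance between two PDFs sampled on a shared grid."""
    return sum(abs(a - b) for a, b in zip(f, g)) * dx


def balance_driven_objective(pdfs, labels, dx, lam=1.0):
    """Hypothetical fitness: compactness plus a weighted size-imbalance penalty.

    pdfs   : list of PDFs, each a list of density values on a shared grid
    labels : cluster label of each PDF; k is inferred from the distinct labels
    lam    : assumed trade-off weight (not taken from the paper)
    """
    clusters = defaultdict(list)
    for pdf, label in zip(pdfs, labels):
        clusters[label].append(pdf)

    # Compactness: total L1 distance from each PDF to its cluster's mean PDF.
    compactness = 0.0
    for members in clusters.values():
        mean_pdf = [sum(vals) / len(members) for vals in zip(*members)]
        compactness += sum(l1_distance(pdf, mean_pdf, dx) for pdf in members)

    # Balance: squared deviation of each cluster size from the ideal size n / k.
    ideal = len(pdfs) / len(clusters)
    imbalance = sum((len(m) - ideal) ** 2 for m in clusters.values())

    return compactness + lam * imbalance


# Four toy step densities on a grid of four cells with width dx = 1.0.
pdfs = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
balanced = balance_driven_objective(pdfs, [0, 0, 1, 1], dx=1.0)
unbalanced = balance_driven_objective(pdfs, [0, 0, 0, 1], dx=1.0)
print(balanced < unbalanced)
```

In this toy setting the balanced two-cluster labeling scores lower than the unbalanced one. Note that this simplified objective does not by itself penalize a large number of clusters, so an automatic method would need an additional criterion or encoding to select k.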

Author information

Corresponding author

Correspondence to Tai Vo-Van.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nguyen-Trang, T., Nguyen-Thoi, T., Nguyen-Thi, KN. et al. Balance-driven automatic clustering for probability density functions using metaheuristic optimization. Int. J. Mach. Learn. & Cyber. 14, 1063–1078 (2023). https://doi.org/10.1007/s13042-022-01683-8

  • DOI: https://doi.org/10.1007/s13042-022-01683-8
