
OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm

  • Theory and Applications of Soft Computing Methods
Neural Computing and Applications

Abstract

Survival of the fittest is a principle by which nature selects the superior and eliminates the inferior. This principle has been applied in many fields, especially in solving optimization problems. In the data mining community, clustering aims to discover unknown representations or patterns hidden in datasets. A hierarchical clustering algorithm (HCA) is a cluster analysis method that searches for the optimal distribution of clusters through a hierarchical structure. Hierarchical clustering strategies generally fall into two types: agglomerative, which proceeds bottom-up, and divisive, which proceeds top-down. However, most clustering approaches have two disadvantages: their reliance on distance-based measures and the difficulty of integrating clusters. In this paper, we propose an optimal probabilistic estimation (OPE) approach that exploits the survival-of-the-fittest principle. Based on OPE, we devise a hierarchical clustering algorithm called OPE-HCA, which combines optimization and probability with agglomerative HCA. Experimental results show that OPE-HCA can search for and discover patterns at different levels of description, and that it outperforms many clustering algorithms in terms of normalized mutual information (NMI) and clustering accuracy.
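For reference, the distance-based agglomerative procedure that the abstract contrasts with OPE-HCA can be sketched as follows. This is a minimal illustration of classic bottom-up clustering, not OPE-HCA itself; the choice of Euclidean distance and average linkage, and the names `agglomerative` and `average_linkage`, are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1: List[int], c2: List[int], data: List[Point]) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data: List[Point], n_clusters: int) -> Tuple[List[int], List[Tuple[int, int, float]]]:
    """Bottom-up clustering: start from singletons and repeatedly merge the closest pair.

    Returns a flat label per point and the merge history (cluster ids, linkage distance).
    """
    if not 1 <= n_clusters <= len(data):
        raise ValueError("n_clusters must be between 1 and len(data)")
    clusters = {i: [i] for i in range(len(data))}  # cluster id -> member indices
    history = []
    next_id = len(data)  # ids for newly merged clusters
    while len(clusters) > n_clusters:
        ids = sorted(clusters)
        best = None
        # Exhaustively find the pair of clusters with the smallest linkage distance
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        history.append((a, b, d))
        next_id += 1
    # Convert the remaining clusters into labels 0..n_clusters-1
    labels = [0] * len(data)
    for label, members in enumerate(clusters[k] for k in sorted(clusters)):
        for i in members:
            labels[i] = label
    return labels, history
```

The `history` list records which clusters merged and at what linkage distance, which is the information a dendrogram displays; stopping at different `n_clusters` values yields clusterings at different levels of the hierarchy.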
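The abstract states that OPE exploits the survival-of-the-fittest principle and combines optimization with probability, but this preview does not give OPE's update rules. As a loose illustration of the general idea only, the sketch below implements a univariate marginal distribution algorithm (UMDA), a textbook estimation-of-distribution algorithm: keep the fittest candidates, estimate a probability model from them, and sample the next generation from that model. This is a swapped-in technique, not the paper's OPE; the function name, parameters, and clamping bounds are assumptions.

```python
import random
from typing import Callable, List


def umda(fitness: Callable[[List[int]], float], n_bits: int, pop_size: int = 50,
         n_select: int = 15, generations: int = 40, seed: int = 0) -> List[int]:
    """Univariate marginal distribution algorithm over bit strings.

    Each generation keeps only the fittest individuals (survival of the fittest)
    and re-estimates a per-bit probability vector from them before resampling.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: each bit equally likely to be 0 or 1
    best = None
    for _ in range(generations):
        # Sample a population from the current probability model
        population = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        survivors = population[:n_select]  # selection: the inferior are discarded
        if best is None or fitness(survivors[0]) > fitness(best):
            best = survivors[0]
        # Probabilistic estimation: frequency of 1s per bit among the survivors,
        # clamped away from 0 and 1 so no bit value is lost permanently
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in survivors) / n_select))
                 for i in range(n_bits)]
    return best
```

For example, `umda(sum, n_bits=20)` maximizes the number of 1 bits in a 20-bit string (the OneMax toy problem).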
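The abstract reports results in terms of NMI and clustering accuracy without defining them. The sketch below computes both under common conventions, which are assumptions here: NMI normalized by the geometric mean of the two label entropies (other normalizations, such as the arithmetic mean, also appear in the literature), and accuracy as the fraction of points matched under the best one-to-one mapping between clusters and classes. The mapping is found by brute force over permutations, which is practical only for a few clusters; the Hungarian algorithm is the usual choice at scale.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def nmi(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Normalized mutual information: I(T;P) / sqrt(H(T) * H(P))."""
    n = len(labels_true)
    ct, cp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum((nij / n) * math.log(n * nij / (ct[t] * cp[p])) for (t, p), nij in joint.items())
    h_t = -sum((c / n) * math.log(c / n) for c in ct.values())
    h_p = -sum((c / n) * math.log(c / n) for c in cp.values())
    if h_t == 0 or h_p == 0:
        # Degenerate case: a single class and/or a single cluster
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)


def clustering_accuracy(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Fraction of points correctly labelled under the best one-to-one
    cluster-to-class mapping (brute force, so only for a handful of clusters)."""
    classes = sorted(set(labels_true), key=repr)
    clusters = sorted(set(labels_pred), key=repr)
    counts = Counter(zip(labels_pred, labels_true))  # (cluster, class) -> co-occurrences
    # Pad with None so surplus clusters can map to "no class" and score 0
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        matched = sum(counts[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, matched)
    return best / len(labels_true)
```

Both measures are invariant to how clusters are numbered, so a prediction such as `[1, 1, 0, 0, 2, 2]` against ground truth `[0, 0, 1, 1, 2, 2]` scores 1.0 on each.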

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Acknowledgments

The author wishes to thank the anonymous reviewers and the JEO Assistant for their constructive comments and suggestions. The author also thanks the laboratory students Zheng Feng, Wenhua Liu, Yuhao Cai, and Tianyi Liang for participating in the experiments. This work was supported by the National Natural Science Foundation of China under Grant 61203305 and the Shandong Provincial Natural Science Foundation of China under Grant ZR2012FM003.

Author information

Correspondence to Jiancong Fan.

About this article

Cite this article

Fan, J. OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput & Applic 31, 2095–2105 (2019). https://doi.org/10.1007/s00521-015-1998-5

  • DOI: https://doi.org/10.1007/s00521-015-1998-5