OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm

Fan, Jiancong

doi:10.1007/s00521-015-1998-5


Theory and Applications of Soft Computing Methods
Published: 05 August 2015

Volume 31, pages 2095–2105 (2019)

Neural Computing and Applications

Jiancong Fan^1,2,3


Abstract

Survival of the fittest is a principle of nature that selects the superior and eliminates the inferior. It has been applied in many fields, especially in solving optimization problems. Clustering, as studied in the data mining community, aims to discover unknown representations or patterns hidden in datasets. A hierarchical clustering algorithm (HCA) is a cluster-analysis method that searches for the optimal distribution of clusters by building a hierarchical structure. Strategies for hierarchical clustering are generally of two types: agglomerative, which proceeds bottom-up, and divisive, which proceeds top-down. However, most clustering approaches have two disadvantages: they rely on distance-based measurement, and they have difficulty integrating clusters. In this paper, we propose an optimal probabilistic estimation (OPE) approach that exploits the survival-of-the-fittest principle. Based on OPE, we devise a hierarchical clustering algorithm, called OPE-HCA, which combines optimization with probability and agglomerative HCA. Experimental results show that OPE-HCA can search for and discover patterns at different description levels and achieves better performance than many clustering algorithms in terms of normalized mutual information (NMI) and clustering accuracy.
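
The abstract reports results in terms of NMI but does not define it here. As a rough illustration only, NMI between a predicted clustering and reference labels could be computed as in the sketch below. The function names and the geometric-mean normalization are assumptions made for this example; the paper may use a different normalization, and this is not the author's evaluation code.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one assignment has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)
```

With this definition, two assignments that differ only in how the clusters are named score 1, and statistically independent assignments score 0, which is why NMI is commonly used to compare clusterings whose label names are arbitrary.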



Acknowledgments

The author wishes to thank the anonymous reviewers and the JEO Assistant for their constructive comments and suggestions. The author also thanks the laboratory students Zheng Feng, Wenhua Liu, Yuhao Cai, and Tianyi Liang for participating in the experiment. This work was supported by the National Natural Science Foundation of China under Grant 61203305 and the Shandong Provincial Natural Science Foundation of China under Grant ZR2012FM003.

Author information

Authors and Affiliations

State Key Laboratory of Mining Disaster Prevention and Control Co-founded by Shandong Province and the Ministry of Science and Technology, Shandong University of Science and Technology, Qingdao, 266590, China
Jiancong Fan
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Jiancong Fan
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Jiancong Fan


Corresponding author

Correspondence to Jiancong Fan.


About this article

Cite this article

Fan, J. OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput & Applic 31, 2095–2105 (2019). https://doi.org/10.1007/s00521-015-1998-5


Received: 26 February 2015
Accepted: 21 July 2015
Published: 05 August 2015
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s00521-015-1998-5
