Abstract
Cluster analysis identifies the natural groupings in a dataset without requiring class labels and has been widely used in various fields. Although numerous clustering algorithms have been proposed and shown to perform reasonably well, no consensus exists about which one performs best in real situations. In this study, we propose a nonparametric clustering method based on recursive binary partitioning, as implemented in classification and regression tree models. The proposed clustering algorithm has two key advantages: (1) users do not have to specify any parameters before running it; (2) the final clustering result is represented by a set of if–then rules, which makes the clusters easier to interpret. Experiments with simulated and real datasets demonstrate the effectiveness and usefulness of the proposed algorithm.
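To make the idea of a rule-based clustering result concrete, the following is a minimal sketch of recursive binary partitioning that returns each cluster together with its if–then rule. It is not the algorithm proposed in the paper, whose split and stopping criteria are not given in the abstract. The split criterion (reduction in within-node sum of squared deviations), the function names `sse`, `best_split`, and `partition`, and the stopping parameters `min_gain_ratio` and `min_size` are all assumptions made for illustration; in particular, the proposed method is described as parameter-free, whereas this sketch is not.

```python
from statistics import fmean


def sse(points):
    """Sum of squared deviations of the points from their per-feature means."""
    if not points:
        return 0.0
    dims = range(len(points[0]))
    means = [fmean(p[d] for p in points) for d in dims]
    return sum((p[d] - means[d]) ** 2 for p in points for d in dims)


def best_split(points, min_size):
    """Return (gain, feature, threshold) for the best axis-aligned split, or None."""
    parent = sse(points)
    best = None
    for d in range(len(points[0])):
        values = sorted({p[d] for p in points})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [p for p in points if p[d] <= t]
            right = [p for p in points if p[d] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, d, t)
    return best


def partition(points, rule=(), min_gain_ratio=0.5, min_size=2):
    """Recursively split the points; return a list of (rule, points) leaves.

    min_gain_ratio and min_size are hypothetical stopping parameters chosen
    for this sketch only.
    """
    total = sse(points)
    split = best_split(points, min_size) if total > 0 else None
    if split is None or split[0] / total < min_gain_ratio:
        return [(rule, points)]  # this node becomes one cluster
    _, d, t = split
    left = [p for p in points if p[d] <= t]
    right = [p for p in points if p[d] > t]
    return (
        partition(left, rule + (f"x{d} <= {t:g}",), min_gain_ratio, min_size)
        + partition(right, rule + (f"x{d} > {t:g}",), min_gain_ratio, min_size)
    )


# Toy data: two well-separated groups in two dimensions.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
for rule, members in partition(data):
    print("IF", " AND ".join(rule) or "TRUE", "THEN cluster of", len(members), "points")
```

Each leaf of the recursion corresponds to one cluster, and the conjunction of conditions on the path to that leaf is the rule that describes it.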
Acknowledgments
The authors would like to thank the editor and the reviewers for their useful comments and suggestions, which were greatly helpful in improving the quality of the paper. This research was supported by Brain Korea PLUS, Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (2013007724), and the Ministry of Knowledge Economy in Korea under the IT R&D Infrastructure Program supervised by the National IT Industry Promotion Agency (NIPA) [NIPA-2011-(B1110-1101-0002)].
Cite this article
Kang, J.H., Park, C.H. & Kim, S.B. Recursive partitioning clustering tree algorithm. Pattern Anal Applic 19, 355–367 (2016). https://doi.org/10.1007/s10044-014-0399-1
DOI: https://doi.org/10.1007/s10044-014-0399-1