
K-means tree: an optimal clustering tree for unsupervised learning

The Journal of Supercomputing

A Correction to this article was published on 16 November 2023


Abstract

Tree construction is a popular approach to supervised tasks in machine learning, but comparatively little effort has been devoted to using trees for unsupervised tasks. Traditional unsupervised trees recursively partition the space so that each resulting partition contains similar samples, where the notion of similarity depends on the model and the application. This paper addresses, for the first time, the problem of learning optimal oblique clustering trees and proposes a linear-time algorithm for training them. Optimizing the performance of infrastructures and optimizing energy consumption in the Internet of things are examples of applications of trees and of clustering, respectively. The motivation behind unsupervised tree models is to preserve the data manifold while keeping query time fast. Popular unsupervised tree models include k-d trees, random projection trees (RP trees), principal component analysis trees (PCA trees) and clustering trees. However, all existing methods for unsupervised trees are sub-optimal, existing clustering trees are limited to axis-aligned splits, and some of these methods, such as k-d trees, suffer from the curse of dimensionality. Despite these challenges, trees are fast at query time. On the other hand, a non-hierarchical clustering method such as k-means performs well in high-dimensional problems, is locally optimal and has an efficient learning algorithm, but it is not fast at query time. To address these issues, this paper proposes a novel k-means tree, a tree whose outputs are cluster centroids. Such a tree is fast at query time and learns cluster centroids as good as those of k-means. The learning problem is therefore to learn the centroids and the tree parameters jointly and optimally. In this paper, this problem is first cast as a constrained minimization problem and then solved with a quadratic penalty method. The method starts from the clusters learned by k-means and gradually adapts the centroids to the outputs of an optimal oblique tree. Alternating optimization is used, where the alternation steps consist of weighted k-means clustering and tree optimization. The training complexity of the proposed algorithm is efficient, and the algorithm is optimal in the sense that the clusters and the tree are learned jointly. The trees used in the k-means tree are oblique, and to our knowledge, this is the first time oblique trees have been applied to the task of clustering. As a by-product of the proposed method, sample reduction is explored and its merits are shown: the training complexity of KMT (K-means tree) as a sample reduction method is logarithmic in the size of the reduced training set, while that of K-means is linear in the size of the reduced dataset, making KMT faster to train for this task. Finally, the proposed method is compared with other tree-based clustering algorithms and shown to be superior in terms of reconstruction error, and its query complexity is compared with that of k-means.
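The alternating scheme described in the abstract can be pictured with the following minimal sketch; it is not the authors' reference implementation. Everything specific here is an illustrative assumption: the function name kmt_sketch, the schedule of penalty weights mu, the closed-form penalty-weighted centroid update, and the use of scikit-learn's axis-aligned DecisionTreeRegressor as a stand-in for the oblique-tree optimizer used in the paper.

```python
# Illustrative sketch only: axis-aligned DecisionTreeRegressor stands in for the
# paper's oblique trees, and the centroid update is an assumed closed-form
# compromise between the k-means update and the tree outputs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

def kmt_sketch(X, n_clusters=16, depth=4, mu_schedule=(0.1, 1.0, 10.0), n_inner=5):
    # Step 0: initialize the centroids with ordinary k-means.
    centroids = KMeans(n_clusters=n_clusters, n_init=10).fit(X).cluster_centers_
    tree = None
    for mu in mu_schedule:                    # quadratic-penalty continuation on mu
        for _ in range(n_inner):              # alternating optimization
            # Assignment step: nearest-centroid labels, as in standard k-means.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            labels = dists.argmin(axis=1)
            # Tree step: fit a tree that maps every sample to its current centroid
            # (multi-output regression; surrogate for the oblique-tree optimizer).
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, centroids[labels])
            # Centroid step: penalty-weighted compromise between the cluster mean
            # (the k-means update) and the mean tree output of the cluster.
            tree_out = tree.predict(X)
            for k in range(n_clusters):
                members = labels == k
                if members.any():
                    centroids[k] = (X[members].mean(axis=0)
                                    + mu * tree_out[members].mean(axis=0)) / (1.0 + mu)
    return tree, centroids
```

At query time a centroid estimate is obtained as tree.predict(x.reshape(1, -1)), i.e. by routing the sample through at most depth splits, which is the fast-query property the abstract emphasizes.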





Acknowledgements

Peyman Tavallali’s research contribution to this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

Author information


Corresponding author

Correspondence to Mukesh Singhal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Performance on synthetic data

To further understand the different models visually, the K-means centroids of two synthetic datasets are investigated. Figure 6 shows a rotated checkerboard dataset. This dataset is not an easy one for trees because of its roughly symmetric shape. Red crosses are the centroids learned by the different trees, green crosses are the centroids learned by K-means, and blue dots are the training samples. K-means easily recognizes and clusters the different modes of the data and therefore learns globally optimal centroids for this problem. The global optimality of these centroids is an empirical observation: the synthetic data were designed with 16 modes, and a K-means with 16 centroids learned exactly these modes. Whereas the other unsupervised trees did not learn a proper clustering of the data, the proposed method (K-means tree, KMT) learned exactly the centroids found by K-means. Hierarchical K-means clustering was also able to learn the same or similar centroids as K-means, making it comparable to KMT; however, hierarchical K-means clustering is sub-optimal for this problem, which can cause issues on real datasets. For the comparison, each tree was first trained up to a depth of 4, and then a grouped K-means was run over its partitions of the data, as sketched below.
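The following minimal sketch illustrates one way this "grouped K-means over the tree partitions" step could look, under the assumption that, when K equals the number of leaves (16 leaves at depth 4), it amounts to taking one centroid per leaf, namely the mean of the training samples routed to that leaf. The helper name per_leaf_centroids is illustrative, and any fitted scikit-learn-style tree exposing apply() would work.

```python
import numpy as np

def per_leaf_centroids(tree, X):
    # tree.apply(X) returns the leaf index each sample falls into
    # (available on fitted scikit-learn trees); one centroid per leaf.
    leaves = tree.apply(X)
    return np.stack([X[leaves == leaf].mean(axis=0) for leaf in np.unique(leaves)])
```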

In Fig. 6, it can be observed that for small values of K, the various models learn centroids close to those of K-means. However, the performance of the conventional models drops for larger values of K. Among all models, only KMT and hierarchical K-means learn the centroids properly for all values of K.

Fig. 6

Synthetic dataset for demonstrating performance of various unsupervised trees. Blue dots show the dataset. Green symbols of \(\times\) show centroids learned by K-means, and red symbols of \(\times\) show the centroids learned by the proposed algorithm (KMT). KMT was able to learn the centroids similar to K-means, while other unsupervised tree methods could not achieve centroids as good as K-means (color figure online)

Figure 7 shows another, more complicated synthetic dataset, on which KMT performed better than the other models. The goal on this dataset is to find a cluster along each wing of the data; however, because of its symmetric shape, finding proper centroids is difficult in practice. None of the conventional tree models (including the hierarchical K-means model) could break the symmetry and learn proper centroids, whereas KMT learned centroids similar to those of K-means.

Fig. 7

Synthetic dataset for demonstrating performance of various unsupervised trees. Blue dots show the dataset. Green symbols of \(\times\) show centroids learned by K-means, and red symbols of \(\times\) show the centroids learned by the proposed algorithm (KMT). KMT was able to learn the centroids similar to K-means, while other unsupervised tree methods could not achieve centroids as good as K-means (color figure online)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tavallali, P., Tavallali, P. & Singhal, M. K-means tree: an optimal clustering tree for unsupervised learning. J Supercomput 77, 5239–5266 (2021). https://doi.org/10.1007/s11227-020-03436-2


  • DOI: https://doi.org/10.1007/s11227-020-03436-2
