Abstract
Clustering is a popular unsupervised learning task that consists of finding a partition of the data points that groups similar points together. Despite its popularity, most state-of-the-art algorithms do not provide any explanation of the obtained partition, making it hard to interpret. In recent years, several works have considered using decision trees to construct clusters that are inherently interpretable. However, these approaches do not scale to large datasets, do not account for uncertainty in results, and do not support advanced clustering objectives such as spectral clustering. In this work, we present soft clustering trees, an interpretable clustering approach based on soft decision trees that provide probabilistic cluster membership. We model soft clustering trees as a continuous optimization problem that is amenable to efficient optimization techniques. Our approach is designed to output highly sparse decision trees to increase interpretability and to support tree-based spectral clustering. Extensive experiments show that our approach can produce clustering trees of significantly higher quality than the state of the art and scales to large datasets.
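To make the notion of probabilistic cluster membership concrete, the following is a minimal sketch of how a soft decision tree can assign soft memberships: each internal node routes a point left with a sigmoid probability of a hyperplane response, and a leaf's membership is the product of branch probabilities along its root-to-leaf path. This is a hypothetical illustration under standard soft-tree assumptions, not the paper's exact formulation; the function name `leaf_memberships` and the level-order node indexing are our own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_memberships(x, weights, biases, depth):
    """Soft cluster memberships of point x in a complete soft tree.

    Each internal node t routes x left with probability
    sigmoid(weights[t] @ x + biases[t]); a leaf's membership is the
    product of the branch probabilities on its root-to-leaf path, so
    the memberships over all leaves sum to 1 by construction.
    """
    n_leaves = 2 ** depth
    probs = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0  # root, level-order (heap) indexing of internal nodes
        for level in range(depth):
            # bit of the leaf index at this level: 0 = go left, 1 = go right
            go_left = ((leaf >> (depth - level - 1)) & 1) == 0
            p_left = sigmoid(weights[node] @ x + biases[node])
            probs[leaf] *= p_left if go_left else (1.0 - p_left)
            node = 2 * node + (1 if go_left else 2)
    return probs

rng = np.random.default_rng(0)
depth, d = 2, 3
W = rng.normal(size=(2 ** depth - 1, d))  # one hyperplane per internal node
b = rng.normal(size=2 ** depth - 1)
m = leaf_memberships(rng.normal(size=d), W, b, depth)
print(m, m.sum())  # memberships over the 4 leaves; the sum is 1.0
```

Treating each leaf as a cluster, these memberships are what a clustering objective would be optimized over; hardening them (taking the argmax leaf per point) recovers an ordinary axis-crossing tree partition.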
Notes
- 1. If \(A_L(t) = \varnothing \) or \(A_R(t) = \varnothing \), then the corresponding products in Eq. (2) are equal to 1.0.
- 2. Following [27], we keep \(\varGamma _t\) as variables rather than hyper-parameters.
- 3. For this purpose, we arbitrarily select \(10^{-4}\) as the threshold for zeroing coefficients in the resultant clustering tree.
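The thresholding in note 3 amounts to treating any learned coefficient with magnitude below \(10^{-4}\) as exactly zero, so each split in the final tree depends on fewer features. A small sketch (the coefficient values are made up for illustration):

```python
import numpy as np

THRESHOLD = 1e-4  # magnitudes strictly below this are zeroed out

# Hypothetical hyperplane coefficients learned for one tree node.
coefs = np.array([0.73, -3.2e-5, 2.0e-4, -0.41, 8.9e-6])

# Zero out near-zero coefficients to sparsify the split.
sparse = np.where(np.abs(coefs) < THRESHOLD, 0.0, coefs)
print(sparse)  # -> [ 0.73    0.      0.0002 -0.41    0.    ]
```

After thresholding, the split above involves only three of the five features, which is what makes the resulting tree easier to read.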
References
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Bertsimas, D., Orfanoudaki, A., Wiberg, H.: Interpretable clustering: an optimization approach. Mach. Learn. 110(1), 89–138 (2021)
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy C-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
Blanquero, R., Carrizosa, E., Molero-Río, C., Morales, D.R.: Sparsity in optimal randomized classification trees. Eur. J. Oper. Res. 284(1), 255–272 (2020)
Blanquero, R., Carrizosa, E., Molero-Río, C., Morales, D.R.: Optimal randomized classification trees. Comput. Oper. Res. 132, 105281 (2021)
Carrizosa, E., Kurishchenko, K., Marín, A., Morales, D.R.: Interpreting clusters via prototype optimization. Omega 107, 102543 (2022)
Carrizosa, E., Molero-Río, C., Romero Morales, D.: Mathematical optimization in classification and regression trees. TOP 29(1), 5–33 (2021). https://doi.org/10.1007/s11750-021-00594-1
Chen, J., et al.: Interpretable clustering via discriminative rectangle mixture model. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 823–828. IEEE (2016)
Chhabra, A., Masalkovaitė, K., Mohapatra, P.: An overview of fairness in clustering. IEEE Access 9, 130698–130720 (2021)
Correia, G.M., Niculae, V., Martins, A.F.: Adaptively sparse transformers. In: Proceedings of EMNLP-IJCNLP (2019)
Dao, T.B.H., Vrain, C., Duong, K.C., Davidson, I.: A framework for actionable clustering using constraint programming. In: Proceedings of the Twenty-Second European Conference on Artificial Intelligence, pp. 453–461 (2016)
Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017). https://doi.org/10.1137/15M1020575
Fraiman, R., Ghattas, B., Svarc, M.: Interpretable clustering using unsupervised binary trees. Adv. Data Anal. Classif. 7(2), 125–145 (2013)
Frosst, N., Hinton, G.: Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784 (2017)
Frost, N., Moshkovitz, M., Rashtchian, C.: ExKMC: expanding explainable \(k\)-means clustering. arXiv preprint arXiv:2006.02399 (2020)
Gabidolla, M., Carreira-Perpiñán, M.Á.: Optimal interpretable clustering using oblique decision trees. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 400–410 (2022)
Gamlath, B., Jia, X., Polak, A., Svensson, O.: Nearly-tight and oblivious algorithms for explainable clustering. Adv. Neural. Inf. Process. Syst. 34, 28929–28939 (2021)
Hazimeh, H., Ponomareva, N., Mol, P., Tan, Z., Mazumder, R.: The tree ensemble layer: differentiability meets conditional computation. In: International Conference on Machine Learning, pp. 4138–4148. PMLR (2020)
Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent. Coursera (2012)
Hou, Q., Zhang, N., Kirschen, D.S., Du, E., Cheng, Y., Kang, C.: Sparse oblique decision tree for power system security rules extraction and embedding. IEEE Trans. Power Syst. 36(2), 1605–1615 (2020)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Irsoy, O., Yildiz, O.T., Alpaydin, E.: Soft decision trees. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 1819–1822. IEEE (2012)
Kauffmann, J., Esders, M., Ruff, L., Montavon, G., Samek, W., Müller, K.R.: From clustering to cluster explanations via neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2022)
Lawless, C., Kalagnanam, J., Nguyen, L.M., Phan, D., Reddy, C.: Interpretable clustering via multi-polytope machines. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7309–7316 (2022)
Liu, B., Xia, Y., Yu, P.S.: Clustering via decision tree construction. In: Chu, W., Young Lin, T. (eds.) Foundations and Advances in Data Mining. Studies in Fuzziness and Soft Computing, vol. 180, pp. 97–124. Springer, Heidelberg (2005). https://doi.org/10.1007/11362197_5
Luo, H., Cheng, F., Yu, H., Yi, Y.: SDTR: soft decision tree regressor for tabular data. IEEE Access 9, 55999–56011 (2021)
Makarychev, K., Shan, L.: Near-optimal algorithms for explainable k-medians and k-means. In: International Conference on Machine Learning, pp. 7358–7367. PMLR (2021)
Martins, A., Astudillo, R.: From softmax to sparsemax: a sparse model of attention and multi-label classification. In: International Conference on Machine Learning, pp. 1614–1623. PMLR (2016)
Moshkovitz, M., Dasgupta, S., Rashtchian, C., Frost, N.: Explainable k-means and k-medians clustering. In: International Conference on Machine Learning, pp. 7055–7065. PMLR (2020)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Pelleg, D., Moore, A.: Mixtures of rectangles: interpretable soft clustering. In: ICML, vol. 2001, pp. 401–408 (2001)
Peters, B., Niculae, V., Martins, A.F.: Sparse sequence-to-sequence models. arXiv preprint arXiv:1905.05702 (2019)
Popov, S., Morozov, S., Babenko, A.: Neural oblivious decision ensembles for deep learning on tabular data. arXiv preprint arXiv:1909.06312 (2019)
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) ICANN 1997. LNCS, vol. 1327, pp. 583–588. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0020217
Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1177–1178 (2010)
Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE (2017)
Strehl, A., Ghosh, J.: Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A., Nori, A.: Adaptive neural trees. In: International Conference on Machine Learning, pp. 6166–6175. PMLR (2019)
Tavallali, P., Tavallali, P., Singhal, M.: K-means tree: an optimal clustering tree for unsupervised learning. J. Supercomput. 77(5), 5239–5266 (2021)
Ultsch, A., Lötsch, J.: The fundamental clustering and projection suite (FCPS): a dataset collection to test the performance of clustering and data projection algorithms. Data 5(1), 13 (2020)
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487. PMLR (2016)
Yang, Y., Morillo, I.G., Hospedales, T.M.: Deep neural decision trees. In: ICML Workshop on Human Interpretability in Machine Learning (WHI) (2018)
Yoo, J., Sael, L.: EDiT: interpreting ensemble models via compact soft decision trees. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 1438–1443. IEEE (2019)
Zantedeschi, V., Kusner, M., Niculae, V.: Learning binary decision trees by argmin differentiation. In: International Conference on Machine Learning, pp. 12298–12309. PMLR (2021)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cohen, E. (2023). Interpretable Clustering via Soft Clustering Trees. In: Cire, A.A. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2023. Lecture Notes in Computer Science, vol 13884. Springer, Cham. https://doi.org/10.1007/978-3-031-33271-5_19
Print ISBN: 978-3-031-33270-8
Online ISBN: 978-3-031-33271-5