Interpretable Clustering via Soft Clustering Trees

  • Conference paper
Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13884)


Abstract

Clustering is a popular unsupervised learning task that consists of finding a partition of the data points that groups similar points together. Despite its popularity, most state-of-the-art algorithms do not provide any explanation of the obtained partition, making it hard to interpret. In recent years, several works have considered using decision trees to construct clusters that are inherently interpretable. However, these approaches do not scale to large datasets, do not account for uncertainty in results, and do not support advanced clustering objectives such as spectral clustering. In this work, we present soft clustering trees, an interpretable clustering approach that is based on soft decision trees that provide probabilistic cluster membership. We model soft clustering trees as a continuous optimization problem that is amenable to efficient optimization techniques. Our approach is designed to output highly sparse decision trees to increase interpretability and to support tree-based spectral clustering. Extensive experiments show that our approach can produce clustering trees of significantly higher quality compared to the state of the art and scale to large datasets.
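
To make the mechanism concrete: in a soft decision tree, every internal node routes a point left or right with a gate probability, and the probability that a point belongs to the cluster at a given leaf is the product of the gate probabilities along the root-to-leaf path. The sketch below illustrates this routing with a plain sigmoid gate; the function and parameter names (leaf_probabilities, W, b) are assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_probabilities(x, W, b, depth):
    """Soft routing through a complete binary tree of the given depth.

    Internal nodes are indexed 0..2**depth - 2 in breadth-first order;
    node t sends x to its left child with probability sigmoid(W[t] @ x + b[t]).
    Returns one probability per leaf; they sum to 1 and can be read as a
    probabilistic cluster membership for x.
    """
    n_leaves = 2 ** depth
    probs = np.empty(n_leaves)
    for leaf in range(n_leaves):
        p, node = 1.0, 0
        for level in reversed(range(depth)):
            g = sigmoid(W[node] @ x + b[node])  # P(go left at this node)
            go_right = (leaf >> level) & 1      # the leaf index encodes the path
            p *= (1.0 - g) if go_right else g
            node = 2 * node + 1 + go_right      # breadth-first child index
        probs[leaf] = p
    return probs

# Example: a depth-2 tree (3 internal nodes, 4 leaves/clusters) on 5-d data.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
print(leaf_probabilities(rng.normal(size=5), W, b, depth=2))
```

Because the two gate probabilities at each node sum to one, the leaf probabilities always form a distribution over clusters; this is what makes the memberships probabilistic rather than hard assignments, and it is also what makes the objective continuous in the split parameters.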


Notes

  1. If \(A_L(t) = \varnothing\) or \(A_R(t) = \varnothing\), the corresponding products in Eq. (2) are equal to 1.0 (see the reconstruction sketched after these notes).

  2. Following [27], we keep \(\varGamma_t\) as variables rather than hyper-parameters.

  3. For this purpose, we arbitrarily select \(10^{-4}\) as the threshold for zeroing coefficients in the resultant clustering tree (a code sketch follows these notes).
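
Eq. (2) itself is not reproduced on this page. As context for note 1, a standard form of the leaf-reaching probability in soft decision trees (see [15, 23]) is sketched below, with split parameters \(w_s, b_s\) assumed here for illustration; \(A_L(t)\) and \(A_R(t)\) denote the ancestors of leaf \(t\) whose left and right branches, respectively, the path follows:

```latex
P\bigl(x \text{ reaches leaf } t\bigr)
  = \prod_{s \in A_L(t)} \sigma\!\left(w_s^\top x + b_s\right)
    \prod_{s \in A_R(t)} \left(1 - \sigma\!\left(w_s^\top x + b_s\right)\right)
```

When \(A_L(t)\) or \(A_R(t)\) is empty, the corresponding product is empty and therefore equal to 1, which is exactly the convention the note states.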
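For note 3, the described post-processing amounts to zeroing every split coefficient whose magnitude falls below the \(10^{-4}\) threshold. A minimal sketch, assuming the coefficients are collected in an array W with one weight vector per internal node:

```python
import numpy as np

def sparsify(W, tol=1e-4):
    """Zero out split coefficients with magnitude below tol.

    Illustrative post-processing per note 3; the array layout of W
    (one weight vector per internal node) is an assumption.
    """
    return np.where(np.abs(W) < tol, 0.0, W)
```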

References

  1. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)

  2. Bertsimas, D., Orfanoudaki, A., Wiberg, H.: Interpretable clustering: an optimization approach. Mach. Learn. 110(1), 89–138 (2021)

  3. Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy C-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)

  4. Blanquero, R., Carrizosa, E., Molero-Río, C., Morales, D.R.: Sparsity in optimal randomized classification trees. Eur. J. Oper. Res. 284(1), 255–272 (2020)

  5. Blanquero, R., Carrizosa, E., Molero-Río, C., Morales, D.R.: Optimal randomized classification trees. Comput. Oper. Res. 132, 105281 (2021)

  6. Carrizosa, E., Kurishchenko, K., Marín, A., Morales, D.R.: Interpreting clusters via prototype optimization. Omega 107, 102543 (2022)

  7. Carrizosa, E., Molero-Río, C., Romero Morales, D.: Mathematical optimization in classification and regression trees. TOP 29(1), 5–33 (2021). https://doi.org/10.1007/s11750-021-00594-1

  8. Chen, J., et al.: Interpretable clustering via discriminative rectangle mixture model. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 823–828. IEEE (2016)

  9. Chhabra, A., Masalkovaitė, K., Mohapatra, P.: An overview of fairness in clustering. IEEE Access 9, 130698–130720 (2021)

  10. Correia, G.M., Niculae, V., Martins, A.F.: Adaptively sparse transformers. In: Proceedings of EMNLP-IJCNLP (2019)

  11. Dao, T.B.H., Vrain, C., Duong, K.C., Davidson, I.: A framework for actionable clustering using constraint programming. In: Proceedings of the Twenty-Second European Conference on Artificial Intelligence, pp. 453–461 (2016)

  12. Dua, D., Graff, C.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml

  13. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017). https://doi.org/10.1137/15M1020575

  14. Fraiman, R., Ghattas, B., Svarc, M.: Interpretable clustering using unsupervised binary trees. Adv. Data Anal. Classif. 7(2), 125–145 (2013)

  15. Frosst, N., Hinton, G.: Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784 (2017)

  16. Frost, N., Moshkovitz, M., Rashtchian, C.: ExKMC: expanding explainable k-means clustering. arXiv preprint arXiv:2006.02399 (2020)

  17. Gabidolla, M., Carreira-Perpiñán, M.Á.: Optimal interpretable clustering using oblique decision trees. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 400–410 (2022)

  18. Gamlath, B., Jia, X., Polak, A., Svensson, O.: Nearly-tight and oblivious algorithms for explainable clustering. Adv. Neural. Inf. Process. Syst. 34, 28929–28939 (2021)

  19. Hazimeh, H., Ponomareva, N., Mol, P., Tan, Z., Mazumder, R.: The tree ensemble layer: differentiability meets conditional computation. In: International Conference on Machine Learning, pp. 4138–4148. PMLR (2020)

  20. Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent. Coursera lecture notes (2012)

  21. Hou, Q., Zhang, N., Kirschen, D.S., Du, E., Cheng, Y., Kang, C.: Sparse oblique decision tree for power system security rules extraction and embedding. IEEE Trans. Power Syst. 36(2), 1605–1615 (2020)

  22. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

  23. Irsoy, O., Yildiz, O.T., Alpaydin, E.: Soft decision trees. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), pp. 1819–1822. IEEE (2012)

  24. Kauffmann, J., Esders, M., Ruff, L., Montavon, G., Samek, W., Müller, K.R.: From clustering to cluster explanations via neural networks. IEEE Trans. Neural Netw. Learn. Syst. (2022)

  25. Lawless, C., Kalagnanam, J., Nguyen, L.M., Phan, D., Reddy, C.: Interpretable clustering via multi-polytope machines. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 7309–7316 (2022)

  26. Liu, B., Xia, Y., Yu, P.S.: Clustering via decision tree construction. In: Chu, W., Young Lin, T. (eds.) Foundations and Advances in Data Mining. Studies in Fuzziness and Soft Computing, vol. 180, pp. 97–124. Springer, Heidelberg (2005). https://doi.org/10.1007/11362197_5

  27. Luo, H., Cheng, F., Yu, H., Yi, Y.: SDTR: soft decision tree regressor for tabular data. IEEE Access 9, 55999–56011 (2021)

  28. Makarychev, K., Shan, L.: Near-optimal algorithms for explainable k-medians and k-means. In: International Conference on Machine Learning, pp. 7358–7367. PMLR (2021)

  29. Martins, A., Astudillo, R.: From softmax to sparsemax: a sparse model of attention and multi-label classification. In: International Conference on Machine Learning, pp. 1614–1623. PMLR (2016)

  30. Moshkovitz, M., Dasgupta, S., Rashtchian, C., Frost, N.: Explainable k-means and k-medians clustering. In: International Conference on Machine Learning, pp. 7055–7065. PMLR (2020)

  31. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

  32. Pelleg, D., Moore, A.: Mixtures of rectangles: interpretable soft clustering. In: ICML, vol. 2001, pp. 401–408 (2001)

  33. Peters, B., Niculae, V., Martins, A.F.: Sparse sequence-to-sequence models. arXiv preprint arXiv:1905.05702 (2019)

  34. Popov, S., Morozov, S., Babenko, A.: Neural oblivious decision ensembles for deep learning on tabular data. arXiv preprint arXiv:1909.06312 (2019)

  35. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)

  36. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)

  37. Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) ICANN 1997. LNCS, vol. 1327, pp. 583–588. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0020217

  38. Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1177–1178 (2010)

  39. Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE (2017)

  40. Strehl, A., Ghosh, J.: Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

  41. Tanno, R., Arulkumaran, K., Alexander, D., Criminisi, A., Nori, A.: Adaptive neural trees. In: International Conference on Machine Learning, pp. 6166–6175. PMLR (2019)

  42. Tavallali, P., Tavallali, P., Singhal, M.: K-means tree: an optimal clustering tree for unsupervised learning. J. Supercomput. 77(5), 5239–5266 (2021)

  43. Ultsch, A., Lötsch, J.: The fundamental clustering and projection suite (FCPS): a dataset collection to test the performance of clustering and data projection algorithms. Data 5(1), 13 (2020)

  44. Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  45. Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)

  46. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487. PMLR (2016)

  47. Yang, Y., Morillo, I.G., Hospedales, T.M.: Deep neural decision trees. In: ICML Workshop on Human Interpretability in Machine Learning (WHI) (2018)

  48. Yoo, J., Sael, L.: EDiT: interpreting ensemble models via compact soft decision trees. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 1438–1443. IEEE (2019)

  49. Zantedeschi, V., Kusner, M., Niculae, V.: Learning binary decision trees by argmin differentiation. In: International Conference on Machine Learning, pp. 12298–12309. PMLR (2021)

Author information

Corresponding author

Correspondence to Eldan Cohen.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cohen, E. (2023). Interpretable Clustering via Soft Clustering Trees. In: Cire, A.A. (ed.) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2023. Lecture Notes in Computer Science, vol 13884. Springer, Cham. https://doi.org/10.1007/978-3-031-33271-5_19

  • DOI: https://doi.org/10.1007/978-3-031-33271-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33270-8

  • Online ISBN: 978-3-031-33271-5

  • eBook Packages: Computer Science, Computer Science (R0)
