Abstract
Clustering is an unsupervised learning method for grouping similar data samples. Although clustering is used in a wide range of applications, traditional clustering methods cannot provide clear interpretations of the resulting clusters. This has led to increasing interest in interpretable clustering methods, which are mainly based on decision trees. However, existing interpretable clustering methods are typically designed for tabular data and struggle when applied to time-series data because of its complex nature. In this paper, we propose a novel interpretable time-series clustering method based on decision trees. To address the interpretability challenges of time-series data, our method employs two separate feature sets: intuitive features for decision tree branching, and the original observed time-series values for evaluating a given clustering metric. This dual use enables us to construct interpretable clustering trees for time-series data. In addition, to handle datasets with a large number of samples, we propose a new metric for evaluating clustering quality, called the surrogate silhouette coefficient, and present a heuristic algorithm for constructing a decision tree based on this metric. We show that the computational complexity of evaluating the proposed metric is much lower than that of the silhouette coefficient, which is commonly used in decision tree-based clustering. Our numerical experiments demonstrate that our method constructs decision trees faster than existing methods based on the silhouette coefficient while maintaining clustering quality. In addition, we applied our method to time-series data from an e-commerce platform and succeeded in constructing an insightful decision tree.
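The dual use of feature sets described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the surrogate silhouette coefficient, whose definition is not given in this abstract. It instead scores each candidate split with the standard silhouette coefficient (Rousseeuw, 1987), computed on the raw series with Euclidean distance, while branching only on simple summary features. The function names, the choice of features (mean level and net change), and the distance are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(series, labels):
    """Mean silhouette coefficient over all samples, computed on raw series.

    Every pair of samples is compared, so the cost grows quadratically
    with the number of samples.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, s_i in enumerate(series):
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, s_j in enumerate(series):
            if i != j:
                sums[labels[j]] += euclidean(s_i, s_j)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) is defined as 0
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(series)


def intuitive_features(s):
    """Hypothetical interpretable features: mean level and net change."""
    return [sum(s) / len(s), s[-1] - s[0]]


def best_split(series, features):
    """Return (feature index, threshold, score) of the best two-way split.

    Branching uses the interpretable features only; the score is
    evaluated on the raw series. Returns None if no split is possible.
    """
    best = None
    for k in range(len(features[0])):
        values = sorted(set(f[k] for f in features))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent feature values
            labels = [0 if f[k] <= t else 1 for f in features]
            score = silhouette(series, labels)
            if best is None or score > best[2]:
                best = (k, t, score)
    return best


# Toy data: two low-level series and two high-level series.
demo_series = [[1, 1, 1], [1, 2, 1], [10, 10, 10], [10, 11, 10]]
demo_features = [intuitive_features(s) for s in demo_series]
demo_split = best_split(demo_series, demo_features)
```

On the toy data, the chosen rule has the readable form "mean level <= threshold", even though its quality is judged on the raw values. The quadratic cost of the pairwise distance loop is the kind of expense that motivates a cheaper metric for large datasets.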
Notes
1. All code and scripts for our method are available at https://github.com/tokyotech-nakatalab/interpretable_time-series_clustering.
Acknowledgments
This study was conducted as part of the Data Analysis Competition hosted by the Joint Association Study Group of Management Science. The authors would like to thank the organizers and Rakuten Group, Inc. for providing a real data set.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Higashi, M. et al. (2024). Decision Tree Clustering for Time Series Data: An Approach for Enhanced Interpretability and Efficiency. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14326. Springer, Singapore. https://doi.org/10.1007/978-981-99-7022-3_42
DOI: https://doi.org/10.1007/978-981-99-7022-3_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7021-6
Online ISBN: 978-981-99-7022-3
eBook Packages: Computer Science (R0)