Decision Tree Clustering for Time Series Data: An Approach for Enhanced Interpretability and Efficiency

Higashi, Masaki; Sung, Minje; Yamane, Daiki; Inamuro, Kenta; Nagai, Shota; Kobayashi, Ken; Nakata, Kazuhide

doi:10.1007/978-981-99-7022-3_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14326)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Clustering is an unsupervised learning technique for grouping similar data samples. Although clustering is used in a wide range of applications, traditional clustering methods cannot provide clear interpretations of the resulting clusters. This has led to increasing interest in interpretable clustering methods, which are mainly based on decision trees. However, existing interpretable clustering methods are typically designed for tabular data and struggle when applied to time-series data because of its complex nature. In this paper, we propose a novel interpretable time-series clustering method based on decision trees. To address the interpretability challenges of time-series data, our method employs two separate feature sets: intuitive features for decision tree branching, and the original observed time-series values for evaluating a given clustering metric. This dual use enables us to construct interpretable clustering trees for time-series data. In addition, to handle datasets with a large number of samples, we propose a new metric for evaluating clustering quality, called the surrogate silhouette coefficient, and present a heuristic algorithm for constructing a decision tree based on this metric. We show that the computational complexity of evaluating the proposed metric is much lower than that of the silhouette coefficient, which is commonly used in decision tree-based clustering. Our numerical experiments demonstrate that our method constructs decision trees faster than existing methods based on the silhouette coefficient while maintaining clustering quality. We also applied our method to time-series data from an e-commerce platform and succeeded in constructing an insightful decision tree.
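The dual use of feature sets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is in the repository listed under Notes). The function names, the toy data, and the choice of per-series mean and standard deviation as "intuitive" features are all assumptions made for illustration. The abstract does not define the surrogate silhouette coefficient, so the cheaper metric below is a generic centroid-based ("simplified") silhouette used as a stand-in; the paper's metric may differ.

```python
import numpy as np

def silhouette(X, labels):
    """Exact mean silhouette coefficient; uses all pairwise distances, O(n^2)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton cluster gets silhouette 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

def centroid_silhouette(X, labels):
    """Stand-in metric (assumption): distances to cluster means only, O(n * k)."""
    clusters = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in clusters])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
    rows = np.arange(len(X))
    own = np.searchsorted(clusters, labels)  # column index of each sample's cluster
    a = d[rows, own]
    d_other = d.copy()
    d_other[rows, own] = np.inf
    b = d_other.min(axis=1)  # distance to the nearest other centroid
    denom = np.maximum(a, b)
    s = np.divide(b - a, denom, out=np.zeros_like(a), where=denom > 0)
    return s.mean()

def best_split(features, series, metric=centroid_silhouette):
    """Branch on interpretable features, but score each split on the raw series."""
    best = (-np.inf, None, None)
    for j in range(features.shape[1]):
        values = np.unique(features[:, j])
        for thr in (values[:-1] + values[1:]) / 2:  # midpoints between sorted values
            labels = (features[:, j] > thr).astype(int)
            s = metric(series, labels)
            if s > best[0]:
                best = (s, j, thr)
    return best

# Toy data (hypothetical): 20 low-level and 20 high-level series of length 24.
rng = np.random.default_rng(0)
low = 1.0 + 0.1 * rng.standard_normal((20, 24))
high = 5.0 + 0.1 * rng.standard_normal((20, 24))
series = np.vstack([low, high])
# Intuitive features used only for branching: per-series mean and standard deviation.
features = np.column_stack([series.mean(axis=1), series.std(axis=1)])

best_score, best_feature, best_threshold = best_split(features, series)
exact_score = silhouette(series, (features[:, best_feature] > best_threshold).astype(int))
```

On this toy data the chosen rule reads as "mean level above a threshold", which is the kind of readable branching condition the abstract refers to. A full tree would apply such a split recursively to each leaf.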

Notes

1. All code and scripts for our method are available at https://github.com/tokyotech-nakatalab/interpretable_time-series_clustering.

Acknowledgments

This study was conducted as a part of the Data Analysis Competition hosted by Joint Association Study Group of Management Science. The authors would like to thank the organizers and Rakuten Group, Inc. for providing us with a real data set.

Author information

Authors and Affiliations

School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
Masaki Higashi, Minje Sung, Daiki Yamane, Kenta Inamuro, Shota Nagai, Ken Kobayashi & Kazuhide Nakata

Corresponding author

Correspondence to Masaki Higashi.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Fenrong Liu
SEEK Limited, Cremorne, NSW, Australia
Arun Anand Sadanandan
MIMOS (Malaysia), Kuala Lumpur, Malaysia
Duc Nghia Pham
Universitas Indonesia, Depok, Indonesia
Petrus Mursanto
Tabcorp Holdings Limited, Melbourne, VIC, Australia
Dickson Lukose

Cite this paper

Higashi, M. et al. (2024). Decision Tree Clustering for Time Series Data: An Approach for Enhanced Interpretability and Efficiency. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14326. Springer, Singapore. https://doi.org/10.1007/978-981-99-7022-3_42

DOI: https://doi.org/10.1007/978-981-99-7022-3_42
Published: 10 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7021-6
Online ISBN: 978-981-99-7022-3
eBook Packages: Computer Science, Computer Science (R0)
