Abstract
Analyzing the properties of subsequences within time series can reveal hidden patterns and improve the quality of time series clustering. However, most existing methods for subsequence analysis require point-to-point alignment, which is sensitive to shifts and noise. In this paper, we propose a clustering method named CTDS that treats each time series as a set of independent and identically distributed (iid) points in \(\mathbb{R}^d\), extracted by a sliding window in local regions. CTDS utilises a distributional measure called the Isolation Distributional Kernel (IDK), which can capture subtle differences between the probability distributions of subsequences without alignment. CTDS can therefore cluster large, non-stationary and complex datasets. We evaluate CTDS on the UCR time series benchmark datasets and demonstrate that it outperforms other state-of-the-art clustering methods.
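The idea of comparing time series through the distributions of their subsequences, rather than by point-to-point alignment, can be illustrated with a minimal sketch. It is not the CTDS implementation: a Gaussian kernel mean embedding stands in for IDK, and the function names, the window length `d` and the bandwidth `gamma` are all assumptions made for illustration.

```python
import math

def sliding_windows(series, d, step=1):
    """Return every length-d subsequence as a tuple, i.e. a point in R^d."""
    return [tuple(series[i:i + d]) for i in range(0, len(series) - d + 1, step)]

def gaussian(u, v, gamma):
    """Gaussian kernel between two points (a stand-in, NOT the Isolation Kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mean_embedding_similarity(P, Q, gamma=1.0):
    """Normalised similarity between two point sets via kernel mean embeddings."""
    def avg(A, B):
        return sum(gaussian(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return avg(P, Q) / math.sqrt(avg(P, P) * avg(Q, Q))

# Three toy series: a sine wave, the same wave shifted in time, and a faster wave.
base = [math.sin(2 * math.pi * t / 20) for t in range(100)]
shifted = [math.sin(2 * math.pi * (t + 5) / 20) for t in range(100)]
other = [math.sin(2 * math.pi * t / 7) for t in range(100)]

d = 5  # hypothetical window length
sim_shift = mean_embedding_similarity(sliding_windows(base, d), sliding_windows(shifted, d))
sim_other = mean_embedding_similarity(sliding_windows(base, d), sliding_windows(other, d))
print(sim_shift, sim_other)
```

Because each series is reduced to an unordered set of windows, the time-shifted copy yields almost the same point set as the original and scores close to 1, while the faster wave scores lower. No alignment step is involved.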
Notes
1. The source code is available at https://github.com/LeisureGong/CTDS.
2. The training and testing subsets of each dataset are merged for clustering evaluation.
3. For short time series with length less than 150, we directly generate all subsequences without splitting the data into segments.
4. We also evaluated their performance using RI and found a similar result. All experimental details can be found in the supplementary file.
5. For methods that solely produce representations or kernel matrices, we use K-means or Kernel K-means as the clustering algorithm.
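As an illustration of the last note, the following is a minimal sketch of a Lloyd-style Kernel K-means that works directly on a precomputed kernel matrix. It is not the code used in the experiments: the function name, the round-robin initialisation, the stopping rule and the toy matrix are assumptions made for illustration.

```python
def kernel_kmeans(K, k, max_iter=100):
    """Cluster n items given only their n x n kernel (similarity) matrix K."""
    n = len(K)
    labels = [i % k for i in range(n)]  # round-robin initialisation (assumption)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space.
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(K[i][j] for j in m) / len(m)
                # Squared feature-space distance from item i to the mean of cluster c.
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy kernel matrix with two obvious groups: items 0-2 and items 3-5.
K = [[1.0 if i == j else (0.9 if i // 3 == j // 3 else 0.1) for j in range(6)]
     for i in range(6)]
labels = kernel_kmeans(K, k=2)
print(labels)
```

The distance to a cluster mean is computed from kernel entries alone, so no explicit feature vectors are needed. This is why a method that outputs only a kernel matrix can still be clustered.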
Acknowledgements
This project is supported by the National Natural Science Foundation of China (Grant No. 62076120).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gong, L., Zhang, H., Liu, Z., Ting, K.M., Cao, Y., Zhu, Y. (2024). Local Subsequence-Based Distribution for Time Series Clustering. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_21
DOI: https://doi.org/10.1007/978-981-97-2242-6_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2241-9
Online ISBN: 978-981-97-2242-6
eBook Packages: Computer Science, Computer Science (R0)