Local Subsequence-Based Distribution for Time Series Clustering

Gong, Lei; Zhang, Hang; Liu, Zongyou; Ting, Kai Ming; Cao, Yang; Zhu, Ye

doi:10.1007/978-981-97-2242-6_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14645)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Analyzing the properties of subsequences within time series can reveal hidden patterns and improve the quality of time series clustering. However, most existing methods for subsequence analysis require point-to-point alignment, which is sensitive to shifts and noise. In this paper, we propose a clustering method named CTDS that treats a time series as a set of independent and identically distributed (iid) points in \(\mathbb{R}^d\), extracted by a sliding window in local regions. CTDS utilises a distributional measure called the Isolation Distributional Kernel (IDK), which can capture subtle differences between the probability distributions of subsequences without alignment. It can cluster large, non-stationary and complex datasets. We evaluate CTDS on the UCR time series benchmark datasets and show that it outperforms other state-of-the-art clustering methods.
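To make the sliding-window step concrete, the following is a minimal sketch of how a univariate series could be turned into a set of points in \(\mathbb{R}^d\). It is not taken from the CTDS source code: the function name `sliding_subsequences` and the `stride` parameter are assumptions made for illustration, and the sketch covers only the extraction step, not the Isolation Distributional Kernel or the clustering itself.

```python
import numpy as np


def sliding_subsequences(series, d, stride=1):
    """Return the length-d windows of a 1-D series as rows of an (n, d) array.

    Hypothetical helper for illustration; each row is one point in R^d.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim != 1:
        raise ValueError("expected a 1-D series")
    if d < 1 or d > x.size:
        raise ValueError("window length d must be between 1 and len(series)")
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    # sliding_window_view returns a read-only view of every contiguous window.
    windows = np.lib.stride_tricks.sliding_window_view(x, d)
    # Keep every `stride`-th window and copy so the result is writable.
    return windows[::stride].copy()


points = sliding_subsequences([0, 1, 2, 3, 4], d=3)
print(points.shape)  # (3, 3): three subsequences, each a point in R^3
```

Once each series is represented as such a set of points, two series can be compared through the distributions of their subsequences rather than position by position, which is the property the abstract attributes to the distributional treatment.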

Notes

1. The source code is available at https://github.com/LeisureGong/CTDS.
2. The training and testing subsets of each dataset are merged for clustering evaluation.
3. For short time series with length less than 150, we directly generate all subsequences without splitting the data into segments.
4. We also evaluated their performance using RI and found a similar result. All experimental details can be found in the supplementary file.
5. For methods that solely produce representations or kernel matrices, we use K-means or Kernel K-means as the clustering algorithm.
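As a small worked illustration of Note 4, and assuming that RI there denotes the Rand Index, the measure can be computed directly from its definition: the fraction of sample pairs on which the predicted clustering and the ground-truth labelling agree (both place the pair together, or both place it apart). The function name `rand_index` is chosen for this sketch and is not taken from the paper or its code.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both labelings group it,
        # or both labelings separate it.
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, renamed labels
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5: 3 of 6 pairs agree
```

The pairwise loop is quadratic in the number of samples, which is adequate for illustration; a contingency-table formulation would be preferable for large datasets.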

Acknowledgements

This project is supported by the National Natural Science Foundation of China (Grant No. 62076120).

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Lei Gong, Hang Zhang, Zongyou Liu & Kai Ming Ting
School of Artificial Intelligence, Nanjing University, Nanjing, China
Lei Gong, Hang Zhang, Zongyou Liu & Kai Ming Ting
School of Information Technology, Deakin University, Burwood, VIC, Australia
Yang Cao & Ye Zhu

Corresponding author

Correspondence to Lei Gong.

Editor information

Editors and Affiliations

Academia Sinica, Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

Cite this paper

Gong, L., Zhang, H., Liu, Z., Ting, K.M., Cao, Y., Zhu, Y. (2024). Local Subsequence-Based Distribution for Time Series Clustering. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_21

DOI: https://doi.org/10.1007/978-981-97-2242-6_21
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2241-9
Online ISBN: 978-981-97-2242-6
eBook Packages: Computer Science (R0)