Local Subsequence-Based Distribution for Time Series Clustering

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14645)

Abstract

Analyzing the properties of subsequences within time series can reveal hidden patterns and improve the quality of time series clustering. However, most existing methods for subsequence analysis require point-to-point alignment, which is sensitive to shifts and noise. In this paper, we propose a clustering method named CTDS that treats each time series as a set of independent and identically distributed (iid) points in \(\mathbb{R}^d\), extracted by a sliding window in local regions. CTDS uses a distributional measure called the Isolation Distributional Kernel (IDK), which can capture subtle differences between the probability distributions of subsequences without alignment. It can cluster large, non-stationary and complex datasets. We evaluate CTDS on the UCR time series benchmark datasets and show that it outperforms other state-of-the-art clustering methods.
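To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation (see the source code linked in the notes for that). It extracts length-d subsequences with a sliding window, treats each series as a point set in \(\mathbb{R}^d\), and compares two such sets through an averaged nearest-reference-point feature map, which is a simplified stand-in for an isolation-based distributional kernel. The function names, the parameters `psi` (reference points per partitioning) and `t` (number of partitionings), and the toy series are all assumptions made for illustration.

```python
import numpy as np

def sliding_windows(series, d):
    """Return all length-d subsequences of a 1-D series as rows of an (n-d+1, d) array."""
    series = np.asarray(series, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(series, d).copy()

def fit_partitions(points, psi, t, rng):
    """Draw t random subsets of psi reference points; each subset induces a Voronoi partitioning."""
    idx = np.stack([rng.choice(len(points), size=psi, replace=False) for _ in range(t)])
    return points[idx]  # shape (t, psi, d)

def mean_embedding(points, refs):
    """Average the one-hot 'nearest reference point' indicators over a point set: shape (t, psi)."""
    t, psi, _ = refs.shape
    # Squared distance from every point to every reference point: shape (t, n, psi)
    d2 = ((points[None, :, None, :] - refs[:, None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=-1)  # shape (t, n)
    emb = np.zeros((t, psi))
    for i in range(t):
        emb[i] = np.bincount(nearest[i], minlength=psi) / len(points)
    return emb

def similarity(emb_a, emb_b):
    """Dot product of two mean embeddings, averaged over the t partitionings."""
    return float((emb_a * emb_b).sum() / emb_a.shape[0])

# Toy data (hypothetical): b is a time-shifted copy of a, c has a different level and shape.
rng = np.random.default_rng(0)
time_axis = np.linspace(0, 8 * np.pi, 200)
a = np.sin(time_axis)
b = np.sin(time_axis + 1.0)
c = 3.0 + 0.3 * rng.normal(size=200)

window = 8
sets = [sliding_windows(s, window) for s in (a, b, c)]
refs = fit_partitions(np.vstack(sets), psi=16, t=50, rng=rng)
emb = [mean_embedding(p, refs) for p in sets]

sim_ab = similarity(emb[0], emb[1])  # shifted copies: similar subsequence distributions
sim_ac = similarity(emb[0], emb[2])  # different distributions
print(sim_ab > sim_ac)
```

Because the comparison is between distributions of subsequences rather than between aligned points, the shifted copy `b` should score as more similar to `a` than the unrelated series `c` does, without any alignment step.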

Notes

  1. The source code is available at https://github.com/LeisureGong/CTDS.

  2. The training and testing subsets of each dataset are merged for clustering evaluation.

  3. For short time series of length less than 150, we generate all subsequences directly, without splitting the data into segments.

  4. We also evaluated the methods' performance using RI and found similar results. All experimental details can be found in the supplementary file.

  5. For methods that produce only representations or kernel matrices, we use K-means or Kernel K-means as the clustering algorithm.
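As an illustration of the last note, the sketch below shows one way a precomputed kernel matrix can be turned into cluster labels with kernel k-means. It is a hedged example rather than the procedure used in the paper: the function name `kernel_kmeans`, the round-robin initialisation, and the toy linear kernel are assumptions chosen to keep the example deterministic and short.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50):
    """Cluster n items given an (n, n) kernel matrix K; returns integer labels in [0, k)."""
    n = K.shape[0]
    # Simple deterministic round-robin initialisation (an assumption for this sketch;
    # practical code would usually try several random initialisations).
    labels = np.arange(n) % k
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            if not members.any():
                continue  # empty cluster: its distance stays at +inf
            # ||phi(x) - mu_c||^2 = K(x, x) - 2 * mean_j K(x, j) + mean_{j, l} K(j, l)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy example: six 1-D points in two groups, compared with a linear kernel.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
K = np.outer(x, x)
labels = kernel_kmeans(K, k=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Only the kernel matrix is needed, so the same routine applies to any similarity that yields a valid kernel, including a distributional kernel between time series.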

Acknowledgements

This project is supported by the National Natural Science Foundation of China (Grant No. 62076120).

Author information

Corresponding author

Correspondence to Lei Gong.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gong, L., Zhang, H., Liu, Z., Ting, K.M., Cao, Y., Zhu, Y. (2024). Local Subsequence-Based Distribution for Time Series Clustering. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_21

  • DOI: https://doi.org/10.1007/978-981-97-2242-6_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2241-9

  • Online ISBN: 978-981-97-2242-6

  • eBook Packages: Computer Science, Computer Science (R0)
