Abstract
As technology advances, large volumes of time series data are being generated across many domains. Clustering is a key technique for analysing such data. However, most existing clustering methods compute distances between individual discrete data points and cannot handle continuous time series that exhibit structural distortion (e.g., expansion, contraction, and drift) and noise (e.g., pseudo-events), which leads to low clustering accuracy. In this paper, we propose a novel time series event clustering approach called CBR (Clustering Based on Representative sequences). First, we introduce a cross-correlation method to measure the distance between sequences with structural distortion, and propose an r-nearest neighbor evaluation system for sequences that constructs candidate sets of R-Seqs (Representative sequences) and eliminates pseudo-event interference. Second, we formulate composite selection approaches for R-Seqs based on combinatorial optimization and diversifying top-k queries to rapidly derive the optimal set of R-Seqs from the candidate sets. Finally, relying on a dynamically constructed distance matrix between the R-Seqs and the dataset, we propose a K-means-based matrix clustering method that efficiently partitions events into classes. Experimental results demonstrate that CBR outperforms existing approaches in clustering accuracy, efficiency, and denoising quality; in particular, clustering accuracy improves by more than 30% on average.
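The abstract names the stages of CBR without giving their formulas, so the following is only a minimal sketch of how the data could flow from stage to stage, under several loudly stated assumptions. The cross-correlation distance is assumed to be the shape-based distance (SBD) used by k-Shape, i.e. one minus the maximum normalised cross-correlation over all shifts. The r-nearest neighbor evaluation is assumed to be the mean distance to the r nearest neighbours, with sequences scoring above the mean plus one standard deviation treated as pseudo-events. The combinatorial-optimization and diversifying top-k selection is replaced by a plain greedy diversified top-k with a hypothetical distance threshold `tau`. The distance matrix is built once rather than dynamically, and K-means is seeded with the R-Seqs' own rows. Every function name (`sbd`, `rnn_scores`, `candidate_set`, `diversified_top_k`, `kmeans_rows`, `cbr_sketch`, `demo_data`) and parameter (`r`, `noise_z`, `tau`) is hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of a CBR-style pipeline; NOT the authors' code.
import math
from statistics import fmean, pstdev


def znorm(x):
    """Z-normalise a sequence; a constant sequence maps to all zeros."""
    mu, sd = fmean(x), pstdev(x)
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def sbd(x, y):
    """Assumed distance: 1 - maximum normalised cross-correlation over all shifts.

    Sliding y along x absorbs drift; z-normalisation absorbs amplitude
    scaling and offset. The result lies in [0, 2]; 0 means identical shape.
    """
    x, y = znorm(x), znorm(y)
    denom = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if denom == 0:  # a constant sequence has no shape to correlate
        return 1.0
    best = -math.inf
    for shift in range(-(len(y) - 1), len(x)):
        lo, hi = max(0, shift), min(len(x), len(y) + shift)
        best = max(best, sum(x[i] * y[i - shift] for i in range(lo, hi)))
    return 1.0 - best / denom


def rnn_scores(dist, r):
    """Assumed r-NN score: mean distance to the r nearest neighbours (lower = denser)."""
    n = len(dist)
    return [fmean(sorted(dist[i][j] for j in range(n) if j != i)[:r]) for i in range(n)]


def candidate_set(scores, noise_z=1.0):
    """Drop outlying r-NN scores as likely pseudo-events; return survivors best-first."""
    cut = fmean(scores) + noise_z * pstdev(scores) + 1e-12  # tolerance for float ties
    return sorted((i for i, s in enumerate(scores) if s <= cut), key=lambda i: scores[i])


def diversified_top_k(cands, dist, k, tau):
    """Greedy stand-in: keep a candidate only if it is >= tau from every earlier pick."""
    chosen = []
    for i in cands:
        if all(dist[i][j] >= tau for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen


def kmeans_rows(rows, init, max_iter=100):
    """Lloyd's K-means on the rows of a distance matrix, from given initial centres."""
    centres = [list(c) for c in init]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centres)), key=lambda c: math.dist(row, centres[c]))
               for row in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(len(centres)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [fmean(col) for col in zip(*members)]
    return labels


def cbr_sketch(series, k, r=2, tau=0.3):
    """Hypothetical end-to-end pipeline: returns (R-Seq indices, cluster labels)."""
    n = len(series)
    dist = [[sbd(a, b) for b in series] for a in series]
    r_seqs = diversified_top_k(candidate_set(rnn_scores(dist, r)), dist, k, tau)
    # Each sequence is represented by its distances to the chosen R-Seqs.
    matrix = [[dist[i][j] for j in r_seqs] for i in range(n)]
    # Seed one cluster per R-Seq with that R-Seq's own row.
    return r_seqs, kmeans_rows(matrix, [matrix[j] for j in r_seqs])


def demo_data(n=24):
    """Toy events: 3 drifted pulses, 3 phase-shifted waves, 1 zigzag pseudo-event."""
    def pulse(offset):
        s = [0.0] * n
        s[offset:offset + 4] = [1.0, 3.0, 3.0, 1.0]
        return s

    def wave(lag):
        return [math.sin(2 * math.pi * (i + lag) / 8) for i in range(n)]

    return [pulse(3), pulse(8), pulse(14), wave(0), wave(1), wave(2), [1.0, -1.0] * (n // 2)]


if __name__ == "__main__":
    r_seqs, labels = cbr_sketch(demo_data(), k=2)
    print("R-Seqs:", r_seqs, "labels:", labels)
```

On the toy data, the zigzag's r-NN score is far above the others, so it should be excluded from the candidate set; one pulse and one wave should become R-Seqs, and the pulses and waves should land in different clusters. Seeding K-means with the R-Seqs' rows, rather than with a farthest-point rule, is a deliberate choice in this sketch: a farthest-point rule would tend to pick the pseudo-event itself as a centre.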
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the National Natural Science Foundation of China (Nos. 61502215 and 51704138), the China Postdoctoral Science Foundation (No. 2020M672134), and the Scientific Research Project of the Educational Department of Liaoning Province (Nos. LJC201913 and LJKZ0094).
About this article
Cite this article
Wang, J., Ma, R., Xia, L. et al. CBR: An Effective Clustering Approach for Time Series Events. Neural Process Lett 54, 3401–3423 (2022). https://doi.org/10.1007/s11063-022-10763-3
DOI: https://doi.org/10.1007/s11063-022-10763-3