Abstract
Accurate, fast, and interpretable time series classification (TSC) has attracted considerable attention from the data mining community over the past decades. In this paper, we propose an efficient algorithm, called Compressed Random Shapelet Forest (CRSF), to tackle this problem. Unlike most shapelet-based TSC methods, CRSF achieves promising performance by greatly compressing the shapelet feature space. To achieve this compression, the time series dataset and the shapelets are first represented with Symbolic Aggregate approXimation (SAX). Shapelet-based decision trees are then built on a pool of high-quality shapelet candidates from which useless and self-similar shapelets have been pruned in advance. A new function for measuring the distance between two SAX-represented time series is also introduced. Extensive experiments were conducted on 50 UCR time series datasets. The results show that (1) CRSF achieves the highest average accuracy on these datasets and outperforms most existing shapelet-based TSC methods; and (2) CRSF is slightly more accurate than gRSF and significantly faster: on average, it is 41 times faster than gRSF in the experiments.
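For readers unfamiliar with SAX, the representation step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard SAX procedure (z-normalisation, piecewise aggregate approximation, then discretisation with Gaussian breakpoints), not the CRSF implementation; the function name `sax` and the parameters `n_segments` and `alphabet_size` are assumptions chosen for this example, and the paper's new SAX distance function is not reproduced here.

```python
import numpy as np
from statistics import NormalDist


def sax(series, n_segments, alphabet_size):
    """Return the SAX word of `series` as an array of integer symbols.

    Hypothetical helper for illustration only; symbols are integers in
    the range 0 .. alphabet_size - 1.
    """
    x = np.asarray(series, dtype=float)

    # Step 1: z-normalise. A constant series is mapped to all zeros.
    std = x.std()
    x = (x - x.mean()) / std if std > 0 else np.zeros_like(x)

    # Step 2: piecewise aggregate approximation (PAA), i.e. the mean of
    # each of n_segments (near-)equal-length chunks.
    paa = np.array([chunk.mean() for chunk in np.array_split(x, n_segments)])

    # Step 3: breakpoints that cut the standard normal distribution into
    # alphabet_size equiprobable regions.
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

    # Each PAA value is replaced by the index of the region it falls in.
    return np.searchsorted(breakpoints, paa)


if __name__ == "__main__":
    word = sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet_size=4)
    print(word)
```

Applying the same transform to both the training series and the shapelet candidates is what shrinks the search space: a long real-valued subsequence becomes a short word over a small alphabet, so many near-identical candidates collapse to the same word.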
Acknowledgements
The authors thank the reviewers for their work and the contributors of the UCR archive. This work is supported by the open project fund of the Intelligent Terminal Key Laboratory of Sichuan Province (Grant No. SCITLAB-1002), the open fund of the Key Laboratory of Internet Natural Language Processing of Sichuan Province Education Department (Grant No. INLP201906), and the fund of the Science and Technology Bureau of Leshan Town (Grant Nos. 21SZD092, 20GZD020).
About this article
Cite this article
Yang, J., Jing, S. & Huang, G. Accurate and fast time series classification based on compressed random Shapelet Forest. Appl Intell 53, 5240–5258 (2023). https://doi.org/10.1007/s10489-022-03852-2
DOI: https://doi.org/10.1007/s10489-022-03852-2