Random pairwise shapelets forest: an effective classifier for time series

Yuan, Jidong; Shi, Mohan; Wang, Zhihai; Liu, Haiyang; Li, Jinyang

doi:10.1007/s10115-021-01630-z

Regular Paper
Published: 10 January 2022

Volume 64, pages 143–174, (2022)

Knowledge and Information Systems

Jidong Yuan ORCID: orcid.org/0000-0003-2654-3372¹,
Mohan Shi²,
Zhihai Wang¹,
Haiyang Liu¹ &
Jinyang Li³

Abstract

A shapelet is a discriminative subsequence of a time series. An advanced shapelet-based approach embeds shapelets into a random forest, which is accurate and fast. However, this approach has several limitations. First, the random shapelet forest incurs a large training cost when searching for split thresholds. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy. Third, the randomized ensemble decreases comprehensibility. To address these issues, this paper presents the Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct a random forest. It omits threshold searching, which makes it more efficient, and includes more information at each node of the forest, which makes it more effective. Moreover, a discriminability measure, Decomposed Mean Decrease Impurity, is proposed to identify the influential region for each class. Extensive experiments show that RPSF is competitive with other methods while improving the training speed of shapelet-based forests.
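
The threshold-free split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a node holds two shapelets and routes an instance to the branch of the shapelet it is closer to, using the minimum Euclidean distance over all sliding windows. The function names `subsequence_distance` and `pairwise_split`, the branch labels, and the tie-breaking rule are hypothetical choices made for illustration, and z-normalization is left out for brevity.

```python
import math
from typing import Sequence


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series (no normalization, by assumption)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        # Squared distance between the shapelet and the window at `start`.
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)


def pairwise_split(
    series: Sequence[float],
    shapelet_a: Sequence[float],
    shapelet_b: Sequence[float],
) -> str:
    """Route the series to the branch of the closer shapelet.

    No split threshold is searched: the decision is a direct comparison
    of two distances. Ties go left, which is an arbitrary choice here.
    """
    dist_a = subsequence_distance(series, shapelet_a)
    dist_b = subsequence_distance(series, shapelet_b)
    return "left" if dist_a <= dist_b else "right"


if __name__ == "__main__":
    bump = [0, 0, 1, 2, 1, 0, 0]  # contains the peak pattern exactly
    print(pairwise_split(bump, [1, 2, 1], [0, -1, 0]))
```

Because the comparison replaces a threshold search, the cost of a node in this sketch is two distance computations per instance, which is the kind of saving the abstract attributes to omitting threshold searching.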

Notes

An earlier version of the RPSF was presented with a limited empirical evaluation in [41].
They can be applied to multi-class time series directly.
The source code of RPSF is available on our website https://github.com/nephashi/RandomPairwiseShapeletsForest.
http://www.timeseriesclassification.com.
Some gRSF results are not provided because of the excessive running time required; the missing TSBF results are due to a bug in its source code.

References

Bagnall AJ, Lines J, Bostrom A, Large J, Keogh EJ (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660. https://doi.org/10.1007/s10618-016-0483-9
Bagnall A, Lines J, Hills J, Bostrom A (2015) Time-series classification with cote: the collective of transformation-based ensembles. IEEE Trans Knowl Data Eng 27(9):2522–2535
Batista GEAPA, Keogh EJ, Tataw OM, de Souza VMA (2014) CID: an efficient complexity-invariant distance for time series. Data Min Knowl Discov 28(3):634–669. https://doi.org/10.1007/s10618-013-0312-3
Baydogan MG, Runger GC (2016) Time series representation and similarity based on local autopatterns. Data Min Knowl Discov 30(2):476–509. https://doi.org/10.1007/s10618-015-0425-y
Baydogan MG, Runger GC, Tuv E (2013) A bag-of-features framework to classify time series. IEEE Trans Pattern Anal Mach Intell 35(11):2796–2802. https://doi.org/10.1109/TPAMI.2013.72
Bostrom A, Bagnall A (2015) Binary shapelet transform for multiclass time series classification. In: International conference on big data analytics and knowledge discovery. Springer, pp 257–269
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cetin MS, Mueen A, Calhoun VD (2015) Shapelet ensemble for multi-dimensional time series. In: Proceedings of the 2015 SIAM international conference on data mining. SIAM, pp 307–315
Cui Z, Chen W, Chen Y (2016) Multi-scale convolutional neural networks for time series classification. arXiv preprint. arXiv:1603.06995
Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inf Sci 239:142–153
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller P-A (2019) Deep learning for time series classification: a review. Data Min Knowl Discov 33(4):917–963
Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L (2014) Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 392–401
Haghiri S, Garreau D, von Luxburg U (2018) Comparison-based random forests. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, pp 1866–1875
Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881
Hou L, Kwok JT, Zurada JM (2016) Efficient learning of timeseries shapelets. In: Thirtieth AAAI conference on artificial intelligence
Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoust Speech Signal Process 23(1):67–72
Jansen T, Weyland D (2010) Analysis of evolutionary algorithms for the longest common subsequence problem. Algorithmica 57(1):170–186. https://doi.org/10.1007/s00453-008-9243-6
Jeong Y, Jeong M, Omitaomu O (2011) Weighted dynamic time warping for time series classification. Pattern Recognit 44:2231–2240
Karlsson I, Papapetrou P, Asker L (2015) Multi-channel ecg classification using forests of randomized shapelet trees. In: Proceedings of the 8th ACM international conference on pervasive technologies related to assistive environments. ACM, p 43
Karlsson I, Papapetrou P, Boström H (2015) Forests of randomized shapelet trees. In: International symposium on statistical learning and data sciences. Springer, pp 126–136
Karlsson I, Papapetrou P, Boström H (2016a) Early random shapelet forest. In: International conference on discovery science. Springer, pp 261–276
Karlsson I, Papapetrou P, Boström H (2016b) Generalized random shapelet forests. Data Min Knowl Discov 30(5):1053–1085
Keogh EJ, Pazzani MJ (2001a) Derivative dynamic time warping. In: SDM, Vol. 1. SIAM, pp 5–7
Keogh E, Pazzani M (2001) Derivative dynamic time warping. In: First international conference on data mining
Le Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD workshop on advanced analytics and learning on temporal data
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144
Lin J, Khade R, Li Y (2012) Rotation-invariant similarity in time series using bag-of-patterns representation. J Intell Inf Syst 39(2):287–315. https://doi.org/10.1007/s10844-012-0196-5
Lines J, Bagnall AJ (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592. https://doi.org/10.1007/s10618-014-0361-2
Lines J, Davis LM, Hills J, Bagnall A (2012) A shapelet transform for time series classification. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 289–297
Marteau P (2009) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318. https://doi.org/10.1109/TPAMI.2008.76
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1154–1162
Rakthanmanon T, Keogh E (2013) Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of the 2013 SIAM international conference on data mining. SIAM, pp 668–676
Renard X, Rifqi M, Erray W, Detyniecki M (2015) Random-shapelet: an algorithm for fast shapelet discovery. In: 2015 IEEE international conference on data science and advanced analytics (DSAA). IEEE, pp 1–10
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49
Sathe S, Aggarwal CC (2017) Similarity forests. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 395–403
Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Discov 29(6):1505–1530. https://doi.org/10.1007/s10618-014-0377-7
Schäfer P, Leser U (2017) Fast and accurate time series classification with weasel. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 637–646
Senin P, Malinchik S (2013) SAX-VSM: interpretable time series classification using SAX and vector space model. In: 2013 IEEE 13th international conference on data mining, Dallas, TX, USA, December 7–10, 2013, pp 1175–1180. https://doi.org/10.1109/ICDM.2013.52
Serrà J, Pascual S, Karatzoglou A (2018) Towards a universal neural network encoder for time series. In: CCIA, pp 120–129
Shi M, Wang Z, Yuan J, Liu H (2018) Random pairwise shapelets forest. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 68–80
Stefan A, Athitsos V, Das G (2013) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438. https://doi.org/10.1109/TKDE.2012.88
Tanisaro P, Heidemann G (2016) Time series classification using time warping invariant echo state networks. In: 2016 15th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 831–836
Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: a strong baseline. In: International joint conference on neural networks
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 947–956
Yuan J-D, Wang Z-H, Han M (2014) A discriminative shapelets transformation for time series classification. Int J Pattern Recognit Artif Intell 28(06):1450014
Yuan J, Douzal-Chouakria A, Yazdi SV, Wang Z (2018) A large margin time series nearest neighbour classification under locally weighted time warps. Knowl Inf Syst, pp 1–19
Yuan J, Lin Q, Zhang W, Wang Z (2019) Locally slope-based dynamic time warping for time series classification. In: Proceedings of the 28th ACM international conference on information and knowledge management, pp 1713–1722
Yuan J, Wang Z, Han M, Sun Y (2015) A lazy associative classifier for time series. Intell Data Anal 19(5):983–1002
Zhang Z, Zhang H, Wen Y, Zhang Y, Yuan X (2018) Discriminative extraction of features from time series. Neurocomputing 275:2317–2328
Zhao B, Lu H, Chen S, Liu J, Wu D (2017) Convolutional neural networks for time series classification. J Syst Eng Electron 28(1):162–169
Zhao J, Itti L (2018) shapedtw: shape dynamic time warping. Pattern Recognit 74:171–184. https://doi.org/10.1016/j.patcog.2017.09.020
Zheng Y, Liu Q, Chen E, Ge Y, Zhao JL (2014) Time series classification using multi-channels deep convolutional neural networks. In: International conference on web-age information management. Springer, pp 298–310

Acknowledgements

The authors thank all the donors of the time series datasets and the anonymous reviewers for their insightful comments and suggestions. This work is supported by the Beijing Municipal Natural Science Foundation (No. 4214067) and the National Natural Science Foundation of China (Nos. 61771058, 61702030).

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
Jidong Yuan, Zhihai Wang & Haiyang Liu
Beijing Jingdong 360 Degree E-Commerce Co., Ltd., Beijing, China
Mohan Shi
Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong
Jinyang Li

Corresponding author

Correspondence to Jidong Yuan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The time complexity of RPSF is mostly determined by the number of instances in the training set, N, and the number of candidate shapelets, r. To obtain accuracy results on relatively large datasets, we decrease the percentage of candidate shapelets to 0.001 and the sampling size to N/2. The corresponding results are shown in Table 4.

Table 4 Accuracy of RPSF on relatively large datasets

About this article

Cite this article

Yuan, J., Shi, M., Wang, Z. et al. Random pairwise shapelets forest: an effective classifier for time series. Knowl Inf Syst 64, 143–174 (2022). https://doi.org/10.1007/s10115-021-01630-z

Received: 15 April 2019
Revised: 15 November 2021
Accepted: 20 November 2021
Published: 10 January 2022
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10115-021-01630-z
