Fast Time Series Classification with Random Symbolic Subsequences

Nguyen, Thach Le; Ifrim, Georgiana

doi:10.1007/978-3-031-24378-3_4

Thach Le Nguyen¹³ &
Georgiana Ifrim¹³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 13812))

Included in the following conference series:

International Workshop on Advanced Analytics and Learning on Temporal Data

424 Accesses
7 Citations

Abstract

Symbolic representations of time series have proven to be effective for time series classification, with many recent approaches including BOSS, WEASEL, and MrSEQL. These classifiers use various elaborate methods to select discriminative features from symbolic representations of time series. As a result, although they have competitive results regarding accuracy, their classification models are relatively expensive to train. Most if not all of these approaches have missed an important research question: are these elaborate feature selection methods actually necessary? ROCKET, a state-of-the-art time series classifier, outperforms all of them without utilizing any feature selection techniques. In this paper, we answer this question by contrasting these classifiers with a very simple method, named MrSQM. This method samples random subsequences from symbolic representations of time series. Our experiments on 112 datasets of the UEA/UCR benchmark demonstrate that MrSQM can quickly extract useful features and learn accurate classifiers with the logistic regression algorithm. MrSQM completes training and prediction on 112 datasets in 1.5 h for an accuracy comparable to existing efficient state-of-the-art methods, e.g., MrSEQL (10 h) and ROCKET (2.5 h). Furthermore, MrSQM enables the user to trade-off accuracy and speed by controlling the type and number of symbolic representations, thus further reducing the total runtime to 20 min for a similar level of accuracy. With these results, we show that random subsequences extracted from symbolic transformations can be as effective as the more sophisticated and expensive feature selection methods proposed in previous works. We propose MrSQM as a strong baseline for future research in time series classification, especially for approaches based on symbolic representations of time series.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 44.99; Price excludes VAT (USA)

Softcover Book: USD 59.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://github.com/mlgig/mrsqm.
2.
FFTW is an open source C library for efficiently computing the Discrete Fourier Transform (DFT): https://www.fftw.org.
3.
https://github.com/mlgig/mrsqm.
4.
https://timeseriesclassification.com.
5.
https://www.sktime.org/en/stable/get_started.html.
6.
https://github.com/b0rxa/scmamp.

References

Bagnall, A., et al.: The UEA multivariate time series classification archive. arXiv preprint arXiv:1811.00075 (2018)
Bagnall, A., Flynn, M., Large, J., Lines, J., Middlehurst, M.: On the usage and performance of the hierarchical vote collective of transformation-based ensembles version 1.0 (HIVE-COTE v1.0). In: Lemaire, V., Malinowski, S., Bagnall, A., Guyet, T., Tavenard, R., Ifrim, G. (eds.) AALTD 2020. LNCS (LNAI), vol. 12588, pp. 3–18. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65742-0_1
Chapter Google Scholar
Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Disc. 31(3), 606–660 (2016). https://doi.org/10.1007/s10618-016-0483-9
Article MathSciNet Google Scholar
Benavoli, A., Corani, G., Mangili, F.: Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. 17(5), 1–10 (2016). http://jmlr.org/papers/v17/benavoli16a.html
Calvo, B., Santafé, G.: scmamp: statistical comparison of multiple algorithms in multiple problems. R J. 8(1), 248–256 (2016). https://doi.org/10.32614/RJ-2016-017
Article Google Scholar
Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Disc. 34(5), 1454–1495 (2020). https://doi.org/10.1007/s10618-020-00701-z
Article MathSciNet MATH Google Scholar
Dempster, A., Schmidt, D.F., Webb, G.I.: MINIROCKET: a very fast (almost) deterministic transform for time series classification. In: Zhu, F., Ooi, B.C., Miao, C. (eds.) KDD 2021: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, 14–18 August 2021, pp. 248–257. ACM (2021). https://doi.org/10.1145/3447548.3467231
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006). http://dl.acm.org/citation.cfm?id=1248547.1248548
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
MATH Google Scholar
Ismail Fawaz, H., et al.: InceptionTime: finding AlexNet for time series classification. Data Min. Knowl. Disc. 34(6), 1936–1962 (2020). https://doi.org/10.1007/s10618-020-00710-y
Article MathSciNet Google Scholar
Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93(2), 216–231 (2005). Special issue on “Program Generation, Optimization, and Platform Adaptation”
Google Scholar
Frigo, M., Johnson, S.G.: Fastest Fourier transform in the west (2021). https://www.fftw.org
Garcia, S., Herrera, F.: An extension on “statistical comparisons of classifiers over multiple data sets’’ for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
MATH Google Scholar
Grabocka, J., Schilling, N., Wistuba, M., Schmidt-Thieme, L.: Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, pp. 392–401. ACM, New York (2014). https://doi.org/10.1145/2623330.2623613, http://doi.acm.org/10.1145/2623330.2623613
Ifrim, G., Wiuf, C.: Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011, pp. 708–716. ACM, New York (2011). https://doi.org/10.1145/2020408.2020519, http://doi.acm.org/10.1145/2020408.2020519
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33(4), 917–963 (2019). https://doi.org/10.1007/s10618-019-00619-1
Article MathSciNet MATH Google Scholar
Le Nguyen, T., Gsponer, S., Ilie, I., O’Reilly, M., Ifrim, G.: Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations. Data Min. Knowl. Disc. 33(4), 1183–1222 (2019). https://doi.org/10.1007/s10618-019-00633-3
Article MathSciNet MATH Google Scholar
Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2), 107–144 (2007). https://doi.org/10.1007/s10618-007-0064-z
Lin, J., Khade, R., Li, Y.: Rotation-invariant similarity in time series using bag-of-patterns representation. J. Intell. Inf. Syst. 39(2), 287–315 (2012). https://doi.org/10.1007/s10844-012-0196-5
Lines, J., Taylor, S., Bagnall, A.: HIVE-COTE: the hierarchical vote collective of transformation-based ensembles for time series classification. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 1041–1046 (2016). https://doi.org/10.1109/ICDM.2016.0133
Middlehurst, M., Large, J., Flynn, M., Lines, J., Bostrom, A., Bagnall, A.J.: HIVE-COTE 2.0: a new meta ensemble for time series classification. Mach. Learn. 110(11), 3211–3243 (2021). https://doi.org/10.1007/s10994-021-06057-9
Article MathSciNet MATH Google Scholar
Middlehurst, M., Vickers, W., Bagnall, A.: Scalable dictionary classifiers for time series classification. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A.J., Menezes, R., Allmendinger, R. (eds.) IDEAL 2019. LNCS, vol. 11871, pp. 11–19. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33607-3_2
Chapter Google Scholar
Rakthanmanon, T., Keogh, E.: Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of the thirteenth SIAM conference on data mining (SDM), pp. 668–676. SIAM (2013)
Google Scholar
Schäfer, P.: The boss is concerned with time series classification in the presence of noise. Data Min. Knowl. Disc. 29(6), 1505–1530 (2015)
Article MathSciNet MATH Google Scholar
Schäfer, P.: Scalable time series classification. Data Min. Knowl. Disc. 30(5), 1273–1298 (2015). https://doi.org/10.1007/s10618-015-0441-y
Article MathSciNet MATH Google Scholar
Schäfer, P., Högqvist, M.: SFA: a symbolic Fourier approximation and index for similarity search in high dimensional datasets. In: Proceedings of the 15th International Conference on Extending Database Technology, EDBT 2012, pp. 516–527. ACM, New York (2012). https://doi.org/10.1145/2247596.2247656, http://doi.acm.org/10.1145/2247596.2247656
Schäfer, P., Leser, U.: Fast and accurate time series classification with weasel. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, pp. 637–646. ACM, New York (2017). https://doi.org/10.1145/3132847.3132980, http://doi.acm.org/10.1145/3132847.3132980
Senin, P., Malinchik, S.: SAX-VSM: interpretable time series classification using sax and vector space model. In: 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 1175–1180 (2013). https://doi.org/10.1109/ICDM.2013.52
Shifaz, A., Pelletier, C., Petitjean, F., Webb, G.: TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min. Knowl. Disc. 34, 742–775 (2020)
Article MathSciNet MATH Google Scholar
Ye, L., Keogh, E.: Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 947–956. ACM (2009)
Google Scholar

Download references

Acknowledgments

This publication has emanated from research supported in part by a grant from Science Foundation Ireland through the VistaMilk SFI Research Centre (SFI/16/RC/3835) and the Insight Centre for Data Analytics (12/RC/2289_P2).

Author information

Authors and Affiliations

School of Computer Science, University College Dublin, Dublin, Ireland
Thach Le Nguyen & Georgiana Ifrim

Authors

Thach Le Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
Georgiana Ifrim
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Thach Le Nguyen .

Editor information

Editors and Affiliations

Inria Grenoble - Rhône-Alpes Research Centre, Villeurbanne, France
Thomas Guyet
University College Dublin, Dublin, Ireland
Georgiana Ifrim
University of Rennes, Rennes, France
Simon Malinowski
University of East Anglia, Norwich, UK
Anthony Bagnall
University of Rennes, Rennes, France
Patrick Shafer
Orange Labs, Lannion, France
Vincent Lemaire

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Nguyen, T.L., Ifrim, G. (2023). Fast Time Series Classification with Random Symbolic Subsequences. In: Guyet, T., Ifrim, G., Malinowski, S., Bagnall, A., Shafer, P., Lemaire, V. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2022. Lecture Notes in Computer Science(), vol 13812. Springer, Cham. https://doi.org/10.1007/978-3-031-24378-3_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-24378-3_4
Published: 04 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24377-6
Online ISBN: 978-3-031-24378-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)

Fast Time Series Classification with Random Symbolic Subsequences