Abstract
Relatively little work in the field of time series classification focuses on learning effectively from very large quantities of data. Large datasets present significant practical challenges in terms of computational cost and memory complexity. We present strategies for extending two recent state-of-the-art methods for time series classification, namely Hydra and Quant, to very large datasets. This allows these methods to be trained on large quantities of data with a fixed memory cost, while making effective use of appropriate computational resources. For Hydra, we fit a ridge regression classifier iteratively, using a single pass through the data, integrating the Hydra transform with the process of fitting the ridge regression model; this keeps the memory cost fixed and allows almost all computation to be performed on GPU. For Quant, we 'spread' subsets of extremely randomised trees over a given dataset such that each tree is trained on as much data as possible for a given amount of memory while minimising reads from the data, allowing a simple tradeoff between error and computational cost. Both methods can then be applied straightforwardly to very large quantities of data. We demonstrate these approaches with results (including learning curves) on a selection of large datasets with between approximately 85,000 and 47 million training examples.
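The single-pass ridge regression strategy described for Hydra can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the Hydra-transformed features arrive in batches, and accumulates only the Gram matrix and moment vector so that memory cost is fixed regardless of dataset size. The function name `fit_ridge_streaming` is hypothetical.

```python
import numpy as np

def fit_ridge_streaming(batches, n_features, lam=1.0):
    """Fit ridge regression in a single pass over batched data.

    Memory is O(n_features^2), independent of the number of examples:
    only the Gram matrix X'X and the moment vector X'y are accumulated.
    """
    gram = np.zeros((n_features, n_features))
    moment = np.zeros(n_features)
    for X_b, y_b in batches:  # each batch: (transformed features, targets)
        gram += X_b.T @ X_b
        moment += X_b.T @ y_b
    # Solve the regularised normal equations: (X'X + lam * I) w = X'y.
    return np.linalg.solve(gram + lam * np.eye(n_features), moment)
```

Because each batch contributes only matrix products of fixed size, the accumulation (and the transform producing each `X_b`) can run on GPU, with the final small solve performed once at the end.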
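The 'spreading' strategy described for Quant can likewise be sketched in a pasting-like form: each disjoint chunk of the dataset is read once and used to fit its own subset of extremely randomised trees, so peak memory is bounded by the chunk size rather than the full dataset. This is an illustrative sketch using scikit-learn's `ExtraTreesClassifier`, not the paper's implementation; it assumes every chunk contains all classes, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def fit_spread_forest(chunks, trees_per_chunk=25, random_state=0):
    """Train extremely randomised trees 'spread' over disjoint chunks.

    Each chunk is read once; memory is bounded by the chunk size.
    More trees per chunk trades computation for lower error.
    """
    forests = []
    for i, (X_c, y_c) in enumerate(chunks):
        f = ExtraTreesClassifier(n_estimators=trees_per_chunk,
                                 random_state=random_state + i)
        forests.append(f.fit(X_c, y_c))
    return forests

def predict_spread_forest(forests, X):
    # Average class probabilities over all per-chunk ensembles
    # (assumes every chunk contained the same set of classes).
    proba = np.mean([f.predict_proba(X) for f in forests], axis=0)
    return forests[0].classes_[np.argmax(proba, axis=1)]
```

Varying `trees_per_chunk` and the chunk size exposes the error/computation tradeoff the abstract refers to: larger chunks give each tree more data, while more trees per chunk increase ensemble size at fixed memory.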
References
Bagnall, A., et al.: The UEA multivariate time series classification archive. arXiv:1811.00075 (2018)
Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 31(3), 606–660 (2017)
Brain, D., Webb, G.I.: The need for low bias algorithms in classification learning from large data sets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) Principles of Data Mining and Knowledge Discovery, pp. 62–73. Springer, Berlin (2002)
Breiman, L.: Pasting small votes for classification in large databases and on-line. Mach. Learn. 36(1), 85–103 (1999)
Cabello, N., Naghizade, E., Qi, J., Kulik, L.: Fast, accurate and explainable time series classification through randomization. Data Min. Knowl. Discov. (2023)
City of Melbourne: Pedestrian counting system (2022). https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/. CC BY 4.0
Dau, H.A., et al.: The UCR time series archive. IEEE/CAA J. Autom. Sinica 6(6), 1293–1305 (2019)
Dempster, A., Petitjean, F., Webb, G.I.: ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 34(5), 1454–1495 (2020)
Dempster, A., Schmidt, D.F., Webb, G.I.: MiniRocket: a very fast (almost) deterministic transform for time series classification. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 248–257. ACM, New York (2021)
Dempster, A., Schmidt, D.F., Webb, G.I.: Hydra: competing convolutional kernels for fast and accurate time series classification. Data Min. Knowl. Discov. (2023)
Dempster, A., Schmidt, D.F., Webb, G.I.: Quant: a minimalist interval method for time series classification. Data Min. Knowl. Discov. (2024)
Fanioudakis, E., Geismar, M., Potamitis, I.: Mosquito wingbeat analysis and classification using deep learning. In: 26th European Signal Processing Conference, pp. 2410–2414 (2018)
Garnot, V.S.F., Landrieu, L., Giordano, S., Chehata, N.: Satellite image time series classification with pixel-set encoders and temporal self-attention. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009)
Hooker, S.: The hardware lottery. Commun. ACM 64(12), 58–65 (2021)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of Machine Learning Research, pp. 448–456 (2015)
Ismail-Fawaz, A., Devanne, M., Weber, J., Forestier, G.: Deep learning for time series classification using new hand-crafted convolution filters. In: IEEE International Conference on Big Data, pp. 972–981 (2022)
Ismail Fawaz, H., et al.: InceptionTime: finding AlexNet for time series classification. Data Min. Knowl. Discov. 34(6), 1936–1962 (2020). https://doi.org/10.1007/s10618-020-00710-y
Louppe, G.: Understanding random forests: from theory to practice. Ph.D. thesis, University of Liège (2014). arXiv:1407.7502
Louppe, G., Geurts, P.: Ensembles on random patches. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 346–361. Springer, Berlin (2012)
Middlehurst, M., Large, J., Bagnall, A.: The canonical interval forest (CIF) classifier for time series classification. In: IEEE International Conference on Big Data, pp. 188–195 (2020)
Middlehurst, M., Large, J., Flynn, M., Lines, J., Bostrom, A., Bagnall, A.: HIVE-COTE 2.0: a new meta ensemble for time series classification. Mach. Learn. 110, 3211–3243 (2021)
Middlehurst, M., Schäfer, P., Bagnall, A.: Bake off redux: a review and experimental evaluation of recent time series classification algorithms. Data Min. Knowl. Discov. (2024)
Miller, B.S., et al.: An open access dataset for developing automated detectors of Antarctic baleen whale sounds and performance evaluation of two commonly used detectors. Sci. Rep. 11 (2021)
Miller, B.S., Stafford, K.M., Van Opzeeland, I., et al.: Whale sounds (2020). https://data.aad.gov.au/metadata/AcousticTrends_BlueFinLibrary. CC BY 4.0
Sainte Fare Garnot, V., Landrieu, L.: S2Agri pixel set (2022). https://zenodo.org/records/5815488. CC BY 4.0
Schäfer, P., Leser, U.: WEASEL 2.0: a random dilated dictionary transform for fast, accurate and memory constrained time series classification. Mach. Learn. 112(12), 4763–4788 (2023)
Sutton, R.: The bitter lesson (2019). http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Tan, C.W., Dempster, A., Bergmeir, C., Webb, G.I.: MultiRocket: multiple pooling operators and transformations for fast and effective time series classification. Data Min. Knowl. Discov. 36(5), 1623–1646 (2022)
Tew, S., Boley, M., Schmidt, D.F.: Bayes beats cross validation: efficient and accurate ridge regression via expectation maximization. In: 37th Conference on Neural Information Processing Systems (2023)
The aeon Developers: aeon (2024). https://github.com/aeon-toolkit/aeon
Transport for NSW: NSW road traffic volume counts hourly (2023). https://opendata.dev.transport.nsw.gov.au/dataset/nsw-roads-traffic-volume-counts-api/resource/bca06c7e-30be-4a90-bc8b-c67428c0823a. CC BY 4.0
This work was supported by the Australian Research Council under award DP240100048.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dempster, A., Tan, C.W., Miller, L., Foumani, N.M., Schmidt, D.F., Webb, G.I. (2025). Highly Scalable Time Series Classification for Very Large Datasets. In: Lemaire, V., et al. Advanced Analytics and Learning on Temporal Data. AALTD 2024. Lecture Notes in Computer Science(), vol 15433. Springer, Cham. https://doi.org/10.1007/978-3-031-77066-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77065-4
Online ISBN: 978-3-031-77066-1