Highly Scalable Time Series Classification for Very Large Datasets

  • Conference paper
Advanced Analytics and Learning on Temporal Data (AALTD 2024)

Abstract

Relatively little work in the field of time series classification focuses on learning effectively from very large quantities of data. Large datasets present significant practical challenges in terms of computational cost and memory complexity. We present strategies for extending two recent state-of-the-art methods for time series classification, namely Hydra and Quant, to very large datasets. This allows these methods to be trained on large quantities of data with a fixed memory cost, while making effective use of appropriate computational resources. For Hydra, we fit a ridge regression classifier iteratively, using a single pass through the data, integrating the Hydra transform with the fitting of the ridge regression model; this gives a fixed memory cost and allows almost all computation to be performed on GPU. For Quant, we ‘spread’ subsets of extremely randomised trees over a given dataset such that each tree is trained on as much data as possible for a given amount of memory while minimising reads from the data, allowing a simple tradeoff between error and computational cost. Both methods can thus be applied straightforwardly to very large quantities of data. We demonstrate these approaches with results (including learning curves) on a selection of large datasets with between approximately 85,000 and 47 million training examples.
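
The following is a minimal sketch, not the authors' implementation, of the single-pass approach described for Hydra: the normal-equation statistics of the ridge regression classifier are accumulated batch by batch, with a hypothetical hydra_transform applied to each batch on the GPU, so that only a fixed-size Gram matrix, and never the full transformed dataset, is held in memory.

    import torch

    def fit_ridge_single_pass(loader, hydra_transform, num_features, num_classes,
                              lam=1.0, device="cuda"):
        # Accumulators for the normal equations: A = X^T X, B = X^T Y.
        A = torch.zeros(num_features, num_features, device=device)
        B = torch.zeros(num_features, num_classes, device=device)
        for X_raw, Y in loader:                    # single pass through the data
            X = hydra_transform(X_raw.to(device))  # transform each batch on GPU
            Y = Y.to(device)                       # one-hot (or +/-1) class targets
            A += X.T @ X
            B += X.T @ Y
        A += lam * torch.eye(num_features, device=device)  # ridge penalty
        return torch.linalg.solve(A, B)            # coefficients, num_features x num_classes

Similarly, a hedged sketch, again an assumption rather than the paper's code, of ‘spreading’ extremely randomised trees over a large dataset: a small group of trees is trained on each memory-sized chunk, each chunk is read only once, and all trees are pooled into a single ensemble at prediction time. Here quant_transform and the chunk iterator are placeholders, and the sketch assumes every chunk contains examples of every class.

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier

    def fit_spread_trees(chunk_iter, quant_transform, trees_per_chunk=8):
        forests = []
        for X_raw, y in chunk_iter:                # each chunk is read once
            X = quant_transform(X_raw)             # interval features for the chunk
            f = ExtraTreesClassifier(n_estimators=trees_per_chunk)
            forests.append(f.fit(X, y))            # a few trees per chunk
        return forests

    def predict_spread_trees(forests, X):
        # Average class probabilities over all per-chunk forests (soft voting).
        proba = np.mean([f.predict_proba(X) for f in forests], axis=0)
        return forests[0].classes_[proba.argmax(axis=1)]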



This work was supported by the Australian Research Council under award DP240100048.

Author information


Corresponding author

Correspondence to Angus Dempster.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dempster, A., Tan, C.W., Miller, L., Foumani, N.M., Schmidt, D.F., Webb, G.I. (2025). Highly Scalable Time Series Classification for Very Large Datasets. In: Lemaire, V., et al. Advanced Analytics and Learning on Temporal Data. AALTD 2024. Lecture Notes in Computer Science, vol 15433. Springer, Cham. https://doi.org/10.1007/978-3-031-77066-1_5

  • DOI: https://doi.org/10.1007/978-3-031-77066-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77065-4

  • Online ISBN: 978-3-031-77066-1

  • eBook Packages: Computer Science, Computer Science (R0)
