Scalable time series classification

Data Mining and Knowledge Discovery

Abstract

Time series classification tries to mimic the human understanding of similarity. With long time series or large time series datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor dynamic time warping classifier (1-NN DTW), which is commonly used as the benchmark for comparison. It has several shortcomings: its time complexity is quadratic in the time series length, and its accuracy degrades in the presence of noise. To reduce the computational complexity, early abandoning techniques, cascading lower bounds and, more recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are on the order of hours. We present our Bag-Of-SFA-Symbols in Vector Space classifier, which is accurate, fast and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity, combined with its good classification accuracy, makes it relevant for use cases involving long time series, large numbers of time series, or real-time analytics.
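The quadratic cost of the 1-NN DTW benchmark, and the idea behind early abandoning, can be illustrated with a minimal sketch. This is not the paper's Bag-Of-SFA-Symbols in Vector Space method; it is a plain DTW dynamic program used inside a 1-NN classifier. The function names `dtw_distance` and `classify_1nn`, the squared point-wise cost, and the absence of a warping window are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float],
                 best_so_far: float = math.inf) -> float:
    """DTW with squared point-wise cost and no warping window (assumption).

    Fills an (n+1) x (m+1) cost table row by row, so the work is O(n*m),
    i.e. quadratic in the length for equal-length series.
    Early abandoning: every warping path crosses every row and costs are
    non-negative, so if a whole row exceeds best_so_far the final distance
    must too, and we return math.inf immediately.
    """
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        if min(curr) > best_so_far:
            return math.inf  # cannot beat the current nearest neighbour
        prev = curr
    return prev[m]


def classify_1nn(query: Sequence[float],
                 train: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label of the training series with the smallest DTW distance."""
    if not train:
        raise ValueError("training set must not be empty")
    best_dist = math.inf
    best_label = train[0][1]
    for series, label in train:
        # Pass the best distance so far to enable early abandoning.
        d = dtw_distance(query, series, best_dist)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label


if __name__ == "__main__":
    train = [([0, 0, 1, 2, 1, 0, 0], "peak"), ([0, 0, 0, 0, 0, 0, 0], "flat")]
    query = [0, 1, 2, 1, 0, 0, 0]  # a time-shifted peak
    print(classify_1nn(query, train))
```

Each comparison of two length-7 series fills a 7 × 7 table; doubling the length quadruples the number of cells, which is the quadratic growth the abstract refers to. Early abandoning only prunes work for candidates that cannot win; in the worst case every cell is still computed.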

Figs. 1–14

Notes

  1. UCR Time Series Classification/Clustering Homepage. http://www.cs.ucr.edu/~eamonn/time_series_data (2014).

  2. The BOSS in Vector Space Results. http://www.zib.de/patrick.schaefer/bossVS/ (2015).

  3. http://www.physionet.org/physiobank/database/chfdb/ (2015).

Acknowledgments

The author would like to thank Claudia Eichert-Schäfer, Florian Schintke, the anonymous reviewers and the owners of the datasets.

Author information

Correspondence to Patrick Schäfer.

Ethics declarations

Funding

This project was motivated and partially funded by the German Federal Ministry of Education and Research through the project “Berlin Big Data Center (BBDC)”, funding mark: 01IS14013A.

Conflict of Interest

The author P. Schäfer received research grants from this project.

Additional information

Responsible editor: Joao Gama, Indre Zliobaite, Alipio Jorge and Concha Bielza.

About this article

Cite this article

Schäfer, P. Scalable time series classification. Data Min Knowl Disc 30, 1273–1298 (2016). https://doi.org/10.1007/s10618-015-0441-y

  • DOI: https://doi.org/10.1007/s10618-015-0441-y
