Abstract
Time series classification tries to mimic the human understanding of similarity. On long time series or large datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor dynamic time warping classifier (1-NN DTW), which is commonly used as the benchmark for comparison. It has several shortcomings: its time complexity is quadratic in the time series length, and its accuracy degrades in the presence of noise. To reduce the computational complexity, early abandoning techniques, cascading lower bounds, and, more recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are on the order of hours. We present our Bag-Of-SFA-Symbols in Vector Space (BOSS VS) classifier, which is accurate, fast, and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity combined with its good classification accuracy makes it relevant for use cases such as long time series, large collections of time series, or real-time analytics.
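To make the quadratic cost of the 1-NN DTW baseline concrete, the following is a minimal Python sketch of the textbook DTW recurrence with a nearest-neighbor classifier on top of it. It is an illustration under stated assumptions, not the implementation benchmarked in this article: the function names `dtw_distance` and `classify_1nn_dtw` are hypothetical, the pointwise cost is assumed to be the squared difference, and the sketch omits warping windows, early abandoning, and lower bounds.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW between two sequences of numbers.

    Assumption: squared difference as the pointwise cost.
    The nested loops fill an n x m cost table, hence O(n * m) time.
    """
    n, m = len(a), len(b)
    # prev[j]: cost of the best warping path aligning the prefix of `a`
    # processed so far with b[:j]. Only two rows are kept in memory.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def classify_1nn_dtw(query, train_series, train_labels):
    """Return the label of the training series closest to `query` under DTW."""
    best_label, best_dist = None, math.inf
    for series, label in zip(train_series, train_labels):
        d = dtw_distance(query, series)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label


if __name__ == "__main__":
    train = [[0, 0, 1, 0, 0], [1, 1, 1, 1, 1]]
    labels = ["spike", "flat"]
    print(classify_1nn_dtw([0, 1, 0, 0, 0], train, labels))
```

Each query in this sketch is compared against every training series, and each comparison fills a table whose size is the product of the two series lengths. This is why classification time grows quickly with both the series length and the dataset size.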
Notes
UCR Time Series Classification/Clustering Homepage. http://www.cs.ucr.edu/~eamonn/time_series_data (2014).
The BOSS in Vector Space Results. http://www.zib.de/patrick.schaefer/bossVS/ (2015).
Acknowledgments
The author would like to thank Claudia Eichert-Schäfer, Florian Schintke, the anonymous reviewers and the owners of the datasets.
Ethics declarations
Funding
This project was motivated and partially funded by the German Federal Ministry of Education and Research through the project “Berlin Big Data Center (BBDC)”, funding mark: 01IS14013A.
Conflict of Interest
The author P. Schäfer received research grants from this project.
Additional information
Responsible editor: Joao Gama, Indre Zliobaite, Alipio Jorge and Concha Bielza.
About this article
Cite this article
Schäfer, P. Scalable time series classification. Data Min Knowl Disc 30, 1273–1298 (2016). https://doi.org/10.1007/s10618-015-0441-y
DOI: https://doi.org/10.1007/s10618-015-0441-y