Abstract
Time-series classification is an important problem for the data mining community due to the wide range of application domains involving time-series data. A recent paradigm, called shapelets, represents patterns that are highly predictive for the target variable. Shapelets are discovered by measuring the prediction accuracy of a set of potential (shapelet) candidates. The candidates typically consist of all the segments of a dataset; therefore, the discovery of shapelets is computationally expensive. This paper proposes a novel method that avoids measuring the prediction accuracy of similar candidates in Euclidean distance space, through an online clustering/pruning technique. In addition, our algorithm incorporates a supervised shapelet selection that retains only those candidates that improve classification accuracy. Empirical evidence on 45 univariate datasets from the UCR collection demonstrates that our method is 3–4 orders of magnitude faster than the fastest existing shapelet discovery method, while providing better prediction accuracy. In addition, we extend our method to multivariate time-series data. Runtime results over four real-life multivariate datasets indicate that our method can classify MB-scale data in a matter of seconds and GB-scale data in a matter of minutes. These speedups do not compromise quality; on the contrary, our method is even superior to the multivariate baseline in terms of classification accuracy.
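The two ideas named in the abstract, pruning candidates that are close in Euclidean distance and keeping a candidate only if it improves accuracy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fixed distance threshold, the single candidate length, the leave-one-out 1-nearest-neighbor accuracy estimate, and all function names are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance(series, shapelet):
    """Smallest distance between the shapelet and any same-length segment of the series."""
    m = len(shapelet)
    return min(euclidean(series[i:i + m], shapelet)
               for i in range(len(series) - m + 1))


def prune_similar(candidates, threshold):
    """Online pruning (assumed rule): keep a candidate only if it lies farther
    than `threshold` from every candidate kept so far."""
    kept = []
    for c in candidates:
        if all(euclidean(c, k) > threshold for k in kept):
            kept.append(c)
    return kept


def loo_1nn_accuracy(features, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (assumed evaluator)."""
    correct = 0
    for i, f in enumerate(features):
        nearest = min((j for j in range(len(features)) if j != i),
                      key=lambda j: euclidean(f, features[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(features)


def select_shapelets(series_list, labels, length, threshold):
    """Greedy supervised selection: accept a pruned candidate only if adding its
    min-distance feature strictly improves the estimated accuracy."""
    candidates = [s[i:i + length]
                  for s in series_list
                  for i in range(len(s) - length + 1)]
    selected, best_acc = [], 0.0
    features = [[] for _ in series_list]
    for c in prune_similar(candidates, threshold):
        trial = [f + [min_distance(s, c)] for f, s in zip(features, series_list)]
        acc = loo_1nn_accuracy(trial, labels)
        if acc > best_acc:
            selected.append(c)
            features, best_acc = trial, acc
    return selected, best_acc
```

In this sketch, pruning happens before any accuracy is measured, so near-duplicate segments never reach the expensive evaluation step; this is the source of the savings the abstract describes, although the paper's actual thresholding and candidate sampling may differ.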









Acknowledgments
This study was partially co-funded by the Seventh Framework Programme (FP7) of the European Commission, through Project REDUCTION (www.reduction-project.eu) (# 288254).
Cite this article
Grabocka, J., Wistuba, M. & Schmidt-Thieme, L. Fast classification of univariate and multivariate time series through shapelet discovery. Knowl Inf Syst 49, 429–454 (2016). https://doi.org/10.1007/s10115-015-0905-9
DOI: https://doi.org/10.1007/s10115-015-0905-9