Abstract
Much of the vast literature on time series classification makes several assumptions about the data and the algorithm’s eventual deployment that are almost certainly unwarranted. For example, many research efforts assume that the beginning and ending points of the pattern of interest can be correctly identified, both during training and later in deployment. Another common assumption is that queries will arrive at a constant rate that is known ahead of time, so that computational resources can be budgeted exactly. In this work, we argue that these assumptions are unjustified and that they have in many cases led to unwarranted optimism about the performance of the proposed algorithms. As we shall show, the task of correctly extracting individual gait cycles, heartbeats, gestures, behaviors, etc., is generally much more difficult than the task of actually classifying those patterns. Likewise, gesture classification systems deployed on a device such as Google Glass may issue queries at frequencies that range over an order of magnitude, making it difficult to plan computational resources. We propose to mitigate these problems by introducing an alignment-free time series classification framework. The framework requires only very weakly annotated data, such as “in this ten minutes of data, we see mostly normal heartbeats…,” and, by generalizing the classic machine learning idea of data editing to streaming/continuous data, allows us to build robust, fast and accurate anytime classifiers. We demonstrate on several diverse real-world problems that, beyond removing unwarranted assumptions and requiring essentially no human intervention, our framework is both extremely fast and significantly more accurate than current state-of-the-art approaches.
Notes
Note that only some ECG classification systems perform beat extraction followed by classification (Faezipour et al. 2010). Many researchers believe that robust beat extraction can be a harder problem than classification itself (cf. Figs. 1 and 2), and thus present every subsequence extracted by a sliding window for classification. This is the approach we consider in Sect. 4.2, as we assume bedside monitoring.
Note that the two patterns being out of phase does not make them non-redundant: at query time, only queries half their length are used, and these queries slide across the entire length of the patterns. See Sect. 4.2 for details.
Where tractability is an issue, we may sample a subset of the queries.
We defer the discussion on how to choose a query length to Sect. 6.
The reader may ask why we do not use Dynamic Time Warping. Empirically, we tried it and it does not help. Moreover, we should not expect it to help on this problem; see https://sites.google.com/site/dmkdrealistic/.
We only show the rejected queries in the first case study. See Project (https://sites.google.com/site/dmkdrealistic/) for examples of rejected queries from the other case studies.
Experimental results show that the threshold distances for D built with Euclidean distance and Uniform Scaling distance are almost identical. Therefore, we only report one threshold distance.
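The sliding-window classification with a rejection threshold referred to in the notes above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`znorm`, `classify_stream`), the use of z-normalization, and the simplification that every dictionary pattern has the same length as the query window are all assumptions made for the example.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D sequence; a constant sequence maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def classify_stream(stream, dictionary, labels, threshold):
    """Slide a window over `stream` and label each subsequence.

    Assumptions (hypothetical, for illustration only):
      - `dictionary` is a list of 1-D patterns that all share one length m
      - distance is Euclidean distance between z-normalized sequences
      - a window whose nearest pattern is farther than `threshold`
        is rejected (label None) instead of being forced into a class
    Returns a list of (start_index, label_or_None) pairs.
    """
    m = len(dictionary[0])
    templates = [znorm(t) for t in dictionary]
    results = []
    for start in range(len(stream) - m + 1):
        query = znorm(stream[start:start + m])
        dists = [np.linalg.norm(query - t) for t in templates]
        best = int(np.argmin(dists))
        label = labels[best] if dists[best] <= threshold else None
        results.append((start, label))
    return results
```

No beat, gait-cycle, or gesture extraction step is needed in this sketch: every window is offered to the classifier, and the threshold decides which windows are ignored.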
References
Andino SLG et al (2000) Measuring the complexity of time series: an application to neurophysiological signals. Hum Brain Map 11(1):46–57
Aspelin K (2005) Establishing pedestrian walking speeds. Portland State University. www.usroads.com/journals/p/rej/9710/re971001.htm. Accessed 24 Aug 2009
Aziz W, Arif M (2006) Complexity analysis of stride interval time series by threshold dependent symbolic entropy. EJAP 98(1):30–40
Batista G, Keogh E, Mafra-Neto A, Rowton E (2011) Sensors and software to allow computational entomology, an emerging application of data mining. SIGKDD demo paper
Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. In: SDM
Bao L, Intille SS (2004) Activity recognition from user-annotated acceleration data. In: Proceedings of the 2nd international conference on pervasive computing, pp 1–17
Cavagna GA, Heglund NC, Taylor CR (1977) Mechanical work in terrestrial locomotion: two basic mechanisms for minimizing energy expenditure. J Physiol 233(5):R243–R261
Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD
CMU Graphics Lab Motion Capture Database. www.mocap.cs.cmu.edu/. Accessed 24 April 2012
de Chazal P, O’Dwyer M, Reilly RB (2004) Automatic classification of ECG heartbeats using ECG morphology and heartbeat interval features. IEEE Trans Biomed Eng 51:1196–1206
Electrocardiography, http://en.wikipedia.org/wiki/Electrocardiography
Faezipour M, Saeed A, Bulusu S, Nourani M, Minn H, Tamil L (2010) A patient-adaptive profiling scheme for ECG beat classification. IEEE Trans Inform Technol Biomed 14(5):1153–1165
Gafurov D, Helkala K, Søndrol T (2006) Biometric gait authentication using accelerometer sensor. J Comput 1(7):51–59
Gafurov D, Snekkenes E (2008) Towards understanding the uniqueness of gait biometric. In: 8th IEEE International Conference on Automatic Face & Gesture Recognition
Grass J, Zilberstein S (1995) Anytime algorithm development tools. Technical Report. UMI Order Number: UM-CS-1995-094, University of Massachusetts
Hanson MA, Powell Jr HC, Barth AT, Lach J, Brown MBC (2009) Neural network gait classification for on-body inertial sensors. In: Proceedings of the 2009 sixth international workshop on wearable and implantable body sensor networks
Hao Y, Chen Y, Zakaria J, Hu B, Rakthanmanon T, Keogh E (2013) Towards never-ending learning from time series streams. In: SIGKDD
Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. In: SDM
Hu B, Chen Y, Zakaria J, Ulanova L, Keogh E (2013) Classification of multi-dimensional streaming time series by weighting each classifier’s track record. In: ICDM
Hu B, Rakthanmanon TR, Hao Y, Evans S, Lonardi S, Keogh E (2011) Discovering the intrinsic cardinality and dimensionality of time series using MDL. In: ICDM
Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Keogh E, Lonardi S, Ratanamahatana C (2004) Towards parameter-free data mining. In: Proceedings of the tenth ACM SIGKDD
Keogh E, Palpanas T, Zordan VB, Gunopulos D, Cardle M (2004) Indexing large human-motion databases. In: VLDB
Koch P, Konen W, Hein K (2010) Gesture recognition on few training data using slow feature analysis and parametric bootstrap. In: IJCNN
Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. J Data Min Knowl Discov 19(2):245–260
Lester J, Choudhury T, Kern N, Borriello G, Hannaford B (2005) A hybrid discriminative/generative approach for modeling human activities. In: IJCAI
Li L, Prakash BA (2011) Time series clustering: complex is simpler. In: ICML
Li M, Vitanyi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer Verlag, New York
Liu J, Yu K, Zhang Y, Huang Y (2010) Training conditional random fields using transfer learning for gesture recognition. In: ICDM
McMahon TA, Cheng GC (1990) The mechanics of running: how does stiffness couple with speed. J Biomech 23:65–78
Morse M, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: Proceedings of SIGMOD
Niennattrakul V, Keogh E, Ratanamahatana CA (2010) Data editing techniques to allow the application of distance-based outlier detection to streams. In: ICDM
PAMAP, Physical activity monitoring for aging people. www.pamap.org/demo.html. Accessed 12 May 2012
Pärkkä J, Ermes M, Korpipää P, Mäntyjärvi J, Peltola J, Korhonen I (2006) Activity classification using realistic data from wearable sensors. IEEE Trans Inf Technol Biomed 10:119–128
Pekalska E, Duin RPW, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39:189–208
Pham C, Plötz T, Olivier P (2010) A dynamic time warping approach to real-time activity recognition for food preparation. In: Proceedings of the first international joint conference on Ambient intelligence
Project URL: https://sites.google.com/site/dmkdrealistic/
Raptis M, Kirovski D, Hoppes H (2011) Real-time classification of dance gestures from skeleton animation. In: Proceedings of the ACM SIGGRAPH symposium on computer animation
Raptis M, Wnuk K, Soatto S (2008) Flexible dictionaries for action recognition. In: Proceedings of the 1st international workshop on machine learning for vision-based motion analysis
Rakthanmanon T, Keogh E, Lonardi S, Evans S (2011) Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDM
Ratanamahatana CA (2012) Personal communication. May 2012
Ratanamahatana CA, Keogh E (2004) Making time-series classification more accurate using learned constraints. In: SDM
Reiss A, Stricker D (2011) Introducing a modular activity monitoring system. In: 33rd international EMBC
Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM
Song J, Kim D (2006) Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMM. In: Proceedings of the 18th ICPR
The BIDMC congestive heart failure database, www.physionet.org/physiobank/database/chfdb/
Ueno K, Xi X, Keogh E, Lee D (2010) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM
Usabiaga J, Bebis G, Erol A, Nicolescu M (2007) Recognizing simple human actions using 3D head movement. Comput Intell 23(4):484–496
Vatavu RD (2011) The effect of sampling rate on the performance of template-based gesture recognizers. In: Proceedings of ICMI
Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana C (2006) Fast time series classification using numerosity reduction. In: ICML, pp 1033–1040
Ye L, Wang X, Keogh E, Mafra-Neto A (2009) Autocannibalistic and anyspace indexing algorithms with applications to sensor data mining. In: SDM
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: KDD, pp 947–956
Yang AY, Giani A, Giannatonio R, Gilani K et al (2009) Distributed human action recognition via wearable motion sensor networks. www.eecs.berkeley.edu/~yang/software/WAR/index.html
Yang K, Jiang H, Dong J, Zhang C, Wang Z (2012) An adaptive real-time method for fetal heart rate extraction based on phonocardiography. In: 2012 IEEE biomedical circuits and systems conference. BioCAS, pp 356–359
Zilberstein S, Russell S (1995) Approximate reasoning using anytime algorithms. In: Imprecise and approximate computation. Kluwer Academic Publishers, Dordrecht
Zhang M, Sawchuk AA (2012) USC-HAD: a daily activity recognition using wearable sensors. ACM international conference on ubiquitous computing (UbiComp) workshop on situation, activity and goal awareness (SAGAware)
Acknowledgments
We thank all the donors of datasets. We would like to acknowledge the financial support for our research provided by NSF Grant IIS-1161997 and an award from Vodafone.
Additional information
Responsible Editor: G. Karypis.
Cite this article
Hu, B., Chen, Y. & Keogh, E. Classification of streaming time series under more realistic assumptions. Data Min Knowl Disc 30, 403–437 (2016). https://doi.org/10.1007/s10618-015-0415-0
DOI: https://doi.org/10.1007/s10618-015-0415-0