Classification of streaming time series under more realistic assumptions

Hu, Bing; Chen, Yanping; Keogh, Eamonn

doi:10.1007/s10618-015-0415-0

Published: 03 June 2015

Volume 30, pages 403–437 (2016)

Data Mining and Knowledge Discovery

Abstract

Much of the vast literature on time series classification makes several assumptions about data and the algorithm’s eventual deployment that are almost certainly unwarranted. For example, many research efforts assume that the beginning and ending points of the pattern of interest can be correctly identified, during both the training phase and later deployment. Another example is the common assumption that queries will be made at a constant rate that is known ahead of time, thus computational resources can be exactly budgeted. In this work, we argue that these assumptions are unjustified, and this has in many cases led to unwarranted optimism about the performance of the proposed algorithms. As we shall show, the task of correctly extracting individual gait cycles, heartbeats, gestures, behaviors, etc., is generally much more difficult than the task of actually classifying those patterns. Likewise, gesture classification systems deployed on a device such as Google Glass may issue queries at frequencies that range over an order of magnitude, making it difficult to plan computational resources. We propose to mitigate these problems by introducing an alignment-free time series classification framework. The framework requires only very weakly annotated data, such as “in this ten minutes of data, we see mostly normal heartbeats…,” and by generalizing the classic machine learning idea of data editing to streaming/continuous data, allows us to build robust, fast and accurate anytime classifiers. We demonstrate on several diverse real-world problems that beyond removing unwarranted assumptions and requiring essentially no human intervention, our framework is both extremely fast and significantly more accurate than current state-of-the-art approaches.
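To make the idea of alignment-free classification from weakly labeled streams more concrete, the following is a minimal sketch and not the authors' framework. It assumes a one-nearest-neighbor rule with z-normalized Euclidean distance, where each class is represented by a single long stream (for example, ten minutes of "mostly normal" data). Every window of that stream is treated as a candidate, so no beat or cycle extraction is needed. The optional `budget` (a hypothetical parameter) interrupts the search after a fixed number of distance computations. Windows are visited round-robin across classes, so an interrupted search still returns a best-so-far answer, which is the defining property of an anytime classifier. The data editing step described in the abstract is omitted.

```python
import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalize a 1-D array; near-constant arrays are only mean-centred."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / (sd if sd > eps else 1.0)


def classify_anytime(query, labeled_streams, budget=None):
    """Return (label, distance) of the nearest window to `query`.

    labeled_streams maps a class label to one long, weakly labeled 1-D array.
    Every length-m window of every stream is a candidate, so the training
    data never has to be segmented into beats, cycles, or gestures.
    budget (hypothetical) caps the number of distance computations; windows
    are visited round-robin across classes, so stopping early still yields
    a best-so-far answer (the anytime property).
    """
    q = znorm(query)
    m = len(q)
    # One iterator of window start offsets per class.
    cursors = {label: iter(range(len(s) - m + 1))
               for label, s in labeled_streams.items()}
    best_label, best_dist, used = None, np.inf, 0
    while cursors and (budget is None or used < budget):
        for label in list(cursors):
            start = next(cursors[label], None)
            if start is None:  # this class has no windows left
                del cursors[label]
                continue
            window = labeled_streams[label][start:start + m]
            d = float(np.linalg.norm(q - znorm(window)))
            used += 1
            if d < best_dist:
                best_label, best_dist = label, d
            if budget is not None and used >= budget:
                break
    return best_label, best_dist
```

If the query is cut from the middle of a sine cycle, a full scan should find an exact match somewhere in a "sine" stream even though the query's start point is not aligned with any cycle boundary. A small `budget` returns whatever best match has been found so far. A fuller system would first edit (prune) the training windows so that the search is smaller and less noisy. This sketch does not attempt that step.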

Notes

Note that only some ECG classification systems perform beat extraction followed by classification (Faezipour et al. 2010). Many researchers believe that robust beat extraction can be a harder problem than classification itself (cf. Figs. 1 and 2), and thus present every subsequence extracted by a sliding window for classification. This is the approach we consider in Sect. 4.2, since we assume bedside monitoring.
Note that the fact that the two patterns are out of phase does not make them non-redundant: at query time, only queries half their length are used, and these queries slide across the entire length of the patterns. Details are in Sect. 4.2.
Where tractability is an issue, we may sample a subset of the queries.
We defer the discussion on how to choose a query length to Sect. 6.
The reader may ask: why not use Dynamic Time Warping? We tried it empirically, and it does not help. Moreover, we should not expect it to help on this problem; see https://sites.google.com/site/dmkdrealistic/.
We only show the rejected queries in the first case study. See Project (https://sites.google.com/site/dmkdrealistic/) for examples of rejected queries from the other case studies.
Experimental results show that the threshold distances for D built with Euclidean distance and Uniform Scaling distance are almost identical. Therefore, we only report one threshold distance.
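The last note compares dictionaries built with Euclidean distance and with Uniform Scaling distance. The sketch below illustrates the difference between these two measures. It assumes one common formulation of uniform scaling: compare the query (length m) with prefixes of the candidate of every length p from m to ⌊m·s⌋, resample each prefix to m points, and keep the smallest z-normalized Euclidean distance. The maximum scaling factor s = 1.2 and the function names are illustrative assumptions, not values taken from the paper, and the paper's threshold computation is not reproduced here. Because the unscaled prefix (p = m) is always one of the options, this uniform scaling distance can never exceed the plain Euclidean distance to the candidate's first m points.

```python
import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalize a 1-D array; near-constant arrays are only mean-centred."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / (sd if sd > eps else 1.0)


def euclidean_distance(query, candidate):
    """z-normalized Euclidean distance to the first len(query) points."""
    m = len(query)
    if len(candidate) < m:
        raise ValueError("candidate is shorter than the query")
    return float(np.linalg.norm(znorm(query) - znorm(candidate[:m])))


def uniform_scaling_distance(query, candidate, max_scale=1.2):
    """Minimum z-normalized Euclidean distance between `query` (length m) and
    prefixes of `candidate` of every length p in [m, floor(m * max_scale)],
    each prefix resampled to m points by nearest-index lookup."""
    q = znorm(query)
    m = len(q)
    c_all = np.asarray(candidate, dtype=float)
    if len(c_all) < m:
        raise ValueError("candidate is shorter than the query")
    hi = min(len(c_all), int(np.floor(m * max_scale)))
    best = np.inf
    for p in range(m, hi + 1):
        idx = (np.arange(m) * p) // m  # indices 0..p-1 stretched over m slots
        best = min(best, float(np.linalg.norm(q - znorm(c_all[idx]))))
    return best
```

For a pattern performed about 20% more slowly than the query, the uniform scaling distance should be noticeably smaller than the Euclidean distance. When the data contain little tempo variation, the two distances coincide more often. This is one plausible reason why thresholds learned with either measure could end up close.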

References

Andino SLG et al (2000) Measuring the complexity of time series: an application to neurophysiological signals. Hum Brain Map 11(1):46–57
Aspelin K (2005) Establishing pedestrian walking speeds. Portland State University. www.usroads.com/journals/p/rej/9710/re971001.htm. Accessed 24 Aug 2009
Aziz W, Arif M (2006) Complexity analysis of stride interval time series by threshold dependent symbolic entropy. EJAP 98(1):30–40
Batista G, Keogh E, Mafra-Neto A, Rowton E (2011) Sensors and software to allow computational entomology, an emerging application of data mining. SIGKDD demo paper
Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. In: SDM
Bao L, Intille SS (2004) Activity recognition from user-annotated acceleration data. In: Proceedings of the 2nd international conference on pervasive computing, pp 1–17
Cavagna GA, Heglund NC, Taylor CR (1977) Mechanical work in terrestrial locomotion: two basic mechanisms for minimizing energy expenditure. J Physiol 233(5):R243–R261
Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD
CMU Graphics Lab Motion Capture Database. www.mocap.cs.cmu.edu/. Accessed 24 April 2012
de Chazal P, O’Dwyer M, Reilly RB (2004) Automatic classification of ECG heartbeats using ECG morphology and heartbeat interval features. IEEE Trans Biomed Eng 51:1196–1206
Electrocardiography, http://en.wikipedia.org/wiki/Electrocardiography
Faezipour M, Saeed A, Bulusu S, Nourani M, Minn H, Tamil L (2010) A patient-adaptive profiling scheme for ECG beat classification. IEEE Trans Inform Technol Biomed 14(5):1153–1165
Gafurov D, Helkala K, Søndrol T (2006) Biometric gait authentication using accelerometer sensor. J Comput 1(7):51–59
Gafurov D, Snekkenes E (2008) Towards understanding the uniqueness of gait biometric. In: 8th IEEE International Conference on Automatic Face & Gesture Recognition
Grass J, Zilberstein S (1995) Anytime algorithm development tools. Technical Report. UMI Order Number: UM-CS-1995-094, University of Massachusetts
Hanson MA, Powell Jr HC, Barth AT, Lach J, Brown MBC (2009) Neural network gait classification for on-body inertial sensors. In: Proceedings of the 2009 sixth international workshop on wearable and implantable body sensor networks
Hao Y, Chen Y, Zakaria J, Hu B, Rakthanmanon T, Keogh E (2013) Towards never-ending learning from time series streams. In: SIGKDD
Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. In: SDM
Hu B, Chen Y, Zakaria J, Ulanova L, Keogh E (2013) Classification of multi-dimensional streaming time series by weighting each classifier’s track record. In: ICDM
Hu B, Rakthanmanon TR, Hao Y, Evans S, Lonardi S, Keogh E (2011) Discovering the intrinsic cardinality and dimensionality of time series using MDL. In: ICDM
Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Keogh E, Lonardi S, Ratanamahatana C (2004) Towards parameter-free data mining. In: Proceedings of the tenth ACM SIGKDD
Keogh E, Palpanas T, Zordan VB, Gunopulos D, Cardle M (2004) Indexing large human-motion databases. In: VLDB
Koch P, Konen W, Hein K (2010) Gesture recognition on few training data using slow feature analysis and parametric bootstrap. In: IJCNN
Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. J Data Min Knowl Discov 19(2):245–260
Lester J, Choudhury T, Kern N, Borriello G, Hannaford B (2005) A hybrid discriminative/generative approach for modeling human activities. In: IJCAI
Li L, Prakash BA (2011) Time series clustering: complex is simpler. In: ICML
Li M, Vitanyi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer Verlag, New York
Liu J, Yu K, Zhang Y, Huang Y (2010) Training conditional random fields using transfer learning for gesture recognition. In: ICDM
McMahon TA, Cheng GC (1990) The mechanics of running: how does stiffness couple with speed. J Biomech 23:65–78
Morse M, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: Proceedings of SIGMOD
Niennattrakul V, Keogh E, Ratanamahatana CA (2010) Data editing techniques to allow the application of distance-based outlier detection to streams. In: ICDM
PAMAP, Physical activity monitoring for aging people. www.pamap.org/demo.html. Accessed 12 May 2012
Pärkkä J, Ermes M, Korpipää P, Mäntyjärvi J, Peltola J, Korhonen I (2006) Activity classification using realistic data from wearable sensors. IEEE Trans Inf Technol Biomed 10:119–128
Pekalska E, Duin RPW, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39:189–208
Pham C, Plötz T, Olivier P (2010) A dynamic time warping approach to real-time activity recognition for food preparation. In: Proceedings of the first international joint conference on Ambient intelligence
Project URL: https://sites.google.com/site/dmkdrealistic/
Raptis M, Kirovski D, Hoppes H (2011) Real-time classification of dance gestures from skeleton animation. In: Proceedings of the ACM SIGGRAPH symposium on computer animation
Raptis M, Wnuk K, Soatto S (2008) Flexible dictionaries for action recognition. In: Proceedings of the 1st international workshop on machine learning for vision-based motion analysis
Rakthanmanon T, Keogh E, Lonardi S, Evans S (2011) Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDM
Ratanamahatana CA (2012) Personal communication. May 2012
Ratanamahatana CA, Keogh E (2004) Making time-series classification more accurate using learned constraints. In: SDM
Reiss A, Stricker D (2011) Introducing a modular activity monitoring system. In: 33rd international EMBC
Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM
Song J, Kim D (2006) Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMM. In: Proceedings of the 18th ICPR
The BIDMC congestive heart failure database, www.physionet.org/physiobank/database/chfdb/
Ueno K, Xi X, Keogh E, Lee D (2010) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM
Usabiaga J, Bebis G, Erol A, Nicolescu M (2007) Recognizing simple human actions using 3D head movement. Comput Intell 23(4):484–496
Vatavu RD (2011) The effect of sampling rate on the performance of template-based gesture recognizers. In: Proceedings of ICMI
Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana C (2006) Fast time series classification using numerosity reduction. In: ICML, pp 1033–1040
Ye L, Wang X, Keogh E, Mafra-Neto A (2009) Autocannibalistic and anyspace indexing algorithms with applications to sensor data mining. In: SDM
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: KDD, pp 947–956
Yang AY, Giani A, Giannatonio R, Gilani K et al (2009) Distributed human action recognition via wearable motion sensor networks. www.eecs.berkeley.edu/~yang/software/WAR/index.html
Yang K, Jiang H, Dong J, Zhang C, Wang Z (2012) An adaptive real-time method for fetal heart rate extraction based on phonocardiography. In: 2012 IEEE biomedical circuits and systems conference. BioCAS, pp 356–359
Zilberstein S, Russell S (1995) Approximate reasoning using anytime algorithms. In: Imprecise and approximate computation. Kluwer Academic Publishers, Dordrecht
Zhang M, Sawchuk AA (2012) USC-HAD: a daily activity recognition using wearable sensors. ACM international conference on ubiquitous computing (UbiComp) workshop on situation, activity and goal awareness (SAGAware)

Acknowledgments

We thank all the dataset donors. We gratefully acknowledge financial support for our research from NSF grant IIS-1161997 and an award from Vodafone.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, University of California, Riverside, CA, USA
Bing Hu, Yanping Chen & Eamonn Keogh

Corresponding author

Correspondence to Bing Hu.

Additional information

Responsible Editor: G. Karypis.

About this article

Cite this article

Hu, B., Chen, Y. & Keogh, E. Classification of streaming time series under more realistic assumptions. Data Min Knowl Disc 30, 403–437 (2016). https://doi.org/10.1007/s10618-015-0415-0

Received: 21 March 2014
Accepted: 11 April 2015
Published: 03 June 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10618-015-0415-0
