Classification of streaming time series under more realistic assumptions

Data Mining and Knowledge Discovery

Abstract

Much of the vast literature on time series classification makes several assumptions about the data and the algorithm’s eventual deployment that are almost certainly unwarranted. For example, many research efforts assume that the beginning and ending points of the pattern of interest can be correctly identified, during both the training phase and later deployment. Another example is the common assumption that queries will be made at a constant rate that is known ahead of time, so that computational resources can be exactly budgeted. In this work, we argue that these assumptions are unjustified and that they have, in many cases, led to unwarranted optimism about the performance of the proposed algorithms. As we shall show, the task of correctly extracting individual gait cycles, heartbeats, gestures, behaviors, etc., is generally much more difficult than the task of actually classifying those patterns. Likewise, gesture classification systems deployed on a device such as Google Glass may issue queries at frequencies that range over an order of magnitude, making it difficult to plan computational resources. We propose to mitigate these problems by introducing an alignment-free time series classification framework. The framework requires only very weakly annotated data, such as “in this ten minutes of data, we see mostly normal heartbeats…”, and, by generalizing the classic machine learning idea of data editing to streaming/continuous data, allows us to build robust, fast and accurate anytime classifiers. We demonstrate on several diverse real-world problems that, beyond removing unwarranted assumptions and requiring essentially no human intervention, our framework is both extremely fast and significantly more accurate than current state-of-the-art approaches.
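To make the idea of an anytime classifier concrete, here is a minimal sketch of a generic anytime one-nearest-neighbor loop. It is not the authors’ framework: the function names, the assumption that exemplars are pre-sorted so the most useful ones are examined first, and the use of z-normalized Euclidean distance are all illustrative choices. The hypothetical `budget` parameter stands in for however much computation happens to be available before the next query arrives; the classifier can stop after any number of comparisons and still return its best-so-far answer.

```python
import math
from typing import List, Optional, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def anytime_1nn(query: Sequence[float],
                exemplars: List[Tuple[Sequence[float], str]],
                budget: int) -> Tuple[Optional[str], float]:
    """Hypothetical anytime 1-NN: compare `query` against at most `budget`
    exemplars (always at least one) and return the best-so-far label and
    distance. Exemplars are assumed to be pre-sorted, most useful first."""
    q = znorm(query)
    best_label: Optional[str] = None
    best_dist = math.inf
    for series, label in exemplars[:max(1, budget)]:
        d = euclidean(q, znorm(series))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


exemplars = [([3, 2, 1, 0], "down"), ([0, 1, 2, 3], "up")]
print(anytime_1nn([0, 1, 2, 4], exemplars, budget=1))  # only "down" is examined
print(anytime_1nn([0, 1, 2, 4], exemplars, budget=2))  # "up" wins once reached
```

With a budget of one comparison, the answer depends entirely on which exemplar is ordered first; a larger budget can only lower the best-so-far distance, which is why anytime nearest-neighbor methods care about how exemplars are ordered.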

Notes

  1. Note that only some ECG classification systems perform beat extraction followed by classification (Faezipour et al. 2010). Many researchers believe that robust beat extraction can be a harder problem than classification itself (cf. Figs. 1 and 2), and therefore present every subsequence extracted by a sliding window for classification. This is the approach we consider in Sect. 4.2, since we assume a bedside-monitoring setting.

  2. Note that the fact that the two patterns are out of phase does not make them non-redundant: at query time, only queries half their length are used, and these queries slide across the entire length of the patterns. Details are given in Sect. 4.2.

  3. Where tractability is an issue, we may sample a subset of the queries.

  4. We defer the discussion on how to choose a query length to Sect. 6.

  5. The reader may ask why we do not use Dynamic Time Warping. We tried it empirically, and it does not help. Moreover, we should not expect it to help on this problem; see https://sites.google.com/site/dmkdrealistic/.

  6. We show the rejected queries only for the first case study. See the project page (https://sites.google.com/site/dmkdrealistic/) for examples of rejected queries from the other case studies.

  7. Experimental results show that the threshold distances for D built with Euclidean distance and with Uniform Scaling distance are almost identical. Therefore, we report only one threshold distance.
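Notes 1, 2 and 7 refer to sliding-window queries and to Euclidean versus Uniform Scaling distance. The sketch below illustrates those two ideas under stated assumptions: the helper names are hypothetical, distances are computed between z-normalized subsequences, Uniform Scaling is approximated by simple floor-index resampling of the query to every length in an assumed range of 80% to 120% of its original length, and the result is the minimum over all lengths and offsets. It is meant to show the shape of the computation, not to reproduce the paper’s exact definitions.

```python
import math
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def sliding_windows(series: Sequence[float], m: int) -> List[List[float]]:
    """Every length-m subsequence of `series`, advancing one sample at a time."""
    return [list(series[i:i + m]) for i in range(len(series) - m + 1)]


def min_subsequence_distance(query: Sequence[float],
                             pattern: Sequence[float]) -> float:
    """Smallest z-normalized Euclidean distance between `query` and any
    window of `pattern` with the same length (the query slides across it)."""
    windows = sliding_windows(pattern, len(query))
    if not windows:
        return math.inf  # pattern shorter than query: no valid alignment
    q = znorm(query)
    return min(euclidean(q, znorm(w)) for w in windows)


def rescale(query: Sequence[float], p: int) -> List[float]:
    """Stretch or shrink `query` to length p by floor-index resampling."""
    m = len(query)
    return [query[(j * m) // p] for j in range(p)]


def uniform_scaling_distance(query: Sequence[float],
                             pattern: Sequence[float],
                             min_scale: float = 0.8,
                             max_scale: float = 1.2) -> float:
    """Minimum sliding distance over all rescaled versions of `query` whose
    lengths lie in [min_scale * m, max_scale * m] (an assumed range)."""
    m = len(query)
    lo = max(2, math.ceil(m * min_scale))
    hi = math.floor(m * max_scale)
    best = math.inf
    for p in range(lo, hi + 1):
        best = min(best, min_subsequence_distance(rescale(query, p), pattern))
    return best


# A query half the length of the pattern, as in Note 2.
pattern = [0, 0, 1, 3, 2, 5, 0, 0]
query = [1, 3, 2, 5]
print(len(sliding_windows(pattern, len(query))))  # 5 alignments
print(min_subsequence_distance(query, pattern))   # exact match inside pattern
print(uniform_scaling_distance(query, pattern))
```

Because the scaling range includes the original length, the Uniform Scaling distance in this sketch can never exceed the plain sliding Euclidean distance; when the best match needs no stretching, the two coincide.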

References

  • Andino SLG et al (2000) Measuring the complexity of time series: an application to neurophysiological signals. Hum Brain Map 11(1):46–57

  • Aspelin K (2005) Establishing pedestrian walking speeds. Portland State University. www.usroads.com/journals/p/rej/9710/re971001.htm. Accessed 24 Aug 2009

  • Aziz W, Arif M (2006) Complexity analysis of stride interval time series by threshold dependent symbolic entropy. EJAP 98(1):30–40

  • Batista G, Keogh E, Mafra-Neto A, Rowton E (2011) Sensors and software to allow computational entomology, an emerging application of data mining. SIGKDD demo paper

  • Batista G, Wang X, Keogh E (2011) A complexity-invariant distance measure for time series. In: SDM

  • Bao L, Intille SS (2004) Activity recognition from user-annotated acceleration data. In: Proceedings of the 2nd international conference on pervasive computing, pp 1–17

  • Cavagna GA, Heglund NC, Taylor CR (1977) Mechanical work in terrestrial locomotion: two basic mechanisms for minimizing energy expenditure. Am J Physiol 233(5):R243–R261

  • Chen L, Özsu MT, Oria V (2005) Robust and fast similarity search for moving object trajectories. In: Proceedings of the ACM SIGMOD

  • CMU Graphics Lab Motion Capture Database. www.mocap.cs.cmu.edu/. Accessed 24 April 2012

  • de Chazal P, O’Dwyer M, Reilly RB (2004) Automatic classification of ECG heartbeats using ECG morphology and heartbeat interval features. IEEE Trans Biomed Eng 51:1196–1206

  • Electrocardiography, http://en.wikipedia.org/wiki/Electrocardiography

  • Faezipour M, Saeed A, Bulusu S, Nourani M, Minn H, Tamil L (2010) A patient-adaptive profiling scheme for ECG beat classification. IEEE Trans Inform Technol Biomed 14(5):1153–1165

  • Gafurov D, Helkala K, Søndrol T (2006) Biometric gait authentication using accelerometer sensor. J Comput 1(7):51–59

  • Gafurov D, Snekkenes E (2008) Towards understanding the uniqueness of gait biometric. In: 8th IEEE International Conference on Automatic Face & Gesture Recognition

  • Grass J, Zilberstein S (1995) Anytime algorithm development tools. Technical Report. UMI Order Number: UM-CS-1995-094, University of Massachusetts

  • Hanson MA, Powell Jr HC, Barth AT, Lach J, Brown MBC (2009) Neural network gait classification for on-body inertial sensors. In: Proceedings of the 2009 sixth international workshop on wearable and implantable body sensor networks

  • Hao Y, Chen Y, Zakaria J, Hu B, Rakthanmanon T, Keogh E (2013) Towards never-ending learning from time series streams. In: SIGKDD

  • Hu B, Chen Y, Keogh E (2013) Time series classification under more realistic assumptions. In: SDM

  • Hu B, Chen Y, Zakaria J, Ulanova L, Keogh E (2013) Classification of multi-dimensional streaming time series by weighting each classifier’s track record. In: ICDM

  • Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E (2011) Discovering the intrinsic cardinality and dimensionality of time series using MDL. In: ICDM

  • Keogh E, Zhu Q, Hu B, Hao Y, Xi X, Wei L, Ratanamahatana CA (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/

  • Keogh E, Lonardi S, Ratanamahatana C (2004) Towards parameter-free data mining. In: Proceedings of the tenth ACM SIGKDD

  • Keogh E, Palpanas T, Zordan VB, Gunopulos D, Cardle M (2004) Indexing large human-motion databases. In: VLDB

  • Koch P, Konen W, Hein K (2010) Gesture recognition on few training data using slow feature analysis and parametric bootstrap. In: IJCNN

  • Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260

  • Lester J, Choudhury T, Kern N, Borriello G, Hannaford B (2005) A hybrid discriminative/generative approach for modeling human activities. In: IJCAI

  • Li L, Prakash BA (2011) Time series clustering: complex is simpler. In: ICML

  • Li M, Vitanyi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer Verlag, New York

  • Liu J, Yu K, Zhang Y, Huang Y (2010) Training conditional random fields using transfer learning for gesture recognition. In: ICDM

  • McMahon TA, Cheng GC (1990) The mechanics of running: how does stiffness couple with speed. J Biomech 23:65–78

  • Morse M, Patel JM (2007) An efficient and accurate method for evaluating time series similarity. In: Proceedings of SIGMOD

  • Niennattrakul V, Keogh E, Ratanamahatana CA (2010) Data editing techniques to allow the application of distance-based outlier detection to streams. In: ICDM

  • PAMAP, Physical activity monitoring for aging people. www.pamap.org/demo.html. Accessed 12 May 2012

  • Pärkkä J, Ermes M, Korpipää P, Mäntyjärvi J, Peltola J, Korhonen I (2006) Activity classification using realistic data from wearable sensors. IEEE Trans Inf Technol Biomed 10:119–128

  • Pekalska E, Duin RPW, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39:189–208

  • Pham C, Plötz T, Olivier P (2010) A dynamic time warping approach to real-time activity recognition for food preparation. In: Proceedings of the first international joint conference on Ambient intelligence

  • Project URL: https://sites.google.com/site/dmkdrealistic/

  • Raptis M, Kirovski D, Hoppe H (2011) Real-time classification of dance gestures from skeleton animation. In: Proceedings of the ACM SIGGRAPH symposium on computer animation

  • Raptis M, Wnuk K, Soatto S (2008) Flexible dictionaries for action recognition. In: Proceedings of the 1st international workshop on machine learning for vision-based motion analysis

  • Rakthanmanon T, Keogh E, Lonardi S, Evans S (2011) Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDM

  • Ratanamahatana CA (2012) Personal communication. May 2012

  • Ratanamahatana CA, Keogh E (2004) Making time-series classification more accurate using learned constraints. In: SDM

  • Reiss A, Stricker D (2011) Introducing a modular activity monitoring system. In: 33rd international EMBC

  • Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM

  • Song J, Kim D (2006) Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMM. In: Proceedings of the 18th ICPR

  • The BIDMC congestive heart failure database, www.physionet.org/physiobank/database/chfdb/

  • Ueno K, Xi X, Keogh E, Lee D (2010) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM

  • Usabiaga J, Bebis G, Erol A, Nicolescu M (2007) Recognizing simple human actions using 3D head movement. Comput Intell 23(4):484–496

  • Vatavu RD (2011) The effect of sampling rate on the performance of template-based gesture recognizers. In: Proceedings of ICMI

  • Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana C (2006) Fast time series classification using numerosity reduction. In: ICML, pp 1033–1040

  • Ye L, Wang X, Keogh E, Mafra-Neto A (2009) Autocannibalistic and anyspace indexing algorithms with applications to sensor data mining. In: SDM

  • Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: KDD, pp 947–956

  • Yang AY, Giani A, Giannantonio R, Gilani K et al (2009) Distributed human action recognition via wearable motion sensor networks. www.eecs.berkeley.edu/~yang/software/WAR/index.html

  • Yang K, Jiang H, Dong J, Zhang C, Wang Z (2012) An adaptive real-time method for fetal heart rate extraction based on phonocardiography. In: 2012 IEEE biomedical circuits and systems conference. BioCAS, pp 356–359

  • Zilberstein S, Russell S (1995) Approximate reasoning using anytime algorithms. In: Imprecise and approximate computation. Kluwer Academic Publishers, Dordrecht

  • Zhang M, Sawchuk AA (2012) USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors. In: ACM international conference on ubiquitous computing (UbiComp) workshop on situation, activity and goal awareness (SAGAware)

Acknowledgments

We thank all the dataset donors. We acknowledge the financial support for our research provided by NSF Grant IIS-1161997 and by an award from Vodafone.

Author information

Corresponding author

Correspondence to Bing Hu.

Additional information

Responsible Editor: G. Karypis.

About this article

Cite this article

Hu, B., Chen, Y. & Keogh, E. Classification of streaming time series under more realistic assumptions. Data Min Knowl Disc 30, 403–437 (2016). https://doi.org/10.1007/s10618-015-0415-0

  • DOI: https://doi.org/10.1007/s10618-015-0415-0
