Time series classification based on qualitative space fragmentation

https://doi.org/10.1016/j.aei.2008.07.006

Abstract

In knowledge discovery and data mining from time series the goal is to detect interesting patterns in the series that may help a human to better recognize the regularities in the observed variables and thereby improve the understanding of the system. Ideally, knowledge discovery algorithms use time series representations that are close to those that are used by a human. The impressive pattern recognition capabilities of the human brain help to establish connections between different time series or different parts of a single time series on the basis of their visual appearance. When dealing with time series data there are two main objectives: (i) prediction of future behavior based on past behaviors and (ii) description (explanation) of time series data. Description of time series data can be used for generalization, clustering and classification.

In this paper, a novel time series classification method based on Qualitative Space Fragmentation is presented. The main characteristics of the method are the expansion and coding of quantitative time series data, together with the extraction of symbolic and numeric features based on human visual perception. The expansion and coding process produces a qualitative difference vector, which conveys full information on the variation of a particular time series and can be seen as a single point in an m-dimensional qualitative space. Symbolic and numeric features based on human visual perception are extracted from the qualitative space and used to construct a decision tree, which is then employed for time series classification. The application of the proposed method is demonstrated through two case studies. In the first, the method was tested on synthetic Control Chart Pattern data, which are time series developed for the assessment of statistical process control, and the results were compared with the standard Qualitative Similarity Index method. In the second, the method was tested in analytical chemistry, specifically in polarography, an electrochemical method for analyzing solutions containing reducible or oxidizable substances.

Introduction

Time series are a form of data occurring in virtually every real-world process. Important time series include stock market prices, sales of a product, all kinds of scientific results, weather readings, medical records and so on. When dealing with time series data there are two main objectives: (i) prediction of future behavior based on past behavior and (ii) description (explanation) of time series data. Description of time series data can be used for generalization, clustering and classification. Time series classification involves learning a function that maps a series into one of a set of predefined classes. It finds application in analyzing medical records (e.g. electrocardiogram data), weather readings (e.g. storm-cloud data), process monitoring (e.g. sensor readings), statistical process control, etc. A task common to all of these categories is feature extraction and similarity measure calculation. This paper focuses on both of these issues.

We believe that the qualitative information embedded in the graphical form of tabular time series data is the key to proper similarity calculations. We propose the creation of a qualitative space based on time series data expansion and coding. From this qualitative space, symbolic and numeric features based on human visual perception are extracted and used to construct a decision tree, which is later used for time series classification. The presented method cannot be regarded as a pure similarity measure calculation method, since the key to proper classification is the extraction of symbolic and numeric features based on human visual perception from the qualitative space. Time series data expansion, a key step in the qualitative space construction, is a rarely used process that can significantly help when dealing with similar time series.

Because of the qualitative character of the proposed approach, in the first case study we compared our method with the established Qualitative Similarity Index method [1], which takes a qualitative view of the observed time series through local differentiation. In the second case study the method was tested in analytical chemistry, specifically in polarography, an electrochemical method for analyzing solutions containing reducible or oxidizable substances.

The outline of the paper is as follows: a brief description of related works is presented in Section 2. The Qualitative Similarity Index method is briefly presented in Section 3. Our new methodology is presented in Section 4. The comparison of differentiation types based on the information gain is presented in Section 5. Experimental results are provided in Sections 6 and 7. A discussion and conclusion are given at the end of the paper.

Related works

Time series data mining has attracted enormous attention in the last decade. A variety of methods have been developed and proposed in the literature to analyze and compare time series. The review below is necessarily brief; we refer interested readers to [2], [3] for a more in-depth review. Most authors have concentrated on improving the speed of time series comparison rather than the accuracy of the comparison process. When dealing with large time series databases it indeed seems

Qualitative Similarity Index (QSI) method

The Qualitative Similarity Index (QSI) method was proposed in [1]. The main idea of the QSI technique is the inclusion of some qualitative knowledge in the comparison of time series. The evolution (progression) of a time series is represented in terms of qualitative labels. Different series with qualitatively similar evolution produce the same sequence of labels. Similarity between any two time series is calculated by comparison of two strings obtained from time
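
As a minimal sketch of this idea, and not the implementation from [1], the Python fragment below labels each step of a series as rising, falling or steady and then compares two label strings. The three-letter alphabet, the tolerance `tol`, the function names, and the use of a longest-common-subsequence ratio as the string comparison are all assumptions made for illustration.

```python
from typing import Sequence


def qualitative_labels(series: Sequence[float], tol: float = 0.1) -> str:
    """Label each consecutive difference as U (up), D (down) or S (steady).

    The alphabet and the tolerance are illustrative assumptions.
    """
    labels = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff > tol:
            labels.append("U")
        elif diff < -tol:
            labels.append("D")
        else:
            labels.append("S")
    return "".join(labels)


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev_row = [0] * (len(b) + 1)
    for ch_a in a:
        row = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                row.append(prev_row[j - 1] + 1)
            else:
                row.append(max(prev_row[j], row[j - 1]))
        prev_row = row
    return prev_row[-1]


def label_similarity(x: Sequence[float], y: Sequence[float], tol: float = 0.1) -> float:
    """Similarity in [0, 1] between the label strings of two series.

    Assumed measure: LCS length divided by the longer string length.
    """
    sx, sy = qualitative_labels(x, tol), qualitative_labels(y, tol)
    longest = max(len(sx), len(sy))
    if longest == 0:
        return 1.0  # two series without any steps are treated as identical
    return lcs_length(sx, sy) / longest
```

Under these assumptions, two series that differ in level and scale but share the same shape, such as `[0, 1, 2, 2, 1]` and `[10, 12, 15, 15, 11]`, produce the same label string and therefore a similarity of 1.0.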

Qualitative Space Fragmentation (QSF) method

The main highlight of the proposed method for time series classification is the construction of a qualitative space together with the extraction of symbolic and numeric features based on human visual perception. Qualitative space construction consists of three steps: time series data expansion, coding and qualitative matrix construction. In the following subsections each step is briefly explained, and the feature extraction process is described at the end.
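
The three steps can be pictured with a short Python sketch. It is not the authors' algorithm: it assumes that complete expansion means forming every pairwise difference x[j] - x[i] with i < j, that coding maps each difference to one of three symbols (+1, 0, -1) using a hypothetical tolerance `tol`, and that the qualitative matrix is the upper-triangular arrangement of the coded vector. All function names are illustrative.

```python
from typing import List, Optional, Sequence


def expand(series: Sequence[float]) -> List[float]:
    """Assumed 'complete expansion': all differences x[j] - x[i] for i < j."""
    n = len(series)
    return [series[j] - series[i] for i in range(n) for j in range(i + 1, n)]


def code(differences: Sequence[float], tol: float = 0.0) -> List[int]:
    """Assumed coding: +1 for an increase, -1 for a decrease, 0 otherwise."""
    return [1 if d > tol else -1 if d < -tol else 0 for d in differences]


def to_matrix(codes: Sequence[int], n: int) -> List[List[Optional[int]]]:
    """Place the coded vector in an n x n upper-triangular matrix.

    Cell (i, j) with i < j holds the code for the pair (x[i], x[j]);
    all other cells stay None so that they are not confused with code 0.
    """
    matrix: List[List[Optional[int]]] = [[None] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = codes[k]
            k += 1
    return matrix


series = [1, 3, 2, 2]
qualitative_vector = code(expand(series))        # one point in m-dimensional space
qualitative_matrix = to_matrix(qualitative_vector, len(series))
```

With this reading, a series of n samples gives a vector of m = n(n-1)/2 symbols, and adding a constant to every sample leaves the vector unchanged. Which symbolic and numeric features are then read from the matrix is not specified here, so the sketch stops at the matrix.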

Information content of various differentiation methods

Although mathematically clear, the complete time series expansion is not often used in computing science. The reason is obvious – memory requirements for the complete expansion can be quite high, particularly when dealing with data from long time series. This drawback of the proposed QSF method is compensated by an information gain, as shown next.

In time series analysis special attention is paid to the pre-processing phase where time series complexity should be somehow determined. A possibility
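
To give a rough sense of the memory concern raised above, the following Python lines compare the number of values produced by local differencing with the number produced by a complete expansion. The count n(n-1)/2 rests on the assumption that the complete expansion forms one difference for every pair of samples; the function name is hypothetical.

```python
def expansion_size(n: int) -> int:
    """Number of pairwise differences for a series of n samples (assumed n(n-1)/2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        local = n - 1  # consecutive differences only
        print(f"n={n}: local={local}, complete={expansion_size(n)}")
```

Under this assumption the storage grows quadratically with the series length, while local differencing grows linearly.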

Experimental set up and results

The QSI and QSF methods were tested on Control Chart Pattern (CCP) data [30], [42], which are time series used in statistical process control. Control charts are widely used to establish and maintain control of critical outputs from manufacturing and other complex processes. Recognition of patterns on control charts, especially complicated patterns, can often be a complex problem. Thus, automatic pattern-recognition systems can be of considerable help to quality engineers. Most of the
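
For readers unfamiliar with such data, the Python sketch below generates toy series of the kinds usually discussed in the control chart literature (normal, cyclic, trend and shift patterns). It is not the generator behind the data in [30], [42]: the pattern names, the formulas and every parameter value are assumptions chosen only to make the shapes visible.

```python
import math
import random
from typing import List

PATTERN_KINDS = (
    "normal", "cyclic", "increasing_trend",
    "decreasing_trend", "upward_shift", "downward_shift",
)


def make_pattern(kind: str, length: int = 60, seed: int = 0) -> List[float]:
    """Generate one toy control chart series of the requested kind."""
    if kind not in PATTERN_KINDS:
        raise ValueError(f"unknown pattern kind: {kind}")
    rng = random.Random(seed)
    mean, noise_sd = 30.0, 2.0      # hypothetical process mean and noise level
    amplitude, period = 10.0, 12    # hypothetical cycle parameters
    slope, shift_size = 0.4, 12.0   # hypothetical trend slope and shift height
    shift_at = length // 2          # the shift occurs halfway through the series
    series = []
    for t in range(length):
        value = mean + rng.gauss(0.0, noise_sd)
        if kind == "cyclic":
            value += amplitude * math.sin(2 * math.pi * t / period)
        elif kind == "increasing_trend":
            value += slope * t
        elif kind == "decreasing_trend":
            value -= slope * t
        elif kind == "upward_shift" and t >= shift_at:
            value += shift_size
        elif kind == "downward_shift" and t >= shift_at:
            value -= shift_size
        series.append(value)
    return series
```

A call such as `make_pattern("upward_shift", seed=1)` returns a reproducible series of 60 values whose level rises halfway through.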

Polarography

Polarography, or voltammetry, is an electrochemical method in analytical chemistry for analyzing solutions containing reducible or oxidizable substances. It provokes and analyzes the passage of electrons to or from a polarizable electrode caused by the reduction or oxidation of ions at the solution/electrode interface. Invented by the Czech chemist Jaroslav Heyrovský in 1922, polarography is, in general, a technique in which the electric voltage is varied in a regular manner between two electrodes

Discussion and conclusion

In this paper we have proposed a novel Qualitative Space Fragmentation (QSF) method for time series classification. The method is founded on quantitative complete data expansion and coding. The end result of these two steps is a qualitative difference vector, which is later transformed into the qualitative matrix. The qualitative difference vector can be regarded as a single point in m-dimensional qualitative-space. The construction of the qualitative matrix enables time series data analysis on

References (67)

  • J.A. Swift et al., Out-of-control pattern recognition and analysis for quality control charts using Lisp-based system, Computer and Industrial Engineering (1995)
  • F.J. Cuberos, J.A. Ortega, R.M. Gasca, M. Toro, QSI – qualitative similarity index, in: Proceeding of the 16th...
  • E. Keogh, S. Kasetty, On the need for time series data mining benchmarks: a survey and empirical demonstration, in:...
  • J.F. Roddick, K. Hornsby, M. Spiliopoulou, An updated bibliography of temporal, spatial and spatio-temporal data mining...
  • R. Agrawal, K.I. Lin, H.S. Sawhney, K. Shim, Fast similarity search in the presence of noise, scaling, and translation...
  • S. Singh, P. McAtackney, Dynamic time-series forecasting using local approximation, in: Proceedings of the IEEE 10th...
  • B.K. Yi, N.D. Sidiropoulos, T. Johnson, A. Biliris, H.V. Jagadish, C. Faloutsos, Online data mining for co-evolving...
  • R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases, in: Proceedings of the 4th...
  • K. Chu, M. Wong, Fast time-series searching with scaling and shifting, in: Proceedings of the 18th ACM Symposium on...
  • C. Faloutsos, H. Jagadish, A. Mendelzon, T. Milo, A signature technique for similarity-based queries, in: Proceedings...
  • T. Kahveci, A. Singh, A. Gurel, An efficient index structure for shift and scale invariant search of multi-attribute...
  • D. Rafiei, A.O. Mendelzon, Efficient retrieval of similar time sequences using DFT, in: Proceedings of the 5th...
  • D. Rafiei, On similarity-based queries for time series data, in: Proceedings of the 15th IEEE International Conference...
  • K. Chan et al., Haar wavelets for efficient similarity search of time series: with and without time warping, IEEE Transactions on Knowledge and Data Engineering (2003)
  • K. Chan, A.W. Fu, Efficient time series matching by wavelets, in: Proceedings of the 15th IEEE International Conference...
  • T. Kahveci, A. Singh, Variable length queries for time series data, in: Proceedings of the 17th International...
  • I. Popivanov, R.J. Miller, Similarity search over time series data using wavelets, in: Proceedings of the 18th...
  • C. Shahabi, X. Tian, W. Zhao, TSA-tree: a wavelet based approach to improve the efficiency of multi-level surprise and...
  • C. Wang, X.S. Wang, Supporting content-based searches on time series via approximation, in: Proceedings of the 12th...
  • Y. Wu, D. Agrawal, A. El Abbadi, A comparison of DFT and DWT based similarity search in time-series databases, in:...
  • E. Keogh, K. Chakrabarti, M. Pazzani, S. Mehrotra, Locally adaptive dimensionality reduction for indexing large time...
  • F. Korn, H. Jagadish, C. Faloutsos, Efficiently supporting ad hoc queries in large datasets of time sequences, in:...
  • H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A. El Abbadi, Approximate nearest neighbor searching in multimedia...
  • B. Yi, C. Faloutsos, Fast time sequence indexing for arbitrary lp norms, in: Proceedings of the 26th International...
  • W. Loh, S. Kim, K. Whang, Index interpolation: an approach to subsequence matching supporting normalization transform...
  • E.J. Keogh, M.J. Pazzani, An enhanced representation of time series which allows fast and accurate classification,...
  • D. Berndt, J. Clifford, Using dynamic time warping to find patterns in time series, in: The Workshop on Knowledge...
  • E.J. Keogh, M.J. Pazzani, Derivative dynamic time warping, in: Proceedings of the 1st SIAM International Conference on...
  • E.J. Keogh, P. Smyth, A probabilistic approach to fast pattern matching in time series databases, in: Proceedings of...
  • P. Sebastian, M. Ramoni, P.R. Cohen, J. Warwick, J. Davis, Discovering dynamics using Bayesian clustering, in:...
  • R.J. Alcock, Y. Manolopoulos, Time-series similarity queries employing a feature-based approach, in: Proceedings of 7th...
  • V. Guralnik, D. Wijesekera, J. Srivastava, Pattern directed mining of sequence data, in: Proceedings of the 4th...
  • G. Das, K.I. Lin, H. Mannila, G. Renganathan, P. Smyth, Rule discovery from time-series, in: Proceedings of the 4th...

Željko Jagnjić was born in Osijek, Croatia and went to the University of Zagreb, where he studied electrical engineering and computing. He obtained his B.Sc.E.E. and M.Sc.C.S. from the same university in 1997 and 2001, respectively. He is currently a PhD student at the University of Zagreb. From 1998 to 2004 he worked as a young researcher at the Faculty of Electrical Engineering, University of Osijek, in the Laboratory for Artificial Intelligence of the Department for Computing. His research interests encompass qualitative methods of modelling and the development of fast algorithms for data analysis based on artificial intelligence methods and procedures for reasoning about complex systems. He is also interested in the development of intelligence engines for games.

In 2004 he joined Slavonska banka d.d. Osijek, where he works as the Director of the ORG/IT division.
