
Pattern Recognition Letters

Volume 45, 1 August 2014, Pages 99-105

Using derivatives in a longest common subsequence dissimilarity measure for time series classification

https://doi.org/10.1016/j.patrec.2014.03.009

Abstract

Over recent years the popularity of time series has soared. Given the widespread use of modern information technology, a large number of time series may be collected, and as a consequence there has been a dramatic increase in interest in querying and mining such data. A vital component in many types of time series analysis is the choice of an appropriate dissimilarity measure. Numerous measures have been proposed to date, with the most successful ones based on dynamic programming. One such measure is the longest common subsequence (LCSS). In this paper, we propose a parametric extension of LCSS based on derivatives. In contrast to well-known measures from the literature, our approach considers the general shape of a time series rather than point-to-point comparison of function values. The new dissimilarity measure is used in classification with the nearest neighbor rule. In order to provide a comprehensive comparison, we conducted a set of experiments, testing effectiveness on 47 real time series data sets. The experiments show that our method provides higher classification quality than LCSS on the examined data sets.

Introduction

Time series classification has been studied extensively by the machine learning and data mining communities. Such series are suitable for representing social, economic and natural phenomena, medical observations, and results of scientific and engineering experiments. The crucial point in time series classification is how to measure the dissimilarity of time series (a very good overview of dissimilarity measures can be found in [9]). The simplicity and efficiency of Euclidean distance [11] make it the most popular dissimilarity measure in time series data mining [1], [18]. However, it requires that both input sequences be of the same length, and it is sensitive to distortions and shifting along the time axis [29], [25]. Such problems can be handled by elastic dissimilarity measures such as Dynamic Time Warping (DTW) [5] and Longest Common SubSequence (LCSS) [2], [28]. DTW searches for the best alignment between two time series, attempting to minimize the distance between them. LCSS finds the length of the longest matching subsequence. Compared with Euclidean distance, DTW and LCSS are more elastic, supporting local time shifts and variations in the lengths of pairs of time series, but they are also more expensive to compute. Of the three measures, LCSS is the least sensitive to noise, because it includes a threshold to define a “match” [28].

The effectiveness of the nearest neighbor classifier depends on the dissimilarity measure used to compare objects during classification. At present, the dissimilarity functions used in time series classification mostly involve point-to-point comparison of time series. These measures often compensate for distortions that arise when two time series differ in length or are locally out of phase. In the classification domain, however, there may be objects for which comparison of function values alone is not sufficient: cases where assignment to a class depends on the general shape of the objects rather than on a strict comparison of function values. A natural object that captures a function’s variability in “time” is its derivative. The derivative identifies the regions where the function is constant, increasing or decreasing, and the intensity of those changes; it describes the general shape of the function rather than its value at a particular point, showing what happens in the neighborhood of each point. While the first derivative gives some information about the shape of the function (increasing or decreasing), the second derivative adds information about where the function is convex or concave. We cannot expect it to be sufficient to compare only the derivatives of time series. The best approach seems to be a method that considers both the function values of the time series and the values of one or more of its derivatives (shape comparison), with the relative influence of these two comparisons parameterized. We can then expect the method to select, for each data set, the appropriate intensities of these comparisons and give the best classification results.
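The derivative of a sampled time series must be estimated numerically. A minimal sketch of one common estimate, the average-of-slopes scheme used by Keogh and Pazzani [17] for derivative DTW (the exact estimator used in this paper is not given in this excerpt, so this is an illustrative assumption):

```python
import numpy as np

def discrete_derivative(q):
    """Estimate the first derivative of a sampled series using the
    average-of-slopes scheme of Keogh and Pazzani [17]:
        d_i = ((q_i - q_{i-1}) + (q_{i+1} - q_{i-1}) / 2) / 2.
    The endpoints are padded with the nearest interior estimate."""
    q = np.asarray(q, dtype=float)
    # Interior points: average of the backward slope and the centered slope.
    d = ((q[1:-1] - q[:-2]) + (q[2:] - q[:-2]) / 2.0) / 2.0
    return np.concatenate(([d[0]], d, [d[-1]]))
```

A second derivative, as discussed above, can be obtained by applying the same estimator to the result again.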

In this paper we construct a dissimilarity measure that combines the above-mentioned approaches to time series classification. Consequently we are able to deal with situations where the investigated sequences are not sufficiently different. Starting from a fixed dissimilarity measure, we form a new parameterized family of dissimilarity measures in which that measure is used to compare both the time series themselves (function values) and their variability in “time” (the dissimilarity of their derivatives). The dissimilarity functions so constructed are then used in the nearest neighbor classification method. The use of derivatives in time series classification is not a novelty. Ideas for measuring dissimilarity between trajectories using derivatives were proposed by Kosmelj [20] and Carlier [6], who used the concepts of velocity and acceleration to measure dissimilarities between trajectories in cluster analysis. D’Urso and Vichi [10] and Coppi et al. [7] developed this idea and applied it to cluster analysis of longitudinal data. The use of derivatives with DTW was proposed by Keogh and Pazzani [17]; however, they used only the dissimilarity between the derivatives, rather than the standard dissimilarity between the time series. Górecki [13] and Górecki and Łuczak [14] presented results for derivative DTW in which just the first derivative is added, while the parameterization involves both the function and its derivative; this approach was shown to give very good results. Górecki and Łuczak [15] also presented results where the second derivative is added. The parametric approach makes it possible to adapt to the data set without overtraining. Here we extend this methodology to LCSS, which is more robust than DTW in the presence of outliers [28] and generally comes very close to DTW, the best-performing dissimilarity measure overall [21].
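The parameterized family described above can be sketched as a weighted blend of one base dissimilarity applied to the raw series and to their derivatives. The convex-combination form, the weight name `alpha`, and the simple forward-difference derivative below are illustrative assumptions, not the paper's exact parameterization, which is tuned on the training set:

```python
import numpy as np

def derivative(q):
    # Forward-difference derivative estimate (an illustrative choice).
    return np.diff(np.asarray(q, dtype=float))

def combined_dissimilarity(x, y, base_dissim, alpha=0.5):
    """Blend a base dissimilarity on function values with the same
    dissimilarity on first derivatives.  alpha = 0 reduces to the plain
    measure; alpha = 1 compares shapes (derivatives) only."""
    value_part = base_dissim(x, y)
    shape_part = base_dissim(derivative(x), derivative(y))
    return (1.0 - alpha) * value_part + alpha * shape_part
```

Any measure can be plugged in as `base_dissim` (Euclidean, DTW, or LCSS); in the paper's setting it would be LCSS, and the weight would be selected per data set on the training data.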

In this paper we first review the concept of time series and the longest common subsequence dissimilarity measure (Section 2). At the end of that section we introduce our dissimilarity measure based on derivatives. The data sets used and the experimental setup are described in Section 3. Section 4 contains the results of our experiments on the described real data sets, as well as statistical analysis of the results and analysis of the running times of the investigated methods. Conclusions are given in Section 5.


Longest common subsequence

The longest common subsequence dissimilarity measure is a variation of the edit dissimilarity measure used in speech recognition. The basic idea is to match two sequences by allowing them to stretch, without rearranging the order of the elements but allowing some elements to be unmatched or left out (e.g., outliers) – whereas in Euclidean distance and DTW, all elements from both sequences must be used, even the outliers. The overall idea is to count the number of pairs of points from the two sequences that match.
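The counting idea above is computed by dynamic programming. A minimal sketch of the standard formulation (the threshold parameter `epsilon` defining a “match” and the optional warping window `delta` follow the common LCSS definition [28]; the normalization to a dissimilarity in [0, 1] is one conventional choice):

```python
import numpy as np

def lcss_dissimilarity(x, y, epsilon=0.05, delta=None):
    """LCSS dissimilarity for two 1-D series.  Points x[i], y[j] match
    when |x[i] - y[j]| < epsilon; delta, if given, bounds |i - j|
    (the warping window).  Returns 0 for a perfect match, 1 for none."""
    n, m = len(x), len(y)
    # L[i, j] = length of the LCSS of x[:i] and y[:j].
    L = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if delta is not None and abs(i - j) > delta:
                continue  # outside the warping window
            if abs(x[i - 1] - y[j - 1]) < epsilon:
                L[i, j] = L[i - 1, j - 1] + 1
            else:
                L[i, j] = max(L[i - 1, j], L[i, j - 1])
    # Normalize by the shorter length so the result lies in [0, 1].
    return 1.0 - L[n, m] / min(n, m)
```

Unmatched points simply contribute nothing to the count, which is what makes the measure robust to outliers.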

Experimental setup

We performed experiments on 47 data sets. Information on the time series used is presented in Table 1. The time series originate from the UCR Time Series Classification/Clustering Homepage [19], which includes the majority of all of the world’s publicly available, labeled time series data sets. The length of the time series varies from 24 to 1882 depending on the data set. The number of time series in the training (testing) set per data set varies from 16 (28) to 1800 (8236), and the number of classes likewise varies with the data set.

Results

The results are presented in Table 2: for each dissimilarity measure, the absolute error rates obtained with the 1NN method on the test subset.
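The error rates above come from 1NN classification under each measure. A minimal sketch of that evaluation loop (function and argument names are illustrative; `dissim` stands for any of the compared measures):

```python
def one_nn_error_rate(train_X, train_y, test_X, test_y, dissim):
    """Classify each test series by the label of its nearest training
    series under `dissim`; return the error rate on the test set."""
    errors = 0
    for x, true_label in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda i: dissim(x, train_X[i]))
        if train_y[nearest] != true_label:
            errors += 1
    return errors / len(test_X)
```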

LCSS performed best on 2 of the data sets, DDLCSS on 6 and 2DDLCSS on 23. On 16 data sets no method was clearly better than the others. Comparing the new dissimilarity measures with the standard LCSS, we see a significant reduction in error rate for most data sets. This is seen especially clearly in the mean of the relative errors.

Conclusions

In this paper we have introduced and studied new time series dissimilarity measures based on derivatives. We used these measures to classify time series in conjunction with the 1NN method. Our studies showed that these methods give very good results, and that our measures are superior to LCSS. Thanks to the parametric approach, the proposed methods make it possible to choose an appropriate model for any data set. The experiments that we have conducted justify the power and usefulness of our approach.

Acknowledgments

The authors wish to thank the editor and the two referees for their comments, which have helped to improve the presentation of this paper.

References (29)

  • R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases, in: 4th International Conference...
  • R. Agrawal et al., Fast similarity search in the presence of noise, scaling and translation in time-series databases
  • G. Batista et al., A complexity-invariant distance measure for time series
  • G. Bergmann et al., Improvements of general multiple test procedures for redundant systems of hypotheses
  • D.J. Berndt et al., Using dynamic time warping to find patterns in time series
  • A. Carlier, Factor analysis of evolution and cluster methods on trajectories
  • R. Coppi et al., A fuzzy clustering model for multivariate spatial time series, J. Classification (2010)
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)
  • H. Ding et al., Querying and mining of time series data: experimental comparison of representations and distance measures
  • P. D’Urso et al., Dissimilarities between trajectories of a three-way longitudinal data set
  • C. Faloutsos et al., Fast subsequence matching in time-series databases, ACM SIGMOD Record (1994)
  • S. Garcia et al., An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons, J. Mach. Learn. Res. (2008)
  • T. Górecki, Two parametrical derivative dynamic time warping
  • T. Górecki et al., Using derivatives in time series classification, Data Min. Knowl. Discov. (2013)
This paper has been recommended for acceptance by A. Marcelli.