Shape-Sphere: A metric space for analysing time series by their shape
Introduction
Comparing time series by their appearance or visual “shape” has diverse applications, such as classifying heartbeat ECG records into normal and abnormal signals [21], clustering accelerometer records from wearables attached to a trainee into different sets of exercises [27], [28], [29], or categorising data received from a space shuttle [7]. In these problems, the relationships between points on the time series matter more than the individual absolute values. For example, Fig. 1 shows an abnormal and a normal heartbeat from the ECG200 database [37]. Both heartbeats have peaks in the highlighted areas; however, the peaks in the normal heartbeat are more distinguishable than those in the abnormal one. This difference in detectability manifests itself in the relationship between each peak and its neighbouring values.
Traditionally, shape analysis for time series data has focused on comparisons between pairs of time series. In such an approach, shape is characterised only indirectly, through a mapping of smaller pieces of one time series onto the other. For example, dynamic time warping (DTW) uses dynamic programming to find the “best” mapping between two time series [24]; the resulting DTW value is a numeric distance that represents their dissimilarity. An alternative approach identifies primitives in a time series, treating shape as decomposable into peaks and troughs. In this approach, similar shapes are assumed to behave similarly through time, i.e., if one goes up the other goes up, and if one goes down the other goes down as well. In this case, the similarity between two time series is defined through correlation [39]. Although these approaches can define the similarity between two finite time series, they provide no insight into the internal relationships among the points of a single time series. This insight is crucial for interpreting time series data. As shown in Fig. 1, the notion of the shape of a time series would reveal the underlying relationships among its points, such as the smoothness of the peaks shown in the figure. Our approach is to define the notion of shape in such a way that it enables us to reconstruct the original time series exactly. In this way, a time series is modelled as a unique geometric curve that can be presented to an observer like any other geometric object, such as a line, circle or cycloid.
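To make the pairwise approach concrete, the following is a minimal Python sketch of the textbook DTW recurrence, assuming an absolute-difference local cost (squared differences are an equally common choice) and no warping-window constraint. `dtw_distance` is a name chosen for illustration; it is not code from [24].

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with an absolute-difference local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b[j-1] is reused
                cost[i, j - 1],      # b advances, a[i-1] is reused
                cost[i - 1, j - 1],  # both advance
            )
    return float(cost[n, m])
```

Two series of different lengths that trace the same rise and fall, such as `[0, 1, 2, 1, 0]` and `[0, 0, 1, 2, 1, 0]`, obtain a DTW distance of 0 because the warping path absorbs the repeated leading zero. The full cost table makes the method quadratic in time and memory.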
Mathematically, a time series can be described as a curve in space. The simplest curve is a line, an object with no curvature (no peaks or troughs) (Fig. 2.1). Another simple object is one with constant curvature: a circle, whose curvature is the reciprocal of its radius (Fig. 2.2). Third, consider a cycloid, the curve traced by a fixed point on the circumference of a circle that rolls along a straight line (Fig. 2.3). The cycloid's curvature varies along the curve with the position of that fixed point as the circle rolls. Differential geometry generalises this idea: the shape of a curve is uniquely determined by its curvature [3]. Curvature measures how much the curve deviates from a straight line as we move along it. Given the curvature at each point, we can always regenerate the same curve, differing only in its initial position and orientation [33]. This definition of shape yields three important properties for shape analysis, namely:
1. Scale invariance,
2. Translation invariance, and
3. Rotation invariance.
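The claim that curvature determines a curve up to its initial position and orientation can be checked numerically. The sketch below rests on assumptions of our own (a uniform arc-length step `ds`, a left Riemann sum, and the hypothetical helper name `curve_from_curvature`). It integrates curvature once to obtain the tangent angle and once more to obtain points. The `start` and `heading` arguments act only as a translation and a rotation of the result.

```python
import numpy as np


def curve_from_curvature(kappa, ds, start=(0.0, 0.0), heading=0.0):
    """Rebuild a planar polyline from curvature samples taken every `ds` of arc length.

    Hypothetical helper for illustration. The tangent angle is the running sum of
    curvature (left Riemann sum); the points are the running sum of unit tangents.
    `start` translates and `heading` rotates the result; neither changes its shape.
    """
    kappa = np.asarray(kappa, dtype=float)
    # theta_k = heading + sum_{j<k} kappa_j * ds
    theta = heading + np.concatenate(([0.0], np.cumsum(kappa * ds)))[:-1]
    # One straight step of length ds in the current tangent direction per sample
    steps = ds * np.column_stack((np.cos(theta), np.sin(theta)))
    points = np.vstack((np.zeros(2), np.cumsum(steps, axis=0)))
    return points + np.asarray(start, dtype=float)
```

With constant curvature `1/r` and `n` steps of length `2πr/n`, the polyline closes on itself, matching the circle of Fig. 2.2; zero curvature gives a straight line. Calling the function with a different `start` and `heading` returns the same polyline, translated and rotated.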
Because curvature satisfies these properties, we can define time series as geometric objects through their curvature. In this setting, time series are compared by their appearance (shape) in space regardless of their scale, position, or rotation.
Comparing time series is an important problem when grouping similar time series. To decide whether a given time series belongs to a particular group, a common approach is to compare the new time series with every member of the group, which is inefficient. This is the main reason for seeking a prototype to represent a group of time series. A centroid of a group of time series can be seen as the mean of the group, i.e., the centroid is a data point (time series) that has the minimum average distance to the members of the group. Computing a centroid is therefore coupled with designing a distance measure. The mean property of a centroid enables various important types of analysis on time series, such as: 1. Detecting anomalies in a semi-supervised fashion [27]. 2. Designing a fast indexing mechanism for database retrieval [41]. 3. Generating synthetic time series data [20]. 4. Designing loss functions for optimisation problems with applications in deep neural networks [19] and clustering [30]. 5. Preserving privacy, where a centroid represents a group of time series [36].
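When the distance is the squared Euclidean distance, the "minimum average distance" characterisation of a centroid has a closed form. The objective is strictly convex in $\mathbf{c}$, so setting its gradient to zero yields the unique minimiser, the arithmetic mean of the $N$ members $\mathbf{x}_1, \dots, \mathbf{x}_N$ (notation introduced here for illustration). For the unsquared distance the minimiser is instead the geometric median, which generally has no closed form.

```latex
\bar{\mathbf{x}}
  = \arg\min_{\mathbf{c}} \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{c} \rVert^{2},
\qquad
\nabla_{\mathbf{c}} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{c} \rVert^{2}
  = -2 \sum_{i=1}^{N} (\mathbf{x}_i - \mathbf{c}) = \mathbf{0}
\;\Longrightarrow\;
\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i .
```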
Our aim is to use differential geometric properties of curves to introduce a new representation of a time series that uniquely identifies the time series by its shape. Fourier analysis of this representation leads us to design a vector space in which each time series is uniquely defined by a vector. Our goal is to address two fundamental problems in time series analysis: 1. How to compute a prototype from a given set of time series, and 2. How to compare a pair of time series.
We use two important tasks in time series analysis, 1. K-Means clustering and 2. the Nearest Centroid Classifier (NCC), to demonstrate the effectiveness of our methods in addressing these two fundamental problems.
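NCC ties the two problems together: a prototype per class (problem 1) and a distance for comparing a query with each prototype (problem 2). The following minimal Python sketch uses the arithmetic mean as the prototype and takes the distance as a parameter. `fit_centroids`, `predict` and `euclidean` are hypothetical names, and Euclidean distance is only a placeholder for whichever shape distance is plugged in.

```python
import numpy as np


def fit_centroids(X, y):
    """One prototype per class: the arithmetic mean of that class's vectors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)  # sorted class labels
    centroids = np.vstack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(X, labels, centroids, distance):
    """Assign each row of X the label of its nearest centroid under `distance`."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    d = np.array([[distance(x, c) for c in centroids] for x in X])
    return labels[np.argmin(d, axis=1)]


def euclidean(a, b):
    """Placeholder distance; any shape distance with the same signature can be used."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))
```

Swapping `euclidean` for another function changes the comparison without touching the prototype step, which is why prototype and distance can be studied as separate problems.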
Background review
Time series analysis techniques can be divided into three main categories:
1. Analysing the time series by extracting features from the series,
2. Analysing the time series by transforming the time series into (often statistical or neural network) models, and
3. Analysing the time series by their shape.
In feature extraction, the goal is to transform the time series into a feature space having, for example, statistical features like the mean and standard deviation. The time series is then processed as a
Problem definition
We aim to define a time series as a discrete geometric curve generated by sampling a continuous planar curve. Thus, we only consider finite, planar time series. To define the geometric shape of a time series, we first define the time series and then review the definition of shape through curvature from geometry.
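One common way to estimate curvature on such a sampled curve is the turning-angle estimator: the signed angle between consecutive edges divided by the average length of those edges. The sketch below embeds a series as the planar points (t, x(t)) with a uniform, hypothetical sampling step `dt` and applies that estimator. Both the estimator and the helper names `series_to_curve` and `turning_angle_curvature` are assumptions for illustration, not the paper's Shape-Series (Definition 2), which this excerpt does not reproduce.

```python
import numpy as np


def series_to_curve(x, dt=1.0):
    """Embed a time series as the planar points (t_i, x_i) with a uniform step dt."""
    x = np.asarray(x, dtype=float)
    t = dt * np.arange(len(x))
    return np.column_stack((t, x))


def turning_angle_curvature(points):
    """Signed discrete curvature at the interior vertices of a planar polyline.

    kappa_i = (signed turning angle at vertex i) / (mean length of its two edges).
    A common estimator, assumed here for illustration only.
    """
    p = np.asarray(points, dtype=float)
    e = np.diff(p, axis=0)                     # edge vectors, shape (n-1, 2)
    lengths = np.linalg.norm(e, axis=1)
    a, b = e[:-1], e[1:]                       # incoming and outgoing edge per vertex
    cross = a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]
    dot = np.einsum("ij,ij->i", a, b)
    turn = np.arctan2(cross, dot)              # signed angle in (-pi, pi]
    return turn / (0.5 * (lengths[:-1] + lengths[1:]))
```

Adding a constant offset to the series, or rotating the embedded points, leaves the estimated curvature unchanged up to floating-point error, and a straight-line series has zero curvature everywhere.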
Time series: A time series of length n is a finite ordered set of real-valued measurements, say through time, starting from an initial time and ending at a final time
Shape similarity
The Shape-Series (Definition 2) quantifies the shape of a curve and can be used to compare two curves. We first show in Proposition 1 that each Shape-Series uniquely identifies its associated curve modulo translations and rotations.
Proposition 1 (Shape uniqueness proposition). Given a Shape-Series of a planar curve on a given interval, there is a unique curve, up to translation and rotation, that is quantified by the Shape-Series.
Proof. From the first fundamental theorem of calculus and the definition of the
Computing the average shape—Contour prototypes
We define the average shape of a set of time series as the set's contour. Definition 4 (Contour): We define the contour of a set of time series as the Shape-Series that has the maximum similarity (minimum distance) to the Shape-Series of all members of the set. Thus, for a set $S$, the contour $C$ is the vector obtained by averaging the vectors $\mathbf{v}_T$ that represent the members $T \in S$ in Shape-Sphere: $C = \frac{1}{|S|}\sum_{T \in S} \mathbf{v}_T$.
To make contours insensitive to translation we use the periodic property of the Fourier
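The translation-related idea can be illustrated with the DFT shift theorem: a circular shift multiplies each Fourier coefficient by a unit-modulus phase factor, so coefficient magnitudes do not change. The Python sketch below is entirely hypothetical. It builds a stand-in "shape vector" from unit-normalised rFFT magnitudes, forms a contour as the arithmetic mean of the members' vectors, and compares vectors by the angle between them. The paper's actual Shape-Sphere construction is not shown in this excerpt, and `shape_vector`, `contour` and `angular_distance` are illustrative names.

```python
import numpy as np


def shape_vector(series):
    """Hypothetical stand-in shape vector: unit-norm magnitudes of the real FFT.

    |FFT| is unchanged by a circular shift of the input, because the shift only
    multiplies each coefficient by a unit-modulus phase factor.
    """
    mag = np.abs(np.fft.rfft(np.asarray(series, dtype=float)))
    return mag / np.linalg.norm(mag)


def contour(vectors):
    """Arithmetic mean of the members' vectors (only its direction matters below)."""
    return np.mean(np.asarray(vectors, dtype=float), axis=0)


def angular_distance(u, v):
    """Angle in radians between u and v; depends only on their directions."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding past +/-1
```

Because the angle depends only on direction, the contour need not be re-normalised before it is compared. In this stand-in representation, a series and any circular shift of it map to the same vector and are therefore at angular distance zero (up to floating-point error), the kind of behaviour that makes such a distance a pseudometric rather than a metric on the series themselves.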
Comparison of centroids
In Proposition 1 we discussed the one-to-one relationship between a curve, up to translation and rotation, and its curvature. We transform the curvature using a cumulative sum of the curvature series. The cumulative sum is the discrete counterpart of integrating the curvature series, so the result is unique up to an additive constant. This uniqueness allows us to analyse time series in the new feature space. This approach differs from the scheme used by DBA, cDBA and SE in K-Shape, where a set of time series is
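Written out, the cumulative-sum step is the discrete form of integrating curvature $\kappa$ to obtain the tangent angle $\theta$, assuming a uniform arc-length step $\Delta s$ (symbols introduced here for illustration). The initial angle $\theta_0$ is the undetermined additive constant and corresponds to a rotation; integrating once more recovers the curve $\gamma$ up to its starting point $\gamma(0)$, i.e. up to a translation:

```latex
\theta(s) = \theta_0 + \int_0^{s} \kappa(u)\,du,
\qquad
\theta(k\,\Delta s) \approx \theta_k = \theta_0 + \sum_{j=1}^{k} \kappa_j\,\Delta s,
\qquad
\gamma(s) = \gamma(0) + \int_0^{s} \bigl(\cos\theta(u),\ \sin\theta(u)\bigr)\,du .
```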
Experimental evaluation
In this section we compare DBA, cDBA and K-Shape to our approach on two problems: 1. Classification using the Nearest Centroid Classifier (NCC) and 2. Clustering using the K-Means clustering algorithm.
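For the clustering task, one standard way to run K-Means on vectors compared by angle is spherical k-means, which is swapped in here purely for illustration. Points are assigned to the centroid with the highest cosine similarity (smallest angle), and each centroid is updated to the re-normalised arithmetic mean of its members. This minimal Python sketch is not the evaluated procedure; `spherical_kmeans`, its random initialisation and its stopping rule are assumptions.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """K-Means on unit vectors with cosine similarity (spherical k-means).

    Assignment: the centroid with the largest dot product (smallest angle).
    Update: arithmetic mean of the members, re-normalised to unit length.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project inputs onto the unit sphere
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                m = members.mean(axis=0)
                new[j] = m / np.linalg.norm(m)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

If a cluster's members cancel out exactly (for example, two antipodal vectors), the mean is zero and cannot be normalised; a fuller implementation would need to handle that case.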
We empirically evaluated our method using the entire UCR-2015 time series archive (85 data sets) [12]. Our goal is to answer three questions: 1. How does the NCC classifier based on the contours extracted using the Contour procedure compare to NCCs based on the prototypes computed using the other
Discussion
The experiments in Section 7 show that the proposed methods based on Shape-Sphere for analysing a finite set of time series outperform the Dynamic Warping and Constrained Warping comparison methods in NCC classification on the 48 data sets. Our method's performance is competitive with the K-Shape method in this experiment using the 85 data sets. Our clustering experiment showed that the proposed method performs at least as well as the three comparison methods. Note that these results should be
Conclusions
In this paper, we proposed Shape-Sphere for representing shape features of time series. We showed how to efficiently transform time series into vectors on Shape-Sphere. We proved that Shape-Sphere, equipped with the angular distance in Shape-Sphere (ASD) as its measure, is a pseudometric space, and showed how to use it to compute an efficient shape average for a given cluster of time series. We proved theoretically and showed experimentally that ASD has the best time
Future work
We presented Shape-Sphere: a model for analysing time series by their geometric shape. We showed that the geometric centroid of a given set of time series in Shape-Sphere is computed as the arithmetic mean of the members of the set. As shown by Petitjean et al. [42], the NCC classification task can be improved by selecting multiple centroids for a given set. Our NCC results could likewise be improved by dividing a given set of time series into convex subsets. Thus, a future improvement is to design a
CRediT authorship contribution statement
Yousef Kowsar: Conceptualization, Methodology, Software, Writing - original draft. Masud Moshtaghi: Conceptualization, Supervision. Eduardo Velloso: Validation, Formal analysis, Investigation. James C. Bezdek: Validation, Formal analysis, Investigation, Writing - original draft. Lars Kulik: Conceptualization, Resources. Christopher Leckie: Conceptualization, Writing - original draft, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research was supported by the use of the NeCTAR Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. We would also like to acknowledge the publicly available time series data sets at http://www.timeseriesclassification.com [5].
References (50)
- et al. Time-series clustering – A decade review. Journal of Information Systems (2015).
- et al. An evolutionary approach for efficient prototyping of large time series datasets. Information Sciences (2020).
- et al. A fast algorithm for active contours and curvature estimation. CVGIP: Image Understanding (1992).
- et al. Cross-words reference template for DTW-based speech recognition systems.
- et al. Curvature and tangential deflection of discrete arcs: A theory based on the commutator of scatter matrix pairs and its application to vertex detection in planar shape data.
- et al. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Journal of Data Mining and Knowledge Discovery (2017).
- et al. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery (2017).
- et al. A supervised time series feature extraction technique using DCT and DWT.
- et al. Black holes in asymptotically Lifshitz spacetimes with arbitrary critical exponent. Physical Review (2009).
- et al. Process: Projection-based classification of electroencephalograph signals.
- Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems.
- Efficient time series matching by wavelets.
- The UCR Time Series Classification Archive.
- Discrete curvature based on osculating circle estimation.
- Pattern extraction for time series classification.
- Nonlinear alignment and averaging for estimating the evoked potential. IEEE Transactions on Biomedical Engineering.
- Comparing partitions. Journal of Classification.
- Deep convolutional identifier for dynamic modeling and adaptive control of unmanned helicopter. IEEE Transactions on Neural Networks and Learning Systems.
- Feature-based comparison and generation of time series.
- Time series analysis as input for clinical predictive modeling: Modeling cardiac arrest in a pediatric ICU. Theoretical Biology and Medical Modelling.
- Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems.
- On the need for time series data mining benchmarks: A survey and empirical demonstration. Journal of Data Mining and Knowledge Discovery.
1 Present address: Amazon, Manhattan Beach, CA.