Elsevier

Information Sciences

Volume 582, January 2022, Pages 198-214

Shape-Sphere: A metric space for analysing time series by their shape

https://doi.org/10.1016/j.ins.2021.08.101

Abstract

Shape analogy is a key technique in analyzing time series: time series are compared by how much they look alike. This concept has long been applied in geometry. Notably, none of the current techniques describes a time series as a geometric curve expressed by its relative location and form in space. To fill this gap, we introduce Shape-Sphere, a vector space where time series are represented as points on the surface of a sphere. We prove a pseudo-metric property for distances in Shape-Sphere. We show how to describe the average shape of a set of time series by deriving a centroid from the set, using the pseudo-metric property of Shape-Sphere. We demonstrate the effectiveness of the pseudo-metric property and its centroid in capturing the ‘shape’ of a set of time series with two important machine learning techniques, namely the Nearest Centroid Classifier and K-Means clustering, on 85 publicly available data sets. Shape-Sphere improves the nearest centroid classification results when shape is the differentiating feature, while keeping the quality of clustering equivalent to current state-of-the-art techniques.

Introduction

Comparing time series by their appearance or visual “shape” has diverse applications, such as classifying heartbeat ECG records into normal and abnormal signals [21], clustering accelerometer records from wearables attached to a trainee into different sets of exercises [27], [28], [29], or categorising data received from a space shuttle [7]. In these problems, the relationships between points on the time series are more important than each individual absolute value. For example, Fig. 1 shows an abnormal and a normal heartbeat from the ECG200 database [37]. Both heartbeats have peaks in the highlighted areas; however, the peaks in the normal heartbeat are more distinguishable than those in the abnormal one. This distinguishability lies in the relationship between each peak and its neighbouring values.

Traditionally, shape analysis for time series data has focused on comparisons between pairs of time series. In such an approach, shape is characterised indirectly by a mapping between corresponding pieces of the two time series. For example, dynamic time warping (DTW) uses dynamic programming to find the “best” mapping between the two time series [24]. A DTW value is a numeric distance measure that represents the dissimilarity between two time series. An alternative approach identifies primitives in a time series, where a shape is considered to be decomposable into peaks and troughs. In this approach, similar shapes are assumed to behave similarly through time, i.e., if one goes up the other goes up, and if one goes down the other goes down as well. In this case, the similarity between two time series is defined through correlation [39]. Although these approaches can be used to define the similarity between two finite time series, they do not provide any insight into the internal relationships among the points of a single time series. This insight is crucial for the interpretation of time series data. As shown in Fig. 1, the notion of the shape of a time series would reveal the underlying relationships among its points, such as the smoothness of the peaks shown in the figure. Our approach is to define the notion of shape in such a way that it enables us to reconstruct the original time series exactly. In this way, a time series is modelled as a unique geometric curve that can be presented to an observer like any other geometric object, such as a line, circle or cycloid.
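To make the two pairwise approaches concrete, the sketch below implements the textbook dynamic-programming recurrence for DTW, with squared pointwise differences as the local cost, together with Pearson correlation as one common correlation-based similarity. Both are generic illustrations under these assumptions, not the exact variants used in [24] or [39]; the function names `dtw_distance` and `pearson_correlation` are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook O(len(a) * len(b)) DTW with squared pointwise cost."""
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the prefixes a[:i] and b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b repeats a point
                cost[i][j - 1],      # b advances while a repeats a point
                cost[i - 1][j - 1],  # both advance together
            )
    return math.sqrt(cost[n][m])


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlation of two equal-length series: 1 means they rise and fall together."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0: the one-step delay is absorbed
```

The warping path absorbs the one-step delay even though the two series have different lengths. Both measures describe a relationship between two series; neither says anything about the internal shape of a single series, which is the gap identified above.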

Mathematically, a time series can be described as a curve in space. The simplest curve is a line, an object with no curvature (no peaks or troughs) (Fig. 2.1). Another simple object is one with constant curvature: a circle, whose curvature is the reciprocal of its radius (Fig. 2.2). Third, consider a cycloid, the curve traced by a point on the circumference of a circle that rolls along a straight line (Fig. 2.3). The cycloid’s curvature is not constant; it changes along the curve as the tracing point moves around the rolling circle. Differential geometry generalises this idea to uniquely determine the shape of a curve through its curvature [3]. The curvature measures the degree of deviation from a line while moving along the curve. Knowing the curvature at each point along the curve, we can always regenerate the same curve, regardless of its initial position and orientation [33]. This definition of shape brings three important characteristics for shape analysis, namely:

  1. Scale invariance,
  2. Translation invariance, and
  3. Rotation invariance.
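These invariances can be checked numerically with a discrete analogue of curvature. The sketch below is an illustration under stated assumptions, not the paper's Shape-Series: it draws a time series as the polyline through the points (k, Y(t_k)), assuming unit time steps, uses the signed turning angle at each interior vertex as a stand-in for curvature, and rebuilds the polyline from segment lengths and turning angles once a start point and an initial heading are fixed. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def turning_angles(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Return (segment lengths, signed turning angles) of a planar polyline.

    The turning angle at an interior vertex is a discrete stand-in for
    curvature accumulated over a short stretch of arc length.
    """
    lengths: List[float] = []
    headings: List[float] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        lengths.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    turns: List[float] = []
    for h0, h1 in zip(headings, headings[1:]):
        # wrap the heading change into [-pi, pi) to get the signed turn
        turns.append((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)
    return lengths, turns


def reconstruct(lengths: List[float], turns: List[float],
                start: Point = (0.0, 0.0), heading: float = 0.0) -> List[Point]:
    """Rebuild a polyline from lengths and turns.

    `start` and `heading` fix the translation and rotation that the
    lengths and turns cannot determine on their own.
    """
    x, y = start
    points = [(x, y)]
    h = heading
    for k, seg in enumerate(lengths):
        if k > 0:
            h += turns[k - 1]
        x += seg * math.cos(h)
        y += seg * math.sin(h)
        points.append((x, y))
    return points


def rigid_motion(points: List[Point], angle: float, dx: float, dy: float) -> List[Point]:
    """Rotate by `angle` about the origin, then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


# A hypothetical time series drawn as the polyline through (k, Y(t_k)).
series = [0.0, 1.0, 0.0, 2.0, 1.0]
pts = [(float(k), v) for k, v in enumerate(series)]
lengths, turns = turning_angles(pts)
moved_lengths, moved_turns = turning_angles(rigid_motion(pts, 0.7, 3.0, -2.0))
```

Rotating and translating the points leaves the lengths and turning angles unchanged, and uniformly scaling them multiplies every length by the same factor while leaving the turning angles unchanged. The free `start` and `heading` arguments of `reconstruct` are exactly the translation and rotation that curvature alone cannot determine.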

Because curvature satisfies these properties, we can define time series as geometric objects through their curvature. In this representation, time series are compared by their appearance (shape) in space regardless of their scale, position, or rotation. Comparing time series is an important problem when trying to group similar time series. To determine whether a given time series belongs to a particular group, a common approach is to compare the new time series to every member of the group, which is inefficient. This is the main reason for seeking a prototype to represent a group of time series. A centroid for a group of time series can be seen as the mean value of the group, i.e., the centroid is a data point (time series) that has the minimum average distance to the members of the group. Computing a centroid is therefore tied to the choice of distance measure. The mean property of a centroid enables various important types of analysis on time series, such as: 1. Detecting anomalies in a semi-supervised fashion [27]. 2. Designing a fast indexing mechanism for database retrieval [41]. 3. Generating synthetic time series data [20]. 4. Designing loss functions for optimisation problems, with applications in deep neural networks [19] and clustering [30]. 5. Preserving privacy, where a centroid represents a group of time series [36].
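A minimal nearest-centroid sketch shows why a prototype saves work: each class is summarised by one vector, so classifying a new series costs one distance computation per class instead of one per training series. This sketch assumes equal-length series and squared Euclidean distance, for which the pointwise arithmetic mean is exactly the point with minimum average distance to the members; under measures such as DTW the mean loses that property, which is why averaging schemes such as DBA exist. The data and function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def mean_prototype(members: List[Sequence[float]]) -> List[float]:
    """Pointwise arithmetic mean of equal-length series."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_ncc(X: List[Sequence[float]], y: List[Hashable]) -> Dict[Hashable, List[float]]:
    """One prototype per class label."""
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for series, label in zip(X, y):
        groups.setdefault(label, []).append(series)
    return {label: mean_prototype(members) for label, members in groups.items()}


def predict_ncc(prototypes: Dict[Hashable, List[float]], series: Sequence[float]) -> Hashable:
    """Label of the nearest prototype: one distance per class."""
    return min(prototypes, key=lambda label: squared_euclidean(prototypes[label], series))


# Toy data, not from the paper.
X = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [5.0, 5.0, 5.0], [6.0, 5.0, 5.0]]
y = ["normal", "normal", "abnormal", "abnormal"]
prototypes = fit_ncc(X, y)
print(prototypes["normal"])                      # [0.0, 0.0, 1.5]
print(predict_ncc(prototypes, [0.0, 1.0, 1.0]))  # normal
```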

Our aim is to use differential geometric properties of curves to introduce a new representation of a time series that uniquely identifies the time series by its shape. The Fourier analysis of our representation leads us to design a vector space where each time series is uniquely defined by a vector. Our goal is to address two fundamental problems in time series analysis: 1. How to compute a prototype from a given set of time series, and 2. How to compare a pair of time series.

We use two important tasks in time series analysis: 1. K-Means clustering and 2. Nearest Centroid Classifier (NCC), to demonstrate the effectiveness of our methods in addressing these two fundamental problems.


Background review

Time series analysis techniques can be divided into three main categories:

  1. Analysing the time series by extracting features from the series,
  2. Analysing the time series by transforming the time series into (often statistical or neural network) models, and
  3. Analysing the time series by their shape.

In feature extraction, the goal is to transform the time series into a feature space having, for example, statistical features like the mean and standard deviation. The time series is then processed as a

Problem definition

We aim to define a time series as a discrete geometric curve generated by sampling a continuous planar curve. Thus, we only consider finite, planar time series. To define the geometric shape of a time series, we first define the time series and then review the definition of shape through curvature from geometry.

Time series: A time series $T_s$ of length $n$ is a finite ordered set of real-valued measurements $\{Y(t_k)\}$ taken through time, starting at time $t_1$ and ending at time $t_n$:

$$T_s = \{Y(t_1), Y(t_2), \ldots, Y(t_n)\}$$
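One common way to turn this definition into a discrete planar curve, stated here as an assumption rather than as the paper's Definition 2, is to join consecutive samples with straight segments and measure the change of heading per unit arc length. The symbols $P_k$, $\ell_k$, $\theta_k$ and $\kappa_k$ below are introduced only for this illustration.

```latex
\begin{aligned}
P_k &= \bigl(t_k,\, Y(t_k)\bigr), && k = 1,\dots,n,\\
\ell_k &= \lVert P_{k+1} - P_k \rVert, \quad
\theta_k = \operatorname{atan2}\bigl(Y(t_{k+1}) - Y(t_k),\; t_{k+1} - t_k\bigr), && k = 1,\dots,n-1,\\
\kappa_k &\approx \frac{\theta_k - \theta_{k-1}}{(\ell_{k-1} + \ell_k)/2}, && k = 2,\dots,n-1.
\end{aligned}
```

Because $t_{k+1} > t_k$, every heading $\theta_k$ lies in $(-\pi/2, \pi/2)$, so the differences $\theta_k - \theta_{k-1}$ need no angle wrapping.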

Shape similarity

The Shape-Series (Definition 2) quantifies the shape of a curve and can be used to compare two curves. We first show in Proposition 1 that each Shape-Series uniquely identifies its associated curve modulo translations and rotations.

Proposition 1

(Shape uniqueness proposition) Given a Shape-Series of a planar curve on the interval $[t_1, t_n]$, there is a unique curve, up to translation and rotation, that is quantified by the Shape-Series.

Proof

From the first fundamental theorem of calculus and the definition of the

Computing the average shape—Contour prototypes

We define the average shape of a set of time series as the set’s contour.

Definition 4

Contour: We define the contour of a set of time series $V_s$ as the Shape-Series that has the maximum similarity (minimum average distance) to the Shape-Series in $V_s$. Thus, the contour $C$ of a set of time series $V_s = \{T_s^i \mid 1 \le i \le n\}$ is the vector calculated by averaging the vectors $V_S(T_s^i)$ that represent its members in Shape-Sphere:

$$C_{V_s} = \frac{1}{n} \sum_{i=1}^{n} V_S\left(T_s^i\right)$$
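Because the contour is an arithmetic mean, it is cheap to compute once the series are represented as vectors. The sketch below assumes the Shape-Sphere vectors are already available (the mapping itself, built from the Fourier analysis of the curvature representation, is not reproduced in this excerpt) and uses the angle between two vectors as a stand-in for the angular distance ASD; the exact definition of ASD is not given here, and the data and function names are hypothetical.

```python
import math
from typing import List, Sequence


def contour(vectors: List[Sequence[float]]) -> List[float]:
    """Arithmetic mean of the Shape-Sphere vectors of a set."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between u and v (assumed stand-in for ASD)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # clamp to [-1, 1] so rounding error cannot push acos out of its domain
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical unit vectors standing in for three members of one set.
members = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c = contour(members)
```

The mean of unit vectors usually lies inside the sphere rather than on it, but the angle depends only on direction, so the contour can be compared with members without renormalising (the angle is undefined only if the mean is the zero vector). In this sketch the same property shows how a pseudometric, rather than a metric, can arise: `angular_distance(u, 2u)` is 0 even though `u` and `2u` are different vectors.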

To make contours insensitive to translation we use the periodic property of the Fourier

Comparison of centroids

In Proposition 1 we discussed the one-to-one relationship between a curve, up to translation and rotation, and its curvature. We transform the curvature using a cumulative sum of the curvature series. The cumulative sum is the discrete equivalent of integrating the curvature series, so the result is a unique series up to an additive constant (the constant of integration). This uniqueness allows us to analyse time series in the new feature space. This approach is different from the scheme used by DBA, cDBA and SE in K-Shape where a set of time series is
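A short check with a polyline discretisation (an assumption, not the paper's exact Shape-Series) shows what a cumulative sum of curvature recovers: with unit time steps, the running sum of turning angles equals each segment heading minus the first heading. The first heading plays the role of the constant of integration, so the integrated series is fixed only up to an additive constant, and that constant corresponds to a rotation. Function names are hypothetical.

```python
import math
from itertools import accumulate
from typing import List


def headings(values: List[float]) -> List[float]:
    """Heading of each segment of the polyline (k, values[k]); unit time step assumed."""
    return [math.atan2(b - a, 1.0) for a, b in zip(values, values[1:])]


def cumulative_turning(values: List[float]) -> List[float]:
    """Running sum of turning angles, starting at 0 on the first segment.

    With unit time steps every heading lies in (-pi/2, pi/2), so the
    raw differences need no wrapping.
    """
    h = headings(values)
    turns = [h1 - h0 for h0, h1 in zip(h, h[1:])]
    return [0.0] + list(accumulate(turns))


series = [0.0, 1.0, 0.0, 2.0, 1.0, 1.5]
h = headings(series)
integrated = cumulative_turning(series)
# integrated[k] equals h[k] - h[0]: the first heading is the lost constant
```

Adding a constant to every value (a vertical translation) leaves the output unchanged, consistent with translation invariance.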

Experimental evaluation

In this section we compare DBA, cDBA and K-Shape to our approach on two problems: 1. Classification using the Nearest Centroid Classifier (NCC) and 2. Clustering using the K-Means clustering algorithm.
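For reference, the standard Lloyd iteration of K-Means alternates an assignment step and an averaging step, and the averaging step is where prototype methods differ (DBA and cDBA average under warping, K-Shape uses shape extraction, and the contour uses the arithmetic mean of Shape-Sphere vectors). The sketch below is a generic version with squared Euclidean distance, arithmetic-mean centroids and a deterministic first-k initialisation, chosen only to keep it short and reproducible; it is not the experimental setup of this section, and the data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: List[Sequence[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-Means with arithmetic-mean centroids."""
    centroids = [list(x) for x in X[:k]]  # deterministic init: first k series
    labels = [-1] * len(X)
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(x, centroids[j]))
            for x in X
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: replace each centroid by the mean of its members.
        for j in range(k):
            members = [x for x, label in zip(X, labels) if label == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


X = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0]]
labels, centroids = kmeans(X, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```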

We empirically evaluated our method using the entire UCR-2015 time series archive (85 data sets) [12]. Our goal is to answer three questions: 1. How does the NCC classifier based on the contours extracted using the Contour procedure compare to NCCs based on the prototypes computed using the other

Discussion

The experiments in Section 7 show that the proposed methods based on Shape-Sphere for analysing a finite set of time series outperform the Dynamic Warping and Constrained Warping comparison methods in NCC classification on the 48 data sets. Our method’s performance is competitive with the K-Shape method in this experiment using the 85 data sets. Our clustering experiment showed that the proposed method performs at least as well as the three comparison methods. Note that these results should be

Conclusions

In this paper, we proposed Shape-Sphere for representing shape features of time series. We showed how to efficiently transform time series into vectors on Shape-Sphere. We proved that Shape-Sphere, equipped with the angular distance in Shape-Sphere (ASD) as its measure of similarity, is a pseudometric space, and showed how to use it to compute an efficient shape average for a given cluster of time series. We proved theoretically and showed experimentally that ASD has the best time

Future work

We presented Shape-Sphere, a model for analysing time series by their geometric shape. We showed that the geometric centroid of a given set of time series in Shape-Sphere is the arithmetic mean of the members of the set. As shown by Petitjean et al. [42], the NCC classification task can be improved by selecting multiple centroids for a given set. Our NCC results could likewise be improved by dividing a given set of time series into convex subsets. Thus, a future improvement is to design a

CRediT authorship contribution statement

Yousef Kowsar: Conceptualization, Methodology, Software, Writing - original draft. Masud Moshtaghi: Conceptualization, Supervision. Eduardo Velloso: Validation, Formal analysis, Investigation. James C. Bezdek: Validation, Formal analysis, Investigation, Writing - original draft. Lars Kulik: Conceptualization, Resources. Christopher Leckie: Conceptualization, Writing - original draft, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the use of the NeCTAR Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. We would also like to acknowledge the publicly available time series data at http://www.timeseriesclassification.com [5].

References (50)

  • S. Aghabozorgi et al. Time-series clustering – A decade review. Journal of Information Systems (2015).
  • P. Leon-Alcaide et al. An evolutionary approach for efficient prototyping of large time series datasets. Information Sciences (2020).
  • D.J. Williams et al. A fast algorithm for active contours and curvature estimation. CVGIP: Image Understanding (1992).
  • W.H. Abdulla et al. Cross-words reference template for DTW-based speech recognition systems.
  • I.M. Anderson et al. Curvature and tangential deflection of discrete arcs: A theory based on the commutator of scatter matrix pairs and its application to vertex detection in planar shape data.
  • A. Bagnall et al. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Journal of Data Mining and Knowledge Discovery (2017).
  • A. Bagnall et al. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery (2017).
  • I. Batal et al. A supervised time series feature extraction technique using DCT and DWT.
  • G. Bertoldi et al. Black holes in asymptotically Lifshitz spacetimes with arbitrary critical exponent. Physical Review (2009).
  • K. Buza et al. Process: Projection-based classification of electroencephalograph signals.
  • Y. Cai, R. Ng. Indexing spatio-temporal trajectories with Chebyshev polynomials, in: Proc. of the 2004 International...
  • K. Chakrabarti et al. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems (2002).
  • K.P. Chan et al. Efficient time series matching by wavelets.
  • Y. Chen et al. The UCR Time Series Classification Archive (2015).
  • D. Coeurjolly et al. Discrete curvature based on osculating circle estimation.
  • C. Faloutsos, M. Ranganathan, Y. Manolopoulos. Fast subsequence matching in time-series databases, in: Proc. of the...
  • P. Geurts. Pattern extraction for time series classification.
  • M. Gribskov. Identification of sequence patterns, motifs and domains, in: Reference Module in Life Sciences. Elsevier,...
  • L. Gupta et al. Nonlinear alignment and averaging for estimating the evoked potential. IEEE Transactions on Biomedical Engineering (1996).
  • L. Hubert et al. Comparing partitions. Journal of Classification (1985).
  • Y. Kang et al. Deep convolutional identifier for dynamic modeling and adaptive control of unmanned helicopter. IEEE Transactions on Neural Networks and Learning Systems (2019).
  • L. Kegel et al. Feature-based comparison and generation of time series.
  • C.E. Kennedy et al. Time series analysis as input for clinical predictive modeling: Modeling cardiac arrest in a pediatric ICU. Theoretical Biology and Medical Modelling (2011).
  • E. Keogh et al. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems (2001).
  • E. Keogh et al. On the need for time series data mining benchmarks: A survey and empirical demonstration. Journal of Data Mining and Knowledge Discovery (2003).

1 Present address: Amazon, Manhattan Beach, CA.
