An evolutionary approach for efficient prototyping of large time series datasets

doi:10.1016/j.ins.2019.09.044

Information Sciences

Volume 511, February 2020, Pages 74-93

https://doi.org/10.1016/j.ins.2019.09.044

Abstract

We describe an algorithm based on an evolutionary strategy for finding the prototype series of a set of time series. We use Dynamic Time Warping (DTW) as the distance measure between series and do not restrict the search space to the series in the set. The problem of calculating the centroid of a set of time series can be addressed as a minimization problem using genetic algorithms. Our proposal belongs to the non-classical approaches to genetic algorithms: each individual is a candidate time series for being the centroid, or representative, of the whole set of series. The representation and the operators of the genetic algorithm are redesigned in order to generate efficient summaries. The fitness function of each candidate prototype is approximated by comparing the candidate only with a randomly selected subset of the time series in the original dataset. Three aspects are examined in order to assess the goodness of our proposal: the performance of the generated prototype in terms of a fitness function, the consistency of the prototype generation for use in classical grouping algorithms, and its use in classification algorithms based on the nearest prototypes.

Introduction

At present, the great increase in the volume of available data, itself largely due to growing storage capacity and the spread of connected sensors, makes techniques for handling such data increasingly necessary in order to obtain acceptable results within a reasonable time. One of the most important data types, because of its abundance and diversity, is the time series. Examples include seismic series, temperature evolution, sound representation, stock exchange movements, and electrocardiograms. Due to the temporal nature of this type of data, time series have specific characteristics that require special treatment in order for interesting patterns to be discovered. For example, similarity searches among time series are used in tasks such as clustering, classification, and summarization.

Time series summarization tries to summarize a set of series by the use of a single representative, a common task in several grouping algorithms. The summary thus obtained can provide high-level information that facilitates, among other applications, the identification of frequently appearing patterns. Summarizing a set of time series by one representative is a conceptually simple task when using a rigid distance, such as the Manhattan, Euclidean, Minkowski, or Chebyshev distance. When using an elastic distance, such as DTW, however, the problem of calculating the representative becomes much harder in terms of computational complexity, although the use of such a distance may be a determining factor in improving the results of supervised and unsupervised learning tasks. Elastic distances deal more naturally with the temporal nature of this type of data, leading to more accurate learning models. For example, Fig. 1 shows how, on many occasions, the Euclidean distance is not suitable for extracting the representative (prototype) of a collection of series. Fig. 1(a) shows a collection of series that are summarized in Fig. 1(b) using the Euclidean distance and in Fig. 1(c) using the DTW distance. The error in both amplitude and shape can be seen in Fig. 1(b), while Fig. 1(c) shows a correct and more natural summarization.

Dynamic Time Warping (DTW) is a measure of the distance between two time series, belonging to the set of elastic distances [33]. Unlike rigid distances, which compare two series index by index, elastic distances look for a correspondence between the shapes of the time series. Thus, the distance is not measured between the ith element of one series and the ith element of the other: there can even exist a one-to-many relation between the positions. The DTW distance was first used in speech recognition in the 1970s, to measure the similarity between the templates representing the words and the sound to be recognized [28], [32]. Its use became widespread in this field because it is a very robust measure in the face of the displacements, widening, and narrowing of the shapes of the series that arise from the different speeds at which the same word is said. A few years later, due to the increase in the amount of data available, its use was suggested for the detection of patterns in time series in general, not only in those related to voice recognition [4]. Currently, DTW is probably the most common elastic distance measure used between time series, as indicated in [23].

The current techniques most often used to find the representative of a set of series using an elastic distance usually select the series of the set that most closely resembles the rest. This kind of method has two clear drawbacks. First, a matrix of distances between all the series must be computed, which requires a huge number of calculations when the amount of data handled is large. Second, the series that most closely resembles the rest of the series in the set may be far from the series that best summarizes that set.

With respect to the implementation of DTW, its goal is to align two sequences of feature vectors by iteratively warping the time axis until an optimal match (according to a suitable metric) is found between the two sequences. A direct implementation of the classical definition of this distance gives rise to an algorithm of exponential time complexity. Fortunately, because many of the sub-problems overlap, partial results can be saved for reuse. The problem DTW(X,Y) can thus be solved with a complexity of O(|X| · |Y|) by means of a dynamic programming implementation, as shown in Algorithm 1, where |X| and |Y| represent the lengths of the time series. This algorithm makes use of a distance δ and a cost matrix M, in which the x-axis represents the X series and the y-axis the Y series. The position M(i, j) holds the cost of the best alignment of the sub-sequences X_i and Y_j, that is, the first i elements of X and the first j elements of Y. The total cost of the best alignment is therefore found in position M(|X|, |Y|). Once the cost of each cell is known, it is simple to recover the optimal alignment by tracing back through the lowest-cost neighbors from position M(|X|, |Y|) to M(1, 1).
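The dynamic programming scheme described above can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1: the function name `dtw`, the choice of the squared difference as δ, and the padding of M with an extra row and column are all assumptions made for the example, which handles one-dimensional, non-empty series only.

```python
import numpy as np

def dtw(x, y, delta=lambda a, b: (a - b) ** 2):
    """Return (accumulated cost, alignment path) for series x and y.

    delta is the local distance; the squared difference is an assumption.
    """
    n, m = len(x), len(y)
    # M[i, j] = cost of the best alignment of the first i elements of x
    # with the first j elements of y; row and column 0 are padding.
    M = np.full((n + 1, m + 1), np.inf)
    M[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i, j] = delta(x[i - 1], y[j - 1]) + min(
                M[i - 1, j],      # x[i-1] also matches an earlier y element
                M[i, j - 1],      # y[j-1] also matches an earlier x element
                M[i - 1, j - 1],  # one-to-one step
            )
    # Trace back from the last cell to the first through the cheapest neighbor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda c: M[c])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(M[n, m]), path

cost, path = dtw([1, 2, 3], [2, 2, 2, 3, 4])
print(cost)   # accumulated cost of the best alignment
print(path)   # list of (index in x, index in y) pairs, 0-based
```

The two nested loops visit each of the |X| · |Y| cells once, which is where the O(|X| · |Y|) complexity comes from. The path uses 0-based indices, so it starts at (0, 0) and ends at (|X| - 1, |Y| - 1).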

Thus, in this paper we present a method to summarize a set of time series using DTW, devoting special attention to making the algorithms scalable, so as to be applicable to large sets of series. This is possible if we address the problem with an evolutionary approach, where the computation of the centroid is treated as a minimization problem.

The problem of calculating the centroid of a set of time series $S = \{S_{1}, S_{2}, \dots, S_{n}\}$ using the Euclidean distance is as simple as averaging the series element by element. Thus, given a set of n series of length m, the centroid $C = \{c_{1}, c_{2}, \dots, c_{m}\}$ is itself a time series and is calculated as follows, where $S_{j,i}$ denotes the ith element of the series $S_{j}$: $c_{i} = \frac{1}{n} \sum_{j = 1}^{n} S_{j,i}$
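As a small worked example of this formula, the following sketch averages n equal-length series position by position. The function name `euclidean_centroid` and the sample values are illustrative assumptions.

```python
import numpy as np

def euclidean_centroid(S):
    """Element-wise mean of n series of equal length m (S has shape (n, m))."""
    S = np.asarray(S, dtype=float)
    return S.mean(axis=0)  # c_i = (1/n) * sum over j of S[j, i]

S = [[1.0, 2.0, 3.0],
     [3.0, 4.0, 5.0]]
print(euclidean_centroid(S))
```

Note that this only works when all series have the same length and are compared index by index, which is exactly the rigidity that an elastic distance removes.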

Solving this same problem using an elastic distance, such as DTW, is much more complicated. This is mainly because, when measuring the similarity between two series using an elastic distance, several elements of one series may correspond to a single element of the other, rather than the one-to-one relationship that exists if the Euclidean distance is used. Therefore, the search for the centroid becomes an optimization problem, in which a series is sought that minimizes its DTW distance from the series of the set S. If we have a set of n series, the centroid C of that set is obtained by solving the following minimization problem, where C ranges over all candidate series and not only over the members of S: $\min_{C} \sum_{i = 1}^{n} \mathrm{DTW}^{2}(C, S_{i})$

This minimization problem is solved trivially if the search space in which the centroid is sought is restricted to the set of series from which it started, that is, S. In this case, the average series must belong to the set of starting series, C ∈ S. To find the sequence that minimizes the distance to the rest, it is necessary to calculate every distance $\mathrm{DTW}(S_{i}, S_{j}), \forall i, j \in \{1, \dots, n\}$.
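The restricted search can be sketched as below. The names `dtw_cost` and `medoid` are hypothetical, δ is assumed to be the squared difference, and the accumulated cost returned by `dtw_cost` is treated as the squared DTW distance, which is an assumption about the convention in use.

```python
import numpy as np

def dtw_cost(x, y):
    """Accumulated DTW cost with a squared-difference local distance (assumed)."""
    n, m = len(x), len(y)
    M = np.full((n + 1, m + 1), np.inf)
    M[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            M[i, j] = d + min(M[i - 1, j], M[i, j - 1], M[i - 1, j - 1])
    return float(M[n, m])

def medoid(S, dist=dtw_cost):
    """Index of the series in S with the smallest total distance to the rest."""
    n = len(S)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):  # symmetric: n(n-1)/2 distance computations
            D[i, j] = D[j, i] = dist(S[i], S[j])
    totals = D.sum(axis=1)
    best = int(np.argmin(totals))
    return best, float(totals[best])

S = [[0, 1, 2, 1, 0],
     [0, 0, 1, 2, 1, 0],
     [0, 1, 2, 2, 1, 0],
     [5, 5, 5, 5, 5]]
print(medoid(S))
```

Even when symmetry is exploited, the number of DTW computations grows quadratically with the number of series, and each of them is itself quadratic in the series length.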

This solution therefore has a time complexity of O(n²) in the number of DTW computations, which is infeasible when the set of series grows. Without restricting the search space to the initial set of series, the problem becomes harder. The series sought is called the Steiner series (or Steiner sequence), by analogy with Steiner tree theory, and the algorithm that produces it has a time and space complexity of O(mⁿ), which is infeasible as soon as the set contains more than a few sequences. After more than 30 years of research, a scalable algorithm to calculate the Steiner series has not been found. We therefore propose an algorithm based on an evolutionary strategy in which genetic algorithms are used in a non-classical way: the representation of the genes is not binary, and the design of the operators depends on that representation and on our previously stated goals.

The classical genetic algorithm, while useful for solving many problems, has restrictions that can make it difficult to use. It can be modified to apply the crossover and mutation operations in a different order, or even to define these operations differently so that they are better adapted to the problem to be solved. In the classical definition of a genetic algorithm, a chromosome is a sequence of binary symbols. However, some works, such as [10], use alphabets of greater cardinality. Goldberg developed the theory of virtual alphabets [9], with which he shows why non-binary representations work well. Other experimental studies that adopt high-cardinality alphabets have used chromosomes in which each symbol represents an integer or decimal number [15], [20]. In most problems, this type of representation facilitates the use of the GA because the parameters to be optimized are already numerical, and the most natural encoding of a problem is one that requires no encoding at all. Experiments carried out in [15] show that representations using decimal numbers achieve better, more consistent, and faster solutions. These new encodings have also given rise to other types of mutation and crossover operators, which deal more naturally with problems in which individuals are formed by decimal or integer numbers.

With respect to operators defined by the problem domain, although most GA research has used traditional crossover and mutation operators, some authors have advocated designing new operators for each problem, taking into account domain-specific knowledge of the problem. Because GAs are designed to solve real problems, incorporating knowledge of the problem makes sense if it leads to better results. In [30], it is argued that specific knowledge of the problem can be introduced into the crossover operator. This knowledge can be used to avoid generating chromosomes that are obviously bad or that violate the constraints of the problem. It can also be used to generate an initial population that is not completely random. In [10], Goldberg describes techniques for adding problem domain knowledge to operations such as crossover and mutation. Finally, other approaches present sets of operators and techniques related to GAs for solving multi-objective problems efficiently with large-scale training sets, combining evolutionary-based and heuristic-based algorithms [35], or proposing modifications of the shift-based density estimation (SDE) strategy [18].

Our main goal is to design an efficient algorithm based on an evolutionary strategy to find the prototype series of large sets of time series where:

• The prototype series of the set need not be a series belonging to the set itself, unlike in most classical k-medoids techniques.
• The representation of the genes is based on the similarities of the shapes of the series, by defining the concept of segment and using an elastic distance measurement.
• The efficiency of this technique is mainly based on approximating the fitness function by extracting a subset of the overall set of series. The prototype population is then compared only with this subset. This allows the method to be scalable and to work with large sets of time series.
• The crossover operator exchanges segments obtained by the alignment concept defined in DTW.
• The mutation operator modifies the shape of the time series represented by each individual in the population. Three different mutation operators are implemented, each one being applied with a certain probability. The main difference between them is that they modify the shape of the time series in different ways.
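The approximated fitness mentioned in the list above can be illustrated with the following sketch. It is not the authors' implementation: the names `draw_subset` and `approx_fitness`, the use of the sum of accumulated DTW costs as the fitness value, and the subset size are assumptions, and the text does not say how often the subset is redrawn.

```python
import random
import numpy as np

def dtw_cost(x, y):
    """Accumulated DTW cost with a squared-difference local distance (assumed)."""
    n, m = len(x), len(y)
    M = np.full((n + 1, m + 1), np.inf)
    M[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            M[i, j] = d + min(M[i - 1, j], M[i, j - 1], M[i - 1, j - 1])
    return float(M[n, m])

def draw_subset(dataset, size, rng):
    """Pick `size` series uniformly at random, without replacement."""
    return rng.sample(dataset, min(size, len(dataset)))

def approx_fitness(candidate, subset):
    """Sum of DTW costs from the candidate to the sampled series only."""
    return sum(dtw_cost(candidate, s) for s in subset)

rng = random.Random(0)                            # fixed seed for repeatability
dataset = [[float(k)] * 4 for k in range(100)]    # toy data: 100 constant series
subset = draw_subset(dataset, 10, rng)            # 10 series stand in for all 100
population = [[0.0] * 4, [50.0] * 4]              # two candidate prototypes
scores = [approx_fitness(ind, subset) for ind in population]
print(scores)
```

Each candidate is evaluated with 10 DTW computations instead of 100, so the cost of a fitness evaluation depends on the subset size rather than on the size of the dataset. The price is a noisier estimate of the true objective.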

The rest of this paper is organized as follows. Section 2 gives an overview of the problem of finding the prototype of a set of time series, focusing on evolutionary approaches, and then introduces DBA and COMASA because of their close relationship with our proposal. Section 3 introduces the implementation of the genetic algorithm on which our method is based. Section 4 describes the experiments and compares their results with those of other similar techniques. Finally, Section 5 presents some conclusions and future research lines.

Section snippets

Background

In this section we first review articles related to our work on the search for prototypes in time series. We then look at DBA and COMASA, as they are the evolutionary techniques that most closely approximate our proposal.

Proposed algorithm: GA-segments

The goal of this section is to describe the implementation of a GA with specific operators linked with the domain of the problem. This algorithm is referred to as GA-segments. Firstly, the representation of the individuals and the meaning of the basic operators will be described. We will then focus our attention on describing in detail the particular design of each operator in the genetic algorithm. Before starting, the general operation of a GA is outlined in Algorithm 2.
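For orientation, the general operation of a GA can be sketched as the loop below. This is a generic skeleton and not Algorithm 2 or GA-segments: the function `run_ga`, tournament selection, elitism, and the toy operators (one-point crossover, Gaussian mutation on a sum-of-squares problem) are all placeholders chosen for the example, whereas the method described in this paper uses its own segment-based representation and operators.

```python
import random

def run_ga(init, fitness, crossover, mutate, pop_size, generations, rng, k=3):
    """Generic GA loop for minimization; every operator is passed in by the caller."""
    population = [init(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        def select():
            # Tournament selection: best of k randomly chosen individuals.
            return min(rng.sample(population, k), key=fitness)
        offspring = [best]  # elitism: carry the best individual over unchanged
        while len(offspring) < pop_size:
            child = crossover(select(), select(), rng)
            offspring.append(mutate(child, rng))
        population = offspring
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy problem (assumption): minimize the sum of squares of 5 real values.
def init(rng):
    return [rng.uniform(-5, 5) for _ in range(5)]

def fitness(ind):
    return sum(v * v for v in ind)

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover, builds a new list
    return a[:cut] + b[cut:]

def mutate(ind, rng, rate=0.2, sigma=0.3):
    # Perturb each value with probability `rate`; returns a new list.
    return [v + rng.gauss(0, sigma) if rng.random() < rate else v for v in ind]

best, history = run_ga(init, fitness, crossover, mutate,
                       pop_size=20, generations=30, rng=random.Random(1))
print(history[0], history[-1])
```

Because the best individual is copied into every new generation, the best fitness recorded in `history` can never get worse from one generation to the next.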

Experiments

We wished to evaluate three things: the performance of the prototype generated in terms of its fitness function, the consistency of the generation of the prototype for use in classic grouping algorithms, and finally its use in classification algorithms based on the nearest prototypes.

Conclusions and future work

This section presents the conclusions obtained from this study and assesses how far the objectives set at the beginning of the paper have been fulfilled. It also details possible improvements and future work.

Declaration of Competing Interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in

Acknowledgements

Supported by Project TIN2015-64776-C3-3-R of the Spanish Ministry of Science and Innovation, co-funded by the European Regional Development Fund.

References (36)

S. Aghabozorgi et al., Time-series clustering–a decade review, Inf. Syst. (2015)

M. Li et al., Shift-based density estimation for Pareto-based algorithms in many-objective optimization, IEEE Trans. Evol. Comput. (2014)

F. Petitjean et al., Summarizing a set of time series by averaging: from Steiner sequence to compact multiple alignment, Theor. Comput. Sci. (2012)

F. Petitjean et al., A global averaging method for dynamic time warping, with applications to clustering, Pattern Recognit. (2011)

S. Pravilovic et al., Using multiple time series analysis for Geosensor data forecasting, Inf. Sci. (2017)

T. Rajesh et al., Hybrid clustering algorithm for time series data - a literature survey, Proceedings of the International Conference on Big Data Analytics and Computational Intelligence (ICBDAC) (2017)

Z. Zhang et al., Dynamic time warping under limited warping path length, Inf. Sci. (2017)

S. Aghabozorgi et al., Clustering of large time series datasets, Intell. Data Anal. (2014)

R.G. Andrzejak et al., Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state, Phys. Rev. E (2001)

D.J. Berndt et al., Using dynamic time warping to find patterns in time series, Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (1994)

J. Chen et al., A bi-layered parallel training architecture for large-scale convolutional neural networks, IEEE Trans. Parallel Distrib. Syst. (2018)

J. Chen et al., A periodicity-based parallel time series prediction algorithm in cloud computing environments, Inf. Sci. (2018)

Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, The UCR time series classification archive, 2015,...

F.-L. Chung et al., An evolutionary approach to pattern-based time series segmentation, IEEE Trans. Evol. Comput. (2004)

D. Goldberg, The theory of virtual alphabets, Parallel Probl. Solv. Nat. (1991)

D.E. Goldberg et al., Genetic algorithms and machine learning, Mach. Learn. (1988)

L. Gupta et al., Nonlinear alignment and averaging for estimating the evoked potential, IEEE Trans. Biomed. Eng. (1996)
