Discovering original motifs with different lengths from time series
Introduction
Given the wide use of information technology, large amounts of data are being collected during scientific experiments and normal business operations. Recording the same variables for many observations at successive time periods adds further to this volume. Data from such observations, constructed sequentially over time, are called a time series. Generally speaking, a time series is a sequence of real numbers where each number represents a value at a given point in time. For example, a sequence could represent POS (point-of-sale) transactions, exchange rates, or weather data over time [1].
Many studies on time series, covering topics such as seasonality, forecasting, and trend analysis, have been carried out from a statistical angle. In recent years, mining time series has become an important research topic in the data mining and KDD fields [2]. Among these studies, discovering frequently occurring but previously unknown patterns has received much attention, since it is not only a stand-alone mining process but is also widely used as a data preprocessing routine for other data mining tasks [3]. Some studies use the word “motifs” to refer to these hidden patterns in time series, and the algorithms proposed for finding them, such as the k-motif algorithm, have proved to be effective and efficient [3], [4].
However, the k-motif algorithm is sensitive to the parameter w, which is the length of the pattern to be discovered. When the pattern is unknown, the value of w is very difficult to estimate. Based on this traditional motif-discovery algorithm, this paper proposes a novel algorithm to solve the above-mentioned problem. Our approach does not require an exact w value to be determined in advance, and, moreover, it can be used to identify motifs with different lengths by running it only once. Its effectiveness has been demonstrated by the experimental results in this paper.
The rest of the paper is organized as follows: Section 2 reviews related research and its motivations. Section 3 introduces the relevant definitions as well as the proposed algorithm. Section 4 presents the experimental results. Finally, the summary and implications are given in Section 5.
Related research and motivations
Discovering patterns in a time series sequence is an important data mining task, and much research attention has been devoted to this area [5], [6], [7], [8], [9]. Most existing work concentrates on the similarity problem: given a specific sequence (the keyword sequence), it attempts to locate similar sequences in a database, or similar subsequences within a given sequence.
In some cases, we may also need to identify previously unknown patterns that occur frequently in a time
Definitions and algorithm
In this section, we review relevant definitions and propose a novel algorithm for finding motifs with different lengths in time series. Definitions 1, 2, and 3 are based on existing work, while the motif-concatenation algorithm and Definitions 4 and 5 are given by the authors.
Definition 1 (Time series). A time series T = <t1, …, tm> is a finite sequence of real-valued variables, where m is the length of the time series.
Suppose for any 1 ⩽ p ⩽ q, sp is a subsequence of the time series T with length w,
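The notions of a time series of length m and of its length-w subsequences can be illustrated with a small sketch. The code below is not the authors' algorithm or the original k-motif implementation; it is a brute-force illustration of fixed-length motif search that shows why w must be chosen in advance. The function names, the Euclidean distance, the range parameter r, and the rule that overlapping windows are never counted as matches (a simplified stand-in for excluding trivial matches) are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def subsequences(series: Sequence[float], w: int) -> List[List[float]]:
    """Return every contiguous window of length w, ordered by start position."""
    if w <= 0 or w > len(series):
        raise ValueError("w must satisfy 1 <= w <= len(series)")
    return [list(series[p:p + w]) for p in range(len(series) - w + 1)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length windows (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def naive_1_motif(series: Sequence[float], w: int, r: float) -> Tuple[int, int]:
    """Return (start, count) for the window with the most matches.

    A window at q matches the window at p when their distance is at most r
    and the two windows do not overlap (|p - q| >= w). Ties are resolved in
    favour of the earliest start position.
    """
    windows = subsequences(series, w)
    best_start, best_count = -1, -1
    for p, s_p in enumerate(windows):
        count = 0
        for q, s_q in enumerate(windows):
            if abs(p - q) >= w and euclidean(s_p, s_q) <= r:
                count += 1
        if count > best_count:
            best_start, best_count = p, count
    return best_start, best_count


if __name__ == "__main__":
    toy = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(naive_1_motif(toy, w=3, r=0.0))
```

Because every candidate window has exactly w points, a pattern longer than w can only ever be reported in fragments, which is the limitation the proposed approach is meant to address.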
Experiment A
This experiment consists of two stages. In the first stage, we run the original algorithm on the test dataset and explain the results. In the second stage, the proposed algorithm is applied to the same dataset to illustrate its advantages. The basic test dataset is created by the cylinder–bell–funnel (c–b–f) synthetic approach, which is widely used to generate test data for time series analysis [14]. The three different shapes (cylinder, bell, and funnel) and an additional synthetic
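For readers unfamiliar with the c–b–f data, the following sketch generates one instance of each shape. It follows the formulation commonly given in the time series literature: a plateau, ramp-up, or ramp-down of height 6 + η between a random onset a and offset b, plus unit Gaussian noise at every point. The excerpt does not state the exact parameters used in this experiment, so the default length of 128, the ranges for a and b, and the function name cbf_instance are assumptions.

```python
import random
from typing import List, Optional

SHAPES = ("cylinder", "bell", "funnel")


def cbf_instance(shape: str, length: int = 128,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Generate one cylinder, bell, or funnel series (assumed parameters)."""
    if shape not in SHAPES:
        raise ValueError(f"shape must be one of {SHAPES}")
    rng = rng or random.Random()
    a = rng.randint(16, 32)        # onset of the shape
    b = a + rng.randint(32, 96)    # offset of the shape, at most 128
    eta = rng.gauss(0.0, 1.0)      # random perturbation of the amplitude
    series = []
    for t in range(1, length + 1):
        if not (a <= t <= b):
            level = 0.0                                # outside the shape
        elif shape == "cylinder":
            level = 6.0 + eta                          # flat plateau
        elif shape == "bell":
            level = (6.0 + eta) * (t - a) / (b - a)    # gradual rise
        else:
            level = (6.0 + eta) * (b - t) / (b - a)    # gradual fall
        series.append(level + rng.gauss(0.0, 1.0))     # pointwise noise
    return series


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    for name in SHAPES:
        x = cbf_instance(name, rng=rng)
        print(name, len(x), round(max(x), 2))
```

Concatenating several such instances is one way to build a longer sequence in which the planted shapes are the patterns a motif-discovery algorithm should recover.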
Conclusion
This paper focuses on a novel motif-discovery algorithm. The major contribution of our work is a novel approach that improves the widely used k-motif algorithm, which is sensitive to the setting of the parameter w. More importantly, the conventional k-motif approach can only discover patterns with a predefined length, which is normally only a fraction of the original patterns. In contrast, the proposed approach is capable of discovering the original whole
Acknowledgements
We thank Dr. Eamonn Keogh, who provided the source code of the original motif-discovery algorithm. This research is directly supported by a CERG grant (CityU1236/03E) of RGC, Hong Kong SAR.
References (15)
- et al., Data mining on time series: an illustration using fast-food restaurant franchise data, Computational Statistics & Data Analysis (2001)
- Y. K. M. Last, and A. Kandel, Knowledge discovery in time series databases, IEEE Trans on System, Man and Cybernetics,...
- K. Lin J., E., Patel, P. and Lonardi, S., Finding Motifs in Time Series, presented at the 2nd Workshop on Temporal...
- B. Chiu, E. Keogh, and S. Lonardi, Probabilistic Discovery of Time Series Motifs, in Proceedings of the...
- X. Ge and P. Smyth, Deformable Markov model templates for time-series pattern matching, in Proceedings of the...
- et al., Dimensionality reduction for fast similarity search in large time series databases, Journal of Knowledge and Information Systems (2000)
- S. S. Liao, H. Tang, and W.-Y. Liu, Finding Relevant Sequences in Time Series Containing Crisp, Interval and Fuzzy...