Elsevier

Knowledge-Based Systems

Volume 21, Issue 7, October 2008, Pages 666-671
Discovering original motifs with different lengths from time series

https://doi.org/10.1016/j.knosys.2008.03.022

Abstract

Finding previously unknown patterns in a time series has received much attention in recent years. Of the associated algorithms, the k-motif algorithm is one of the most effective and efficient. It is also widely used as a time series preprocessing routine for many other data mining tasks. However, the k-motif algorithm requires the parameter w, the length of the pattern, to be defined in advance. This paper introduces a novel k-motif-based algorithm that removes this requirement and, moreover, provides a way to generate the original patterns by summarizing the discovered motifs.

Introduction

Given the wide utilization of information technology, large amounts of data are being collected during scientific experiments and normal business operations. Variables recorded for many observations at different time periods also result in massive amounts of data. Data from these kinds of observations, constructed sequentially over time, are called a time series. Generally speaking, a time series is a sequence of real numbers where each number represents a value at a given point in time. For example, a sequence could represent POS (point-of-sale) transactions, exchange rates, or weather data over time [1].

Many studies of time series, on topics such as seasonality, forecasting, and trend analysis, have been carried out from a statistical perspective. In recent years, mining time series has become an important research topic in the data mining and KDD fields [2]. Among these studies, discovering frequently occurring but previously unknown patterns has received much attention, since it is not only a stand-alone mining process but is also widely used as a data preprocessing routine for other data mining tasks [3]. Some research works use the word “motifs” to refer to these hidden patterns in time series sequences, and proposed algorithms such as the k-motif have proved to be effective and efficient [3], [4].

However, the k-motif algorithm is sensitive to the parameter w, which is the length of the pattern to be discovered. When the pattern is unknown, the value of w is very difficult to estimate. Based on this traditional motif-discovery algorithm, this paper proposes a novel algorithm to solve the above-mentioned problem. Our approach does not require an exact w value to be determined in advance, and, moreover, it can be used to identify motifs with different lengths by running it only once. Its effectiveness has been demonstrated by the experimental results in this paper.

The rest of the paper is organized as follows: Section 2 reviews the related research studies and their motivations. Section 3 introduces the related definitions as well as the proposed algorithm. Section 4 shows the results of the experiments. Finally, the summary and implications are given in Section 5.

Related research and motivations

Discovering patterns from a time series sequence is an important data mining task, and much research attention has been devoted to this area [5], [6], [7], [8], [9]. Most existing work concentrates on the similarity problem, i.e., given a specific sequence (the keyword sequence), it attempts to locate similar sequences in a database or similar subsequences within a given sequence.

In some cases, we may also need to identify previously unknown patterns that occur frequently in a time

Definitions and algorithm

In this section, we review relevant definitions and propose a novel algorithm for finding motifs with different lengths in time series. Definitions 1, 2, and 3 are based on existing work, while the motif-concatenation algorithm and Definitions 4 and 5 are given by the authors.

Definition 1 Time series

A time series T = <t1, ..., tm> is a finite sequence of real-valued variables, where m is the length of the time series.

Suppose for any 1 ≤ p ≤ q, sp is a subsequence of the time series T with length w,
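The notion of length-w subsequences of T can be illustrated with a small sliding-window sketch. It assumes that the subsequence starting at position p consists of the w consecutive values beginning at tp; the snippet above is truncated, so this reading is an assumption rather than the authors' exact definition. The function name `subsequences` is hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def subsequences(series: Sequence[float], w: int) -> List[List[float]]:
    """Return every contiguous window of length w, ordered by start position.

    Assumption: a subsequence is w consecutive values of the series.
    """
    if w < 1 or w > len(series):
        raise ValueError("w must satisfy 1 <= w <= len(series)")
    # A 0-based start p yields series[p : p + w]; there are m - w + 1 windows.
    return [list(series[p:p + w]) for p in range(len(series) - w + 1)]


T = [1.0, 3.0, 2.0, 5.0, 4.0]
windows = subsequences(T, w=3)
print(len(windows))  # 3 windows for m = 5 and w = 3
print(windows[0])    # [1.0, 3.0, 2.0]
```

A fixed-length motif search compares windows of this kind against one another, which is why the choice of w determines which patterns can be found at all.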

Experiment A

This experiment consists of two stages. In the first stage, we run the original algorithm on the testing dataset and explain the results. In the second stage, the proposed algorithm is applied to the same dataset to illustrate its advantages. The basic testing dataset is created by the cylinder–bell–funnel (c–b–f) synthetic approach, which is widely used to generate test data for time series analysis [14]. The three different shapes (cylinder, bell, and funnel) and an additional synthetic
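For readers unfamiliar with the c–b–f data, the following is a minimal sketch of the commonly used formulation of the generator. It is not the authors' code: the series length of 128, the onset range [16, 32], the duration range [32, 96], the base amplitude of 6, and the standard-normal noise are assumptions taken from the usual description of this benchmark, and the function name `cbf_series` is hypothetical. The snippet above does not state which parameters the paper used.

```python
import random
from typing import List, Optional


def cbf_series(shape: str, length: int = 128,
               rng: Optional[random.Random] = None) -> List[float]:
    """Generate one noisy cylinder, bell, or funnel series (assumed parameters)."""
    if shape not in ("cylinder", "bell", "funnel"):
        raise ValueError(f"unknown shape: {shape}")
    rng = rng or random.Random()
    a = rng.randint(16, 32)        # onset of the shape (assumed range)
    b = a + rng.randint(32, 96)    # end of the shape (assumed duration range)
    eta = rng.gauss(0.0, 1.0)      # per-series amplitude perturbation
    series = []
    for t in range(1, length + 1):
        if a <= t <= b:
            if shape == "cylinder":
                profile = 1.0                  # flat plateau
            elif shape == "bell":
                profile = (t - a) / (b - a)    # linear rise, then sudden drop
            else:
                profile = (b - t) / (b - a)    # sudden rise, then linear decay
        else:
            profile = 0.0
        # Base amplitude 6 plus perturbation, with standard-normal noise per point.
        series.append((6.0 + eta) * profile + rng.gauss(0.0, 1.0))
    return series


example = cbf_series("bell", rng=random.Random(0))
print(len(example))  # 128
```

Passing a seeded `random.Random` instance makes a generated series reproducible, which is convenient when the same test data must be fed to two algorithms for comparison.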

Conclusion

This paper focuses on a novel motif-discovery algorithm. The major contribution of our work is a new approach that improves the widely used k-motif algorithm, which requires the parameter w to be set in advance. More importantly, the conventional k-motif approach can only discover patterns with a predefined length, which is normally only a fraction of the original patterns. In contrast, the proposed approach is capable of discovering the original whole

Acknowledgements

We thank Dr. Eamonn Keogh who provided the source code of the original motif-discovery algorithm. This research is directly supported by a CERG grant (CityU1236/03E) of RGC, Hong Kong SAR.

References (15)

  • S.B. L.M. Liu et al., Data mining on time series: an illustration using fast-food restaurant franchise data, Computational Statistics & Data Analysis (2001)
  • Y. K. M. Last, and A. Kandel, Knowledge discovery in time series databases, IEEE Trans on System, Man and Cybernetics,...
  • J. Lin, E. Keogh, P. Patel, and S. Lonardi, Finding Motifs in Time Series, presented at the 2nd Workshop on Temporal...
  • B. Chiu, E. Keogh, and S. Lonardi, Probabilistic Discovery of Time Series Motifs, presented at Proceedings of the...
  • X. Ge and P. Smyth, Deformable Markov model templates for time-series pattern matching, presented at proceedings of the...
  • E. Keogh et al., Dimensionality reduction for fast similarity search in large time series databases, Journal of Knowledge and Information Systems (2000)
  • S. S. Liao, H. Tang, and W.-Y. Liu, Finding Relevant Sequences in Time Series Containing Crisp, Interval and Fuzzy...
There are more references available in the full text version of this article.