Efficient indexing of interval time sequences

https://doi.org/10.1016/j.ipl.2008.08.003

Abstract

Time sequences, which are ordered sets of observations, have been studied in various database applications. In this paper, we introduce a new class of time sequences in which each observation is represented by an interval rather than a number. Such sequences arise in many situations: for instance, the exact value at a time point may be unknown because of uncertainty or aggregation, and such an observation is better represented by a range of possible values. Similarity search with interval time sequences as both query and data sequences poses a new research challenge. We first address the issue of (dis)similarity measures for interval time sequences. We choose an L1-norm-based measure because it effectively quantifies both the degree of overlap and the remoteness between two intervals, and because its value does not depend on where an interval lies when it is enclosed within another interval. We next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases. More specifically, we propose (1) to extract a segment-based feature vector for each sequence, and (2) to map each feature vector to either a point or a hyper-rectangle in a multi-dimensional feature space. We then show how existing multi-dimensional index structures such as the R-tree can be used for efficient query processing. The proposed method guarantees no false dismissals. Experimental results on synthetic and real stock data show that it outperforms sequential scanning and scales well with data size.
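As an illustration of the filter-and-refine idea, the following Python sketch rests on assumptions that the abstract does not fix. The L1-based measure is assumed to be the sum over time points of |l_i - l'_i| + |u_i - u'_i| on interval endpoints. This choice is consistent with the enclosure property described above, because for an interval enclosed in another it reduces to the difference in widths. The segment-based feature is hypothetically taken to be the mean lower bound and mean upper bound of each of k equal-length segments, mapped to a point. Because the feature distance scaled by the segment length never exceeds the true distance, filtering on it causes no false dismissals. A linear scan over feature vectors stands in for the R-tree. All function names are illustrative, and the hyper-rectangle mapping is not sketched.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper), assumed lower <= upper


def interval_l1(a: List[Interval], b: List[Interval]) -> float:
    """Assumed L1-based dissimilarity: sum of |lower diff| + |upper diff|."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(abs(la - lb) + abs(ua - ub) for (la, ua), (lb, ub) in zip(a, b))


def segment_features(seq: List[Interval], k: int) -> List[float]:
    """Hypothetical segment feature: mean lower and mean upper bound of each
    of k equal-length segments (length must be divisible by k)."""
    n = len(seq)
    if n % k != 0:
        raise ValueError("sequence length must be divisible by k")
    m = n // k
    feats: List[float] = []
    for s in range(k):
        seg = seq[s * m:(s + 1) * m]
        feats.append(sum(lo for lo, _ in seg) / m)
        feats.append(sum(up for _, up in seg) / m)
    return feats


def feature_lower_bound(fa: List[float], fb: List[float], seg_len: int) -> float:
    # Per segment, sum_i |x_i - y_i| >= |sum_i (x_i - y_i)| = seg_len * |mean(x) - mean(y)|,
    # so this value never exceeds interval_l1 on the raw sequences.
    return seg_len * sum(abs(x - y) for x, y in zip(fa, fb))


def range_query(db: List[List[Interval]], query: List[Interval],
                eps: float, k: int) -> List[int]:
    """Filter on features (linear scan standing in for an R-tree), then refine."""
    seg_len = len(query) // k
    fq = segment_features(query, k)
    candidates = [i for i, seq in enumerate(db)
                  if feature_lower_bound(segment_features(seq, k), fq, seg_len) <= eps]
    # Refinement step: discard false alarms using the exact distance.
    return [i for i in candidates if interval_l1(db[i], query) <= eps]


if __name__ == "__main__":
    db = [
        [(1, 2), (2, 3), (3, 4), (4, 5)],
        [(1, 3), (2, 4), (3, 5), (4, 6)],
        [(10, 12), (11, 13), (12, 14), (13, 15)],
    ]
    query = [(1, 2), (2, 3), (3, 4), (4, 5)]
    print(range_query(db, query, eps=5.0, k=2))  # [0, 1]
    # An enclosed interval scores the same wherever it sits inside [0, 10].
    print(interval_l1([(0, 10)], [(2, 4)]), interval_l1([(0, 10)], [(5, 7)]))  # 8 8
```

Whether the actual feature extraction in the paper uses segment means, extrema, or another statistic is not stated in the abstract; the means are used here only because they give a simple, provable lower bound for an L1 distance.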

Preliminary version appeared in Proc. of DASFAA 2004.