Abstract
This paper proposes HierarchyScan, a hierarchical algorithm for efficiently locating one-dimensional subsequences within a collection of sequences of arbitrary length. The algorithm correlates the stored sequences with the template pattern in the transformed domain, which identifies matching subsequences in a scale- and phase-independent fashion. This contrasts with approaches that compute Euclidean distance in the transformed domain. The transformed-domain representation of each original sequence is divided into multiple groups of coefficients, and matching proceeds hierarchically from the group with the greatest filtering capability to the group with the lowest. Only those subsequences whose maximum correlation value exceeds a predefined threshold are selected for additional screening. Compared with sequential scanning, the approach shows an order-of-magnitude speedup.
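The staged filtering described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a discrete Fourier transform as the transformed domain, candidate windows of the same length as the template, circular shifts as the "phase", and caller-supplied coefficient groups and per-stage thresholds. The names `normalize`, `dft`, `add_group`, `hierarchical_scan`, `groups`, and `thresholds` are all hypothetical, and the sketch makes no claim about avoiding false dismissals.

```python
import cmath
import math


def normalize(seq):
    """Return a zero-mean, unit-energy copy so the score ignores offset and scale."""
    mean = sum(seq) / len(seq)
    centered = [v - mean for v in seq]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered


def dft(seq):
    """Plain O(n^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def add_group(corr, X, T, bins):
    """Add the contribution of the given frequency bins to the circular
    cross-correlation at every shift (updates corr in place)."""
    n = len(X)
    for k in bins:
        # For real input, bins k and n-k are conjugates, so bin k is counted
        # twice unless it is the DC or Nyquist bin.
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        cross = X[k] * T[k].conjugate()
        for s in range(n):
            corr[s] += weight * (cross * cmath.exp(2j * math.pi * k * s / n)).real / n


def hierarchical_scan(candidates, template, groups, thresholds):
    """Screen candidates group by group; keep those whose maximum accumulated
    correlation over all shifts reaches the threshold of every stage."""
    T = dft(normalize(template))
    survivors = {
        name: (dft(normalize(seq)), [0.0] * len(template))
        for name, seq in candidates.items()
    }
    for bins, threshold in zip(groups, thresholds):
        kept = {}
        for name, (X, corr) in survivors.items():
            add_group(corr, X, T, bins)
            if max(corr) >= threshold:
                kept[name] = (X, corr)
        survivors = kept
    return {name: max(corr) for name, (_, corr) in survivors.items()}


# Illustrative data (assumed, not from the paper): length-16 windows.
n = 16
template = [math.sin(2 * math.pi * t / n) + math.sin(2 * math.pi * 5 * t / n)
            for t in range(n)]
candidates = {
    # Template scaled by 3, circularly shifted by 4 samples, offset by 10.
    "A": [3 * template[(t + 4) % n] + 10 for t in range(n)],
    # Pure alternating signal with no low-frequency content.
    "B": [(-1) ** t for t in range(n)],
    # Only the low-frequency half of the template.
    "C": [math.sin(2 * math.pi * t / n) for t in range(n)],
}
groups = [[1, 2], [3, 4, 5, 6, 7, 8]]  # assumed order: coarse bins first
thresholds = [0.4, 0.9]                # assumed per-stage thresholds
result = hierarchical_scan(candidates, template, groups, thresholds)
print(sorted(result))
```

In this toy setup, candidate B is rejected after the first group and candidate C after the second, so only A reaches the end despite its different scale, offset, and shift. How the coefficient groups are ordered by filtering capability and how thresholds are chosen are left to the full paper; the values here are placeholders.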
Additional information
This work was funded in part by grant no. NASA/CAN NCC5-101.
Cite this article
Li, CS., Yu, P.S. & Castelli, V. HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences. Knowledge and Information Systems 1, 229–256 (1999). https://doi.org/10.1007/BF03325099