HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences

Li, Chung-Sheng; Yu, Philip S.; Castelli, Vittorio

doi:10.1007/BF03325099

Regular Paper
Published: 13 July 2013

Knowledge and Information Systems, Volume 1, pages 229–256 (1999)

Abstract

In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences of arbitrary length. The proposed algorithm correlates the stored sequences with the template pattern in the transformed domain to identify subsequences in a scale- and phase-independent fashion. This is in contrast to approaches based on computing the Euclidean distance in the transformed domain. In the proposed hierarchical algorithm, the transformed-domain representation of each original sequence is divided into multiple groups of coefficients. The matching is performed hierarchically, from the group with the greatest filtering capability to the group with the lowest filtering capability. Only those subsequences whose maximum correlation value is higher than a predefined threshold are selected for additional screening. This approach is compared with sequential scanning, and an order-of-magnitude speedup is observed.
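
The abstract does not spell out the procedure, so the following Python sketch only illustrates the general idea of group-by-group correlation filtering in a transformed domain. It is not the authors' implementation. Everything in it is an assumption made for illustration: the Fourier transform as the transformed domain, the low-to-high frequency ordering of groups, the normalization by the norms of the whole sequence and the template, the Cauchy-Schwarz style bound used for pruning, and the hypothetical names `dft`, `frequency_groups`, and `hierarchical_match`.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * f * j / n) for j in range(n))
            for f in range(n)]


def frequency_groups(n, n_groups):
    """Partition frequency indices 0..n-1 into groups, low frequencies first.

    Assumption: low frequencies are treated as the group with the greatest
    filtering capability; the paper may order its groups differently.
    """
    order = sorted(range(n), key=lambda f: min(f, n - f))
    size = math.ceil(n / n_groups)
    return [order[i:i + size] for i in range(0, n, size)]


def hierarchical_match(seq, template, groups, threshold):
    """Return (is_candidate, levels_used) for one stored sequence.

    The normalized circular cross-correlation is accumulated one coefficient
    group at a time. After each group, the sequence is discarded if even the
    most optimistic completion of the remaining groups cannot reach threshold.
    """
    n = len(seq)
    padded = list(template) + [0.0] * (n - len(template))  # zero-pad template
    X, T = dft(seq), dft(padded)
    norm = math.sqrt(sum(v * v for v in seq) * sum(v * v for v in template))
    # Cross-spectrum, scaled so that summing P[f] * exp(2*pi*i*f*k/n) over all
    # f gives the normalized correlation at circular shift k.
    P = [X[f] * T[f].conjugate() / (n * norm) for f in range(n)]
    remaining = sum(abs(p) for p in P)  # bound on the not-yet-added terms
    partial = [0.0] * n
    for level, group in enumerate(groups, start=1):
        for f in group:
            remaining -= abs(P[f])
            for k in range(n):
                partial[k] += (P[f] * cmath.exp(2j * math.pi * f * k / n)).real
        # Small epsilon guards against rounding error in the bound.
        if max(partial) + remaining + 1e-9 < threshold:
            return False, level  # pruned early at this level
    return max(partial) >= threshold, len(groups)


# Illustrative data (hypothetical): a scaled, shifted copy of the template,
# and an alternating sequence that does not resemble it.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
hit = [0.0] * 3 + [2.0 * v for v in template] + [0.0] * 8
miss = [1.0, -1.0] * 8
groups = frequency_groups(len(hit), 4)
hit_result = hierarchical_match(hit, template, groups, threshold=0.9)
miss_result = hierarchical_match(miss, template, groups, threshold=0.9)
```

In this toy run, `hit` survives all four levels and is kept as a candidate, while `miss` is discarded after the first group because its remaining coefficients cannot lift the correlation to the threshold. Taking the maximum over all shifts is what makes the test independent of where the pattern starts, and dividing by the norms makes it insensitive to amplitude scaling. A practical version would use an FFT and per-window normalization rather than the naive transform shown here.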

Author information

Authors and Affiliations

IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA
Chung-Sheng Li, Philip S. Yu & Vittorio Castelli

Corresponding author

Correspondence to Chung-Sheng Li.

Additional information

This work was funded in part by grant no. NASA/CAN NCC5-101.

About this article

Cite this article

Li, CS., Yu, P.S. & Castelli, V. HierarchyScan: A Hierarchical Algorithm for Similarity Search in Databases Consisting of Long Sequences. Knowledge and Information Systems 1, 229–256 (1999). https://doi.org/10.1007/BF03325099

Received: 19 March 1998
Revised: 27 October 1998
Accepted: 22 December 1998
Published: 13 July 2013
Issue Date: May 1999
DOI: https://doi.org/10.1007/BF03325099