A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases

Loh, Woong-Kee; Kim, Sang-Wook; Whang, Kyu-Young

doi:10.1023/B:DAMI.0000026902.89522.a3

Published: July 2004

Volume 9, pages 5–28, (2004)

Data Mining and Knowledge Discovery

Woong-Kee Loh¹,
Sang-Wook Kim² &
Kyu-Young Whang¹

Abstract

This paper proposes a subsequence matching algorithm that supports the normalization transform in time-series databases. The normalization transform makes it possible to find sequences with similar fluctuation patterns even when they are not close to each other before the transform is applied. Existing subsequence matching algorithms cannot simply be applied to support the normalization transform, because they do not hold the information needed to normalize subsequences of arbitrary lengths. Applying the existing whole matching algorithm that supports the normalization transform to subsequence matching is feasible, but it requires an index for every possible query sequence length, which causes serious overhead in both storage space and update time. The proposed algorithm instead builds indexes for only a small number of query sequence lengths and, for each subsequence matching query, selects the most appropriate index among them. This approach is called index interpolation. It is formally proved that the proposed algorithm causes no false dismissal. Using more indexes yields better search performance, so search performance can be traded off against storage space by adjusting the number of indexes. For performance evaluation, a series of experiments is conducted using indexes for only five different lengths out of the query sequence lengths 256 to 512. The results show that the proposed algorithm outperforms the sequential scan by up to 2.4 times on average when the query selectivity is 10⁻² and by up to 14.6 times when it is 10⁻⁵. Because the proposed algorithm performs better at smaller selectivities, it is suitable for practical situations, where queries with smaller selectivities are much more frequent.
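
To make the role of the normalization transform concrete, the following is a minimal sketch of the sequential-scan baseline that the abstract compares against, not of the proposed index-interpolation algorithm. It assumes the common definition of normalization (subtract the mean, divide by the standard deviation) and Euclidean distance, neither of which the abstract spells out. The function names `normalize`, `euclidean`, and `scan_normalized`, the tolerance `eps`, and the sample data are all hypothetical and chosen only for illustration.

```python
import math


def normalize(seq):
    """Assumed normalization: subtract the mean, divide by the population std."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        # A constant sequence has no fluctuation pattern; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_normalized(data, query, eps):
    """Sequential scan: offsets whose normalized window lies within eps
    of the normalized query. Every window of len(query) is examined."""
    m = len(query)
    nq = normalize(query)
    return [
        i
        for i in range(len(data) - m + 1)
        if euclidean(normalize(data[i:i + m]), nq) <= eps
    ]


# Illustrative data: data[1:5] is the query scaled by 10, so the two are
# far apart as raw values but share the same fluctuation pattern.
query = [1.0, 2.0, 3.0, 2.0]
data = [50.0, 10.0, 20.0, 30.0, 20.0, 70.0, 5.0]

raw_dist = euclidean(data[1:5], query)            # large before normalization
matches = scan_normalized(data, query, eps=1e-9)  # offsets matching after it
print(raw_dist, matches)
```

In this example the window starting at offset 1 is more than 38 units from the query in raw Euclidean distance, yet it is the only window that matches once both are normalized. The scan normalizes every window of the query's length on the fly, which is the per-length information that, according to the abstract, an index built for a single fixed length does not carry.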

Author information

Authors and Affiliations

Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Korea
Woong-Kee Loh & Kyu-Young Whang
College of Information and Communications, Hanyang University, Korea
Sang-Wook Kim

About this article

Cite this article

Loh, WK., Kim, SW. & Whang, KY. A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases. Data Min Knowl Disc 9, 5–28 (2004). https://doi.org/10.1023/B:DAMI.0000026902.89522.a3

Issue Date: July 2004
DOI: https://doi.org/10.1023/B:DAMI.0000026902.89522.a3
