Abstract
Finding discords in time series databases has been an important problem over the last decade because of its variety of real-world applications, including data cleansing, fault diagnostics, and financial data analysis. To our knowledge, the best-known approach is the HOT SAX technique, which relies on the equiprobable distribution of the SAX representations of time series. This property, however, is not preserved in the reduced-dimensionality representation, especially for datasets that do not follow a Gaussian distribution. In this paper, we introduce a k-means based algorithm for the symbolic representation of time series, called adaptive Symbolic Aggregate approXimation (aSAX), and propose the HOT aSAX algorithm for time series discord discovery. Owing to the clustered nature of aSAX words, our algorithm achieves greater pruning power than the previous approach. Our experiments on real-world time series datasets confirm the theoretical analysis as well as the efficiency of our approach.
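The idea of replacing fixed Gaussian breakpoints with data-driven ones can be illustrated with a minimal sketch. The abstract states only that aSAX is k-means based; the code below is therefore an assumption about one plausible form, namely a one-dimensional Lloyd iteration that starts from the equiprobable Gaussian breakpoints of classic SAX and moves each breakpoint to the midpoint of adjacent interval centroids. The function names `gaussian_breakpoints`, `adapt_breakpoints`, and `to_word` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right
from statistics import NormalDist, mean


def gaussian_breakpoints(alphabet_size):
    """Equiprobable breakpoints under N(0, 1), as classic SAX uses."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def adapt_breakpoints(values, alphabet_size, max_iter=50, tol=1e-9):
    """Assumed 1-D Lloyd iteration, not necessarily the paper's exact procedure."""
    breakpoints = gaussian_breakpoints(alphabet_size)
    for _ in range(max_iter):
        # Assignment step: bucket each value by its current interval.
        buckets = [[] for _ in range(alphabet_size)]
        for v in values:
            buckets[bisect_right(breakpoints, v)].append(v)
        # Update step: centroid of each bucket, in ascending order.
        centroids = [mean(b) for b in buckets if b]
        if len(centroids) < alphabet_size:
            break  # an interval is empty; keep the last valid breakpoints
        updated = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        converged = max(abs(u - p) for u, p in zip(updated, breakpoints)) < tol
        breakpoints = updated
        if converged:
            break
    return breakpoints


def to_word(values, breakpoints):
    """Map each value to a letter: 'a' for the lowest interval, then 'b', ..."""
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in values)


# Skewed sample: the Gaussian breakpoint (0.0) splits it unevenly,
# while the adapted breakpoint settles between the two clusters.
sample = [-1.0, -0.9, -0.8, 0.1, 0.2, 3.0, 3.1, 3.2]
adapted = adapt_breakpoints(sample, alphabet_size=2)
word = to_word(sample, adapted)
```

In this sketch the values would normally be the segment means of a normalized subsequence; the sample list stands in for them to keep the example short.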
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pham, N.D., Le, Q.L., Dang, T.K. (2010). HOT aSAX: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12145-6_12
DOI: https://doi.org/10.1007/978-3-642-12145-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12144-9
Online ISBN: 978-3-642-12145-6
eBook Packages: Computer Science, Computer Science (R0)