Online summarization of dynamic time series data

  • Regular Paper
The VLDB Journal

Abstract

Managing large-scale time series databases has recently attracted significant attention in the database community. Related fundamental problems such as dimensionality reduction, transformation, pattern mining, and similarity search have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these fundamental problems have mostly targeted static time series databases. In this paper, we first propose a framework for online summary generation for large-scale and dynamic time series data, such as data streams. Then, we propose online transform-based summarization techniques over data streams that can be updated in constant time and space. We present both exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of our main contributions in this paper is an extensive performance analysis. Our experiments carefully evaluate the quality of the online summaries for point, range, and kNN queries using real-life dynamic data sets of substantial size.
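
To make the idea of a transform-based summary that is updated per arriving value more concrete, the following is a minimal sketch of the standard sliding-window DFT recurrence. It is an illustration only and is not taken from the paper: the class name `SlidingDFT`, the helper `direct_dft`, and the parameters `window_size` and `num_coeffs` are assumptions introduced here. Each retained coefficient is updated with a constant number of operations per new value, and memory is independent of the stream length (this sketch does keep the current window in a buffer).

```python
import cmath
from collections import deque


class SlidingDFT:
    """Keeps the first `num_coeffs` DFT coefficients of the last `window_size` values."""

    def __init__(self, window_size, num_coeffs):
        self.window_size = window_size
        # The window starts as all zeros, whose DFT is all zeros.
        self.window = deque([0.0] * window_size, maxlen=window_size)
        self.coeffs = [0j] * num_coeffs
        # Precomputed rotation factors exp(2*pi*i*k/N), one per kept coefficient.
        self.twiddle = [
            cmath.exp(2j * cmath.pi * k / window_size) for k in range(num_coeffs)
        ]

    def update(self, x_new):
        """Slide the window by one value: X_k <- (X_k - x_old + x_new) * exp(2*pi*i*k/N)."""
        x_old = self.window[0]
        self.window.append(x_new)  # maxlen drops the oldest value automatically
        delta = x_new - x_old
        self.coeffs = [(c + delta) * w for c, w in zip(self.coeffs, self.twiddle)]
        return self.coeffs


def direct_dft(values, num_coeffs):
    """Reference O(N) per coefficient DFT, used only to check the recurrence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


if __name__ == "__main__":
    stream = [float((i * 7) % 5) for i in range(20)]
    summary = SlidingDFT(window_size=8, num_coeffs=3)
    for value in stream:
        summary.update(value)
    reference = direct_dft(stream[-8:], 3)
    print(max(abs(a - b) for a, b in zip(summary.coeffs, reference)))
```

Keeping only the first few coefficients is one common way to obtain a compact summary; which coefficients to keep and how the resulting approximation error is bounded are design choices this sketch does not address.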

Author information

Corresponding author

Correspondence to Umit Y. Ogras.

Additional information

Edited by W. Aref

About this article

Cite this article

Ogras, U.Y., Ferhatosmanoglu, H. Online summarization of dynamic time series data. The VLDB Journal 15, 84–98 (2006). https://doi.org/10.1007/s00778-004-0149-x

  • DOI: https://doi.org/10.1007/s00778-004-0149-x
