Abstract
Managing large-scale time series databases has recently attracted significant attention in the database community. Related fundamental problems such as dimensionality reduction, transformation, pattern mining, and similarity search have been studied extensively. Although time series data are dynamic by nature, as in data streams, current solutions to these fundamental problems have mostly targeted static time series databases. In this paper, we first propose a framework for online summary generation over large-scale, dynamic time series data, such as data streams. Then, we propose online transform-based summarization techniques over data streams that can be updated in constant time and space. We present both exact and approximate versions of the proposed techniques and provide error bounds for the approximate case. One of our main contributions is an extensive performance analysis. Our experiments carefully evaluate the quality of the online summaries for point, range, and k-NN queries using real-life dynamic data sets of substantial size.
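To make the idea of a transform-based summary with constant-time updates concrete, the following is a minimal sketch and not the authors' method. The abstract does not name the transform, so the sketch assumes a sliding-window discrete Fourier transform and uses the standard sliding DFT recurrence: when the window advances by one value, each retained coefficient becomes (X_k - x_old + x_new) * e^(2*pi*i*k/n). The names `SlidingDFT`, `update`, and `direct_dft` are hypothetical and introduced only for illustration.

```python
import cmath
from collections import deque


class SlidingDFT:
    """Illustrative summary: the first m DFT coefficients of the last n values.

    Assumption: a sliding-window DFT stands in for the transform-based
    summary; the abstract does not specify which transform is used.
    """

    def __init__(self, n, m):
        self.n = n
        # Window starts as all zeros, so all coefficients start at zero.
        self.window = deque([0.0] * n, maxlen=n)
        # Per-coefficient rotation factors e^(2*pi*i*k/n), computed once.
        self.twiddle = [cmath.exp(2j * cmath.pi * k / n) for k in range(m)]
        self.coeffs = [0j] * m

    def update(self, value):
        """Slide the window by one value; costs O(1) work per coefficient."""
        oldest = self.window[0]
        self.window.append(value)  # maxlen evicts the oldest value
        delta = value - oldest
        self.coeffs = [(c + delta) * w
                       for c, w in zip(self.coeffs, self.twiddle)]
        return self.coeffs


def direct_dft(values, m):
    """Reference computation from scratch, used only to check the update."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, v in enumerate(values))
            for k in range(m)]


if __name__ == "__main__":
    summary = SlidingDFT(n=8, m=3)
    stream = [float((7 * i) % 5) for i in range(30)]
    for x in stream:
        summary.update(x)
    # Largest deviation from recomputing the DFT over the last 8 values.
    reference = direct_dft(stream[-8:], 3)
    print(max(abs(a - b) for a, b in zip(summary.coeffs, reference)))
```

In this sketch the update cost does not depend on the length of the stream, but the window buffer still holds n raw values, and floating-point error can accumulate over long runs. Both points are properties of the sketch, not claims about the techniques proposed in the paper.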
Edited by W. Aref
Ogras, U.Y., Ferhatosmanoglu, H. Online summarization of dynamic time series data. The VLDB Journal 15, 84–98 (2006). https://doi.org/10.1007/s00778-004-0149-x
DOI: https://doi.org/10.1007/s00778-004-0149-x