Skip to main content
Log in

Adaptive, unsupervised stream mining

  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract.

Sensor devices and embedded processors are becoming widespread, especially in measurement/monitoring applications. Their limited resources (CPU, memory and/or communication bandwidth, and power) pose some interesting challenges. We need concise, expressive models to represent the important features of the data and that lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods that (a) require little memory and a single pass over the data, (b) can adapt and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose \ensuremath{\mathrm{AWSOM}} (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no prior inspection of the data or any user intervention and expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated into sensors - possibly alongside a distributed query processing engine [10,6,27]. Updates are performed in constant time with respect to stream size using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, \ensuremath{\mathrm{AWSOM}} is the first framework that combines all of the above characteristics. Experiments on real and synthetic datasets demonstrate that \ensuremath{\mathrm{AWSOM}} discovers meaningful patterns over long time periods. Thus, the patterns can also be used to make long-range forecasts, which are notoriously difficult to perform. In fact, \ensuremath{\mathrm{AWSOM}} outperforms manually set up autoregressive models, both in terms of long-term pattern detection and modeling and by at least 10 x in resource consumption.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Akay M (1997) Time frequency and wavelets in biomedical signal processing. Wiley, New York

  2. Arasu A, Babcock B, Babu S, McAlister J, Widom J (2002) Characterizing memory requirements for queries over continuous data streams. In: Proc PODS

  3. Babcock B, Olston C (2003) Distributed top-k monitoring. In: Proc SIGMOD

  4. Beran J (1994) Statistics for long-memory processes. Chapman & Hall, London

  5. Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econometr 31:307-327

    Article  MathSciNet  MATH  Google Scholar 

  6. Bonnet P, Gehrke JE, Seshadri P (2001) Towards sensor database systems. In: Proc MDM

  7. Brockwell PJ, Davis RA (1991) Time series: theory and methods. Springer series in statistics, 2nd edn. Springer, Berlin Heidelberg New York

    Google Scholar 

  8. Bulut A, Singh AK (2003) SWAT: Hierarchical stream summarization in large networks. In: Proc 19th ICDE

  9. Carley LR, Ganger GR, Nagle D (2000) Mems-based integrated-circuit mass-storage systems. Commun ACM 43(11):72-80

    Article  Google Scholar 

  10. Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Seidman G, Stonebraker M, Tatbul N, Zdonik SB (2002) Monitoring streams - a new class of data management applications. In: Proc VLDB

  11. Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multi-dimensional regression analysis of time-series data streams. In: Proc VLDB

  12. Considine J, Li F, Kollios G, Byers JW (2004) Approximate aggregation techniques for sensor databases. In: Proc ICDE

  13. Das G, Lin L-I, Mannila H, Renganathan G, Smyth P (1998) Rule discovery from time series. In: Proc KDD

  14. Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. In: Proc SODA

  15. DeGroot MH, Schervish MJ (2002) Probability and statistics, 3rd edn. Addison-Wesley, Reading, MA

  16. Dobra A, Garofalakis MN, Gehrke J, Rastogi R (2002) Processing complex aggregate queries over data streams. In: Proc SIGMOD

  17. Ergün F, Muthukrishnan S, Sahinalp SC (2004) Sublinear methods for detecting periodic trends in datastreams. In: Proc LATIN

  18. C. Faloutsos (1996) Searching multimedia databases by content. Kluwer, Dordrecht

  19. Garofalakis MN, Gibbons PB (2002) Wavelet synopses with error guarantees. In: Proc SIGMOD

  20. Gehrke J, Korn F, Srivastava D (2001) On computing correlated aggregates over continual data streams. In: Proc SIGMOD

  21. Gencay R, Selcuk F, Whitcher B (2001) An introduction to wavelets and other filtering methods in finance and economics. Academic, New York

  22. Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M (2001) Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proc VLDB

  23. Guha S, Koudas N (2002) Approximating a data stream for querying and estimation: algorithms and performance evaluation. In: Proc ICDE

  24. Hill J, Szewczyk R, Woo A, Hollar S, Culler D, Pister K (2000) System architecture directions for networked sensors. In: Proc ASPLOS-IX

  25. Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proc VLDB

  26. Leland W, Taqqu M, Willinger W, Wilson D (1994) On the self-similar nature of ethernet traffic. IEEE Trans Network 2(1):1-15

    Article  MATH  Google Scholar 

  27. Madden SR, Shah MA, Hellerstein JM, Raman V (2002) Continuously adaptive continuous queries over streams. In: Proc SIGMOD

  28. Olston C, Jiang J, Widom J (2003) Adaptive filters for continuous queries over distributed data streams. In: Proc SIGMOD

  29. Palpanas T, Vlachos M, Keogh EJ, Gunopulos D, Truppel W (2004) Online amnesic approximation of streaming time series. In: Proc ICDE

  30. Percival DB, Walden AT (2000) Wavelet methods for time series analysis. Cambridge University Press, Cambridge, UK

  31. Riedel E, Faloutsos C, Ganger GR, Nagle D (2000) Data mining on an OLTP system (nearly) for free. In Proc SIGMOD

  32. Tao Y, Faloutsos C, Papadias D, Liu B (2004) Prediction and indexing of moving objects with unknown motion patterns. In: Proc SIGMOD

  33. Weigend AS, Gerschenfeld NA (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading, MA

    Google Scholar 

  34. Yi B-K, Sidiropoulos N, Johnson T, Jagadish H, Faloutsos C, Biliris A (2000) Online data mining for co-evolving time sequences. Proc ICDE

  35. Young P (1984) Recursive estimation and time-series analysis: an introduction. Springer, Berlin Heidelberg New York

  36. Zhang D, Gunopulos D, Tsotras VJ, Seeger B (2002) Temporal aggregation over data streams using multiple granularities. In: Proc EDBT

  37. Zhu Y, Shasha D (2002) Statstream: statistical monitoring of thousands of data streams in real time. In: Proc VLDB

  38. Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proc KDD

  39. Zuidwijk R, de Zeeuw P (1998) Fast algorithm for directional time-scale analysis using wavelets. In: Proc SPIE, Wavelet Applications in Signal and Image Processing VI, vol 3458

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Spiros Papadimitriou.

Additional information

Received: 2 January 2004, Accepted: 23 March 2004, Published online: 12 August 2004

Edited by: S. Abitebou

Anthony Brockwell: This material is based upon work supported by the National Science Foundation under Grants Nos. DMS-9819950 and IIS-0083148.

Christos Faloutsos: This material is based upon work supported by the National Science Foundation under Grants Nos. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SE NSOR-0329549, EF-0331657, and IIS-0326322, by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001, and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Papadimitriou, S., Brockwell, A. & Faloutsos, C. Adaptive, unsupervised stream mining. VLDB 13, 222–239 (2004). https://doi.org/10.1007/s00778-004-0130-8

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-004-0130-8

Keywords

Navigation