Adaptive, Automatic Stream Mining

Data Stream Management

Abstract

Sensor devices and embedded processors are becoming widespread, especially in measurement and monitoring applications. Their limited resources (CPU, memory, communication bandwidth, and power) pose some interesting challenges. We need concise, expressive models that represent the important features of the data and lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods which (a) require little memory and a single pass over the data, (b) can adapt and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose AWSOM (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no prior inspection of the data and no user intervention or expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated in sensors, possibly alongside a distributed query processing engine. Updates are performed in constant time with respect to stream size, using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, AWSOM is the first framework that combines all of the above characteristics.
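To make the resource claim concrete, the following is a minimal sketch of how a single-pass summary can use logarithmic space with cheap updates. It is not the AWSOM algorithm from the chapter; it assumes an incremental Haar wavelet transform as a stand-in, and the class name `IncrementalHaar` and its `update` method are hypothetical names chosen for illustration. The sketch keeps at most one unpaired scaling coefficient per level, so its state grows as O(log N) after N values, and each update does amortized constant work (a single update can touch up to O(log N) levels in the worst case).

```python
import math


class IncrementalHaar:
    """Illustrative streaming Haar transform (an assumption, not AWSOM itself).

    State: at most one unpaired scaling coefficient per level.
    """

    def __init__(self):
        # pending[l] holds an unpaired scaling coefficient at level l, or None.
        self.pending = []
        self.count = 0

    def update(self, x):
        """Consume one value; return the (level, detail) pairs it completes."""
        self.count += 1
        emitted = []
        level = 0
        value = float(x)
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                # No partner yet at this level: store the value and stop.
                self.pending[level] = value
                break
            # A partner exists: combine the pair and carry the smooth part up.
            left = self.pending[level]
            self.pending[level] = None
            detail = (left - value) / math.sqrt(2)
            value = (left + value) / math.sqrt(2)
            level += 1
            emitted.append((level, detail))
        return emitted


if __name__ == "__main__":
    haar = IncrementalHaar()
    for v in [1, 2, 3, 4, 5, 6, 7, 8]:
        print(v, haar.update(v))
    print("levels kept:", len(haar.pending))
```

In a design like this, the emitted detail coefficients could feed a per-level model as they arrive, so the raw stream never needs to be stored.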

This material is based upon work supported by the National Science Foundation under Grants No. DMS-9819950 and IIS-0083148.

This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, and IIS-0326322; by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001; and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel, and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties.

Corresponding author

Correspondence to Spiros Papadimitriou.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Papadimitriou, S., Brockwell, A., Faloutsos, C. (2016). Adaptive, Automatic Stream Mining. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_24

  • DOI: https://doi.org/10.1007/978-3-540-28608-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28607-3

  • Online ISBN: 978-3-540-28608-0

  • eBook Packages: Computer Science, Computer Science (R0)
