Abstract
Sensor devices and embedded processors are becoming widespread, especially in measurement and monitoring applications. Their limited resources (CPU, memory and/or communication bandwidth and power) pose interesting challenges. We need concise, expressive models that represent the important features of the data and lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods that (a) require little memory and a single pass over the data, (b) can adapt to and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose AWSOM (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to discover interesting patterns and trends efficiently and effectively. This can be done automatically, i.e., with no prior inspection of the data and no user intervention or expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated in sensors, possibly alongside a distributed query processing engine. Updates are performed in constant time with respect to stream size, using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, AWSOM is the first framework that combines all of the above characteristics.
This material is based upon work supported by the National Science Foundation under Grants No. DMS-9819950 and IIS-0083148.
This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, and IIS-0326322; by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001; and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel, and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Papadimitriou, S., Brockwell, A., Faloutsos, C. (2016). Adaptive, Automatic Stream Mining. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_24
DOI: https://doi.org/10.1007/978-3-540-28608-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science (R0)