Abstract
Computing statistical measures is a fundamental problem in mining data streams, and users sometimes need to query the real-time correlation between streams. In this paper, we introduce a system for computing real-time statistical measures over data streams. The system continuously updates real-time summaries, which are then used to compute affine relationships between streams. Every element of every data stream is processed only once, yet the system achieves accuracy comparable to that of static methods. To the best of our knowledge, this is a new method for computing affine relationships. Our system employs a Multi-Hierarchies approach under the sliding window model. First, we modify the AFCLST clustering algorithm. Second, after the Cumulative Calculation algorithms have run, the Bottom-Up Updating algorithm updates the summaries stored at every hierarchy. Third, the Query Response algorithm uses these summaries to compute the requested statistical measure. Finally, we evaluate the accuracy of our approach through several experiments on real datasets.
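The abstract does not spell out the Cumulative Calculation, Bottom-Up Updating, or Query Response algorithms, so the following is only a minimal sketch of the general idea of mergeable stream summaries, written under our own assumptions. Each basic window of two aligned streams is reduced in a single pass to cumulative sums; parents in a power-of-two hierarchy are built bottom-up by adding their children; and a sliding-window query combines a few nodes to compute the Pearson correlation and a least-squares affine fit y ≈ a·x + b. The names `Summary` and `Hierarchy`, the dyadic layout, the basic-window size, and the choice of sufficient statistics are hypothetical illustrations. This is not the authors' Multi-Hierarchies method, and it omits AFCLST clustering entirely.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical sufficient statistics of two aligned streams x, y over a span."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    @staticmethod
    def of(xs, ys):
        # Cumulative sums over one basic window; each element is visited once.
        return Summary(len(xs), sum(xs), sum(ys),
                       sum(x * x for x in xs), sum(y * y for y in ys),
                       sum(x * y for x, y in zip(xs, ys)))

    def __add__(self, other):
        # Summaries of adjacent spans merge exactly by component-wise addition.
        return Summary(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                       self.sxx + other.sxx, self.syy + other.syy,
                       self.sxy + other.sxy)

    def _centered(self):
        # Centered sums: n*Var(x), n*Var(y), n*Cov(x, y) (population form).
        cxx = self.sxx - self.sx * self.sx / self.n
        cyy = self.syy - self.sy * self.sy / self.n
        cxy = self.sxy - self.sx * self.sy / self.n
        return cxx, cyy, cxy

    def correlation(self):
        # Pearson correlation of x and y over the summarized span.
        cxx, cyy, cxy = self._centered()
        return cxy / math.sqrt(cxx * cyy)

    def affine_fit(self):
        # Least-squares slope a and intercept b for y ~ a*x + b.
        cxx, _, cxy = self._centered()
        a = cxy / cxx
        b = (self.sy - a * self.sx) / self.n
        return a, b


class Hierarchy:
    """Hypothetical layout: level k holds summaries of aligned blocks of 2**k basic windows."""

    def __init__(self, num_levels):
        self.levels = [[] for _ in range(num_levels)]

    def push(self, summary):
        # Bottom-up update: store the new basic window, then build each parent it completes.
        self.levels[0].append(summary)
        count = len(self.levels[0])
        for k in range(1, len(self.levels)):
            if count % (1 << k):
                break
            child = self.levels[k - 1]
            self.levels[k].append(child[-2] + child[-1])

    def query(self, lo, hi):
        # Combine basic windows lo..hi-1 (hi must not exceed the number pushed)
        # by greedily taking the largest aligned block that fits.
        total = Summary()
        while lo < hi:
            k = len(self.levels) - 1
            while k > 0 and (lo % (1 << k) or lo + (1 << k) > hi):
                k -= 1
            total = total + self.levels[k][lo >> k]
            lo += 1 << k
        return total


if __name__ == "__main__":
    w = 4  # hypothetical basic-window size
    xs = [float((i * 7) % 11) for i in range(32)]
    ys = [2.0 * x + 1.0 for x in xs]  # y is an exact affine function of x
    h = Hierarchy(num_levels=4)
    for j in range(len(xs) // w):
        h.push(Summary.of(xs[j * w:(j + 1) * w], ys[j * w:(j + 1) * w]))
    s = h.query(2, 7)  # sliding window covering basic windows 2..6
    print(s.n, s.correlation(), s.affine_fit())
```

For an exactly affine pair such as y = 2x + 1, the query should report a correlation of 1 and a fit of (2, 1) up to floating-point rounding. Because every statistic here is a plain sum, merging loses nothing, which is what lets a query over many windows reuse precomputed parents. The raw sum-of-squares formulas can lose precision when values are large relative to their spread, and this sketch keeps every node; a real sliding-window version would discard nodes that fall entirely outside the window.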
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Qi, P., Shi, S. (2014). Multi-Hierarchies: Accurately Computing Realtime Statistical Measures on Data Streams. In: Cai, Z., Wang, C., Cheng, S., Wang, H., Gao, H. (eds) Wireless Algorithms, Systems, and Applications. WASA 2014. Lecture Notes in Computer Science, vol 8491. Springer, Cham. https://doi.org/10.1007/978-3-319-07782-6_65
DOI: https://doi.org/10.1007/978-3-319-07782-6_65
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07781-9
Online ISBN: 978-3-319-07782-6
eBook Packages: Computer Science, Computer Science (R0)