Abstract
The sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We address the problem of assessing the significance of an SWMJ result by focusing on its most important parameter: the relative frequency of windows that satisfy a given equijoin predicate. In particular, under a proposed probabilistic model of the multi-stream, we derive a formula for the expected relative frequency of windows satisfying a given equijoin predicate; the formula can be evaluated in time quadratic in the window size. Experiments on a daily rainfall data set show that the method is highly accurate, which confirms our theoretical analysis.
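To make the central quantity concrete, the following is a minimal sketch of how the observed relative frequency of windows satisfying an equijoin predicate might be counted. It is not the paper's formula for the expected frequency, which is not reproduced here. The function name `window_join_frequency`, the toy streams, and the reading of "a window satisfies the predicate" as "some value occurs in every stream within the window" are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def window_join_frequency(streams: Sequence[Sequence[Hashable]], w: int) -> float:
    """Fraction of length-w windows whose multi-way equijoin is non-empty.

    Assumption: the streams are time-aligned, and a window satisfies the
    equijoin predicate when at least one value occurs in every stream
    inside that window.
    """
    if not streams:
        raise ValueError("need at least one stream")
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("streams must be time-aligned and equally long")
    if not 1 <= w <= n:
        raise ValueError("window size must be between 1 and the stream length")

    hits = 0
    for start in range(n - w + 1):
        # Distinct values seen in this window, one set per stream.
        value_sets = [set(s[start:start + w]) for s in streams]
        # Non-empty intersection means the window yields a join result.
        if set.intersection(*value_sets):
            hits += 1
    return hits / (n - w + 1)


# Hypothetical toy streams (not data from the paper).
rain_a = [0, 1, 0, 0, 2, 0]
rain_b = [3, 3, 1, 3, 3, 2]
freq = window_join_frequency([rain_a, rain_b], w=2)
```

An observed frequency computed this way is the kind of value that would be compared against the expected frequency under the probabilistic model in order to judge whether a cross-stream correlation is significant.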
Additional information
Robert Gwadera received his M.S. in Electrical and Computer Engineering from the Technical University of Gdansk, Poland in 1995. He received his M.S. and Ph.D. in Computer Sciences from Purdue University in 2003 and 2005, respectively. He was then a researcher at Helsinki University of Technology, Finland and a Hasler Foundation researcher at University of Lugano, Switzerland. He is currently a research staff member in data analytics at IBM Zurich Research Laboratory, Switzerland. His research interests span data mining, machine learning, and databases.
About this article
Cite this article
Gwadera, R. Multi-stream join answering for mining significant cross-stream correlations. Front. Comput. Sci. 6, 131–142 (2012). https://doi.org/10.1007/s11704-012-2862-8
DOI: https://doi.org/10.1007/s11704-012-2862-8