Skip to main content
Log in

Multi-stream join answering for mining significant cross-stream correlations

  • Research Article
  • Published:
Frontiers of Computer Science Aims and scope Submit manuscript

Abstract

Sliding-window multi-stream join (SWMJ) is a fundamental operation for correlating information from different streams. We provide a solution to the problem of assessing significance of the SWMJ result by focusing on the relative frequency of windows satisfying a given equijoin predicate as the most important parameter of the SWMJ result. In particular, we derive a formula for computing the expected relative frequency of windows satisfying a given equijoin predicate that can be evaluated in quadratic time in the window size given a proposed probabilistic model of the multi-stream. In experiments conducted on a daily rainfall data set we demonstrate the remarkable accuracy of our method, which confirms our theoretical analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Xie J, Yang J. A survey of join processing in data streams. In: Aggarwal C C, eds. Data Streams-Models and Algorithms. Advances in Database Systems, Vol 31. Berlin: Springer, 2007, 209–306

    Google Scholar 

  2. Golab L, Özsu M T. Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. 2003, 500–511

  3. Gu X, Wen Z, Lin C, Yu P. ViCo: an adaptive distributed video correlation system. In: Proceedings of the 14th ACM International Conference on Multimedia. 2006, 559–568

  4. Hammad M A, Aref W G, Elmagarmid A K. Query processing of multi-way stream window joins. VLDB Journal, 2008, 17(3): 469–488

    Article  Google Scholar 

  5. Gwadera R. MDL-based segmentation of multi-attribute sequences. In: Proceedings of 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. 2011, 106–111

  6. Iwanuma K, Ishihara R, Takano Y, Nabeshima H. Extracting frequent subsequences from a single long data sequence: a novel anti-monotonic measure and a simple on-line algorithm. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 186–193

  7. Wilschut A N, Apers P M G. Dataflow query execution in a parallel main-memory. In: Proceedings of the 1st International Conference on Parallel and Distributed Information Systems. 1991, 68–77

  8. Oates T, Cohen P R. Searching for structure in multiple streams of data. In: Proceedings of the 13th International Conference on Machine Learning. 1996, 346–354

  9. Srivastava U, Widom J. Memory-limited execution of windowed stream joins. In: Proceedings of the 30th International Conference on Very Large Data Bases. 2004, 324–335

  10. Gwadera R, Gionis A, Mannila H. Optimal segmentation using tree models. Knowledge and Information Systems, 2008, 15(3): 259–283

    Article  Google Scholar 

  11. Gwadera R, Atallah M J, Szpankowski W. Markov models for identifi- cation of significant episodes. In: Proceedings of 2005 SIAM International Data Mining Conference. 2005, 404–414

  12. Gwadera R, Atallah M J, Szpankowski W. Reliable detection of episodes in event sequences. Knowledge and Information Systems, 2005, 7(4): 415–437

    Article  Google Scholar 

  13. Gwadera R, Crestani F. Discovering significant patterns in multistream sequences. In: Proceedings of the 8th IEEE International Conference on Data Mining. 2008, 827–832

  14. Atallah M J, Gwadera R, Szpankowski W. Detection of significant sets of episodes in event sequences. In: Proceedings of the 4th IEEE International Conference on Data Mining. 2004, 3–10

  15. Hughes J P, Guttorp P, Charles S P. A non-homogeneous hidden Markov model for precipitation occurrence. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1999, 48(1): 15–30

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Robert Gwadera.

Additional information

Robert Gwadera received his M.S. in Electrical and Computer Engineering from the Technical University of Gdansk, Poland in 1995. He received his M.S. and Ph.D in Computer Sciences from Purdue University in 2003 and 2005, correspondingly. He was then a researcher at Helsinki University of Technology, Finland and a Hasler Foundation researcher at University of Lugano, Switzerland. He is currently a research staff member in data analytics at IBM Zurich Research Laboratory, Switzerland. His research interests span data mining, machine learning, and databases.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gwadera, R. Multi-stream join answering for mining significant cross-stream correlations. Front. Comput. Sci. 6, 131–142 (2012). https://doi.org/10.1007/s11704-012-2862-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11704-012-2862-8

Keywords

Navigation