Skip to main content
Log in

Parallel processing of continuous queries over data streams

  • Published:
Distributed and Parallel Databases Aims and scope Submit manuscript

Abstract

In this paper, we propose parallel processing of continuous queries over data streams to handle the bottleneck of single processor DSMSs. Queries are executed in parallel over the logical machines in a multiprocessing environment. Scheduling parallel execution of operators is performed via finding the shortest path in a weighted graph called Query Mega Graph (QMG), which is a logical view of K machines. By lapse of time, number of tuples waiting in queues of different operators may be very different. When a queue becomes full, re-scheduling is done by updating weight of edges of QMG. In the new computed path, machines with more workload will be used less. The proposed system is formally presented and its correctness is proved. It is also modeled in PetriNets and its performance is evaluated and compared with serial query processing as well as the Min-Latency scheduling algorithm. The presented system is shown to outperform them w.r.t. tuple latency (response time), memory usage, throughput and also tuple loss- critical parameters in any data stream management systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Babcock, B., et al.: Models and issues in data stream systems. In: Proceeding of PODS, Invited paper (2002)

  2. The STREAM Group.: STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, March (2003)

  3. Abadi, D., et al.: Aurora: A new model and architecture for data stream management. VLDB J. 2(12), 120–139 (2003)

    Google Scholar 

  4. Babcock, B.: Models and issues in data stream systems. In: Proceeding of PODS (2002)

  5. Golab, L., Ozsu, M.T.: Issues in data stream management. SIGMOD Records (2003)

  6. Tatbul, N., et al.: Load shedding in a data stream manager. In: Proceedings of VLDB’03, Germany, pp. 309–320 (2003)

  7. Chekuri, C., et al.: Scheduling problems in parallel query optimization. In: Proceeding of PODS (1995)

  8. Babcock, B., et al.: Operator scheduling in data stream systems. VLDB J. 13(4), 333–353 (2004)

    Article  Google Scholar 

  9. Hong, W.: Parallel query processing using shared memory multiprocessors and disk arrays. Ph.D. thesis (1992)

  10. Hasan, W., et al.: Open issues in parallel query optimization. ACM SIGMOD Rec. 25(3), 28–33 (1996)

    Article  Google Scholar 

  11. Graefe, G., et al.: Extensible Query Optimization and Parallel Execution in Volcano. Query Processing for Advanced Database Systems. Morgan Kaufmann, San Mateo (1994)

    Google Scholar 

  12. DeWitt, D.J., Gray, J.: Parallel database systems: The future of high performance database processing. Commun. ACM 36(6), 377–387 (1992)

    Google Scholar 

  13. Babu, S., Widom, J.: Continuous queries over data streams. In: SIGMOD Record (2001)

  14. DeWitt, C., Naughton: Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In: Proceeding of ICDE (2002)

  15. Movaghar, A.: Performability modeling with stochastic activity networks, Ph.D. dissertation, University of Michigan (1985)

  16. Abdollahi Azgomi, M., Movaghar, A.: Coloured stochastic activity networks: Definitions and behavior. In: Proceeding of UKPEW’04, UK, pp. 297–308 (2004)

  17. Khalili, A., et al.: PDETool: A multi-formalism modeling tool for discrete-event systems based on SDES description. In: Lecture Notes in Computer Science, vol. 5606, pp. 343–352. Springer, Berlin (2009)

    Google Scholar 

  18. Carney, D., et al.: Operator scheduling in a data stream manager. In: Proceedings of the VLDB, Germany, pp. 838–849 (2003)

  19. Arasu, et al.: Characterizing memory requirements for queries over continuous data streams. In: Proceeding of the PODS (2002)

  20. Babcock, B., et al.: Chain: Operator scheduling for memory minimization in data stream systems. In: Proceeding of the ACM SIGMOD (2003)

  21. Babu, et al.: Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. Technical Report, November (2002)

  22. Ghalambor, M., Safaeei, A.A., Azgomi, M.A.: DSMS scheduling regarding complex QoS metrics. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 10–13 May 2009

  23. Abadi, D.J., et al.: The design of the Borealis stream processing engine. In: Proceeding of the CIDR (2005)

  24. Xing, Y., et al.: Dynamic load distribution in the Borealis stream processor. In: Proceeding of ICDE (2005)

  25. Babu, S.: Adaptive query processing in data stream management systems. Ph.D. thesis (2005)

  26. Graefe, G.: Volcano—An extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)

    Article  Google Scholar 

  27. Apers, P.M.G., et al.: PRISMA/DB: A parallel, main memory relational DBMS. IEEE Trans. Knowl. Data Eng. 4(6), 4–24 (1992)

    Article  Google Scholar 

  28. Beringer, J., Hullermeier, E.: Online clustering of parallel data streams. Int. J. Data Knowl. Eng. 58(2), 180–204 (2006)

    Article  Google Scholar 

  29. Kramer1, J.: Dynamic plan migration for snapshot-equivalent continuous queries in data stream systems. In: Proceeding of ICSWN (2006)

  30. Zhu, Y., et al.: Dynamic plan migration for continuous queries over data streams. In: Proceeding of SIGMOD (2004)

  31. Safaei, A.A., et al.: Using finite state machines in processing continuous queries. Int. Rev. Comput. Softw. 4(5), 551–556 (2009)

    Google Scholar 

  32. Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003)

  33. Chakravarthy, S., Pajjuri, V.: Scheduling Strategies and Their Evaluation in a Data Stream Management System. LNCS, vol. 4042. Springer, Berlin (2006)

    Google Scholar 

  34. Shah, M.A., et al.: Flux: An adaptive partitioning operator for continuous query systems. In: Proceeding of the ICDE (2003)

  35. Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003)

  36. Nehme, R.V., et al.: Query mesh: Multiroute query processing technology. In: Proceeding of the VLDB (2009)

  37. Nehme, R., et al.: Self-tuning query mesh for adaptive multi-route query processing. In: Proceeding of the ACM EDBT (2009)

  38. Alqadi, Z.A.A., et al.: Performance analysis and evaluation of parallel matrix multiplication algorithms. World Appl. Sci. J. 5(2), 211–214 (2008)

    Google Scholar 

  39. Akl, S.G., et al.: Data-movement-intensive problems: Two folk theorems in parallel computation revisited. Theor. Comput. Sci. 95, 323–337 (1992)

    Article  MATH  MathSciNet  Google Scholar 

  40. Almasi, G.S., Gottlieb, A.: Highly Parallel Computing. Benjamin/Cummings, Redwood City (1989)

    MATH  Google Scholar 

  41. Cosnard, M., Trystram, D.: Parallel Algorithms and Architectures. International Thompson Computer Press, Boston (1995)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ali A. Safaei.

Additional information

Communicated by Ahmed K. Elmagarmid.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Safaei, A.A., Haghjoo, M.S. Parallel processing of continuous queries over data streams. Distrib Parallel Databases 28, 93–118 (2010). https://doi.org/10.1007/s10619-010-7066-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10619-010-7066-3

Keywords

Navigation