Abstract
This paper proposes a method for fast processing of data stream tuples during parallel execution of continuous queries in a multiprocessing environment. A copy of the query plan is assigned to each processing unit in the environment. Input tuples are routed dynamically and continuously through the graph formed by these copies (called the Query Mega Graph): after a tuple has been processed by one processing unit (e.g., a processor), the routing decides which processor it should be forwarded to next. The next processor is chosen so that, among all alternative processors, it imposes the minimum tuple latency on that tuple. Tuple latency is derived from processing, buffering, and communication delays, which vary across practical parallel systems.
Parallel system architectures suitable as the multiprocessing environment for the proposed Dynamic Tuple Routing (DTR) method are considered and analyzed, and the practical challenges and issues of the underlying parallel system are discussed. An implementation of the desired parallel system on multi-core systems is provided and used to evaluate the proposed DTR method. The evaluation results show that DTR outperforms similar methods such as Eddies in terms of tuple latency, throughput, and tuple loss.
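The routing rule described in the abstract (forward each tuple to the alternative processor that imposes the minimum tuple latency) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Processor` class, its fields, and the approximation of buffering delay as queue length times per-tuple processing time are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical per-processor state, used only for this illustration."""
    pid: int
    processing_time: float            # estimated time to process one tuple
    queue_length: int                 # tuples currently waiting in the input buffer
    comm_delay: dict = field(default_factory=dict)  # sender pid -> transfer delay


def estimated_latency(sender_pid: int, candidate: Processor) -> float:
    """Estimate tuple latency as processing + buffering + communication delay."""
    # Assumption: buffering delay is the time needed to drain the current queue.
    buffering = candidate.queue_length * candidate.processing_time
    # Assumption: an unknown link is treated as having zero transfer delay.
    communication = candidate.comm_delay.get(sender_pid, 0.0)
    return candidate.processing_time + buffering + communication


def select_next_processor(sender_pid: int, candidates: list) -> Processor:
    """Return the candidate with the minimum estimated tuple latency."""
    return min(candidates, key=lambda p: estimated_latency(sender_pid, p))


# Three alternative processors as seen from sender 0.
p1 = Processor(pid=1, processing_time=2.0, queue_length=3, comm_delay={0: 1.0})
p2 = Processor(pid=2, processing_time=3.0, queue_length=1, comm_delay={0: 0.5})
p3 = Processor(pid=3, processing_time=1.0, queue_length=8, comm_delay={0: 0.0})
chosen = select_next_processor(0, [p1, p2, p3])
```

Here `p2` is selected even though it has the slowest per-tuple processing time, because its short queue keeps the combined delay lowest. In a real system the three delay terms would be measured or estimated at run time rather than fixed.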
Notes
Notations are based on predicate logic in the Z notation [40].
Symmetric Multi-Processors.
Asymmetric Multi-Processors.
Instruction Set Architecture.
Application Programming Interface.
References
Safaei, A.A., Haghjoo, M.S.: Parallel processing of data stream query operators. Distrib. Parallel Databases 28(2), 93–118 (2010). doi:10.1007/s10619-010-7066-3
Safaei, A.A., Haghjoo, M.S.: Dispatching stream operators in parallel execution of continuous queries. J. Supercomput. (2011). doi:10.1007/s11227-011-0621-5
Babcock, B., et al.: Operator scheduling in data stream systems. VLDB J. 13, 333–353 (2004)
Replicate and migrate objects in the runtime, not cache lines or pages in hardware (Invited Plenary Lecture). In: Barcelona Multicore Workshop 2010, Barcelona, Spain, 21–22 Oct. (2010)
El-Rewini, H., Abd-El-Barr, M.: Advanced Computer Architecture and Parallel Processing. Wiley, Hoboken (2005). doi:10.1002/0471478385.index
Feng, T.Y.: A survey of interconnection networks. Computer 14, 12–27 (1981)
Singah, B.: On multistage interconnection network. M.Sc. thesis (2000)
Aljundi, C., Chadi, A., Jundi, A., Dekeyser, J.-l., Scherson, I.D.: An interconnection networks comparative performance evaluation methodology: the case of delta and over-sized delta multistage interconnection networks. In: Proc. of the 16th International Conference on Parallel and Distributed Computing Systems (2003)
Lawrie, D.H.: Access and alignment of data in an array processor. IEEE Trans. Comput. C-24, 1145–1155 (1975)
Thomas, R.H.: Behavior of butterfly parallel processor in the presence of memory hot spots. In: Proc. of the 1986 Int. Conf. Parallel Processing, pp. 46–50 (1986)
Lin, W., et al.: A conflict routing scheme on multistage interconnection networks. IEEE Trans. Comput. 38(8), 1086–1097 (1989)
Tian, H., Katangur, A.K., Yipan, J.Z.: A novel multistage network architecture with multicast and broadcast capability. J. Supercomput. 35, 277–300 (2006)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proc. of the 6th OSDI Symp. (2004)
Upadhyaya, P., Kwon, Y., Balazinska, M.: A latency and fault-tolerance optimizer for online parallel query plans. In: Proceedings of the ACM SIGMOD (2011)
Grama, A., Karypis, G., Kumar, V., Gupta, A.: Introduction to Parallel Computing, 2nd edn. Addison-Wesley, Reading (2003)
Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. In: Proceedings of the ACM SIGMOD (2000)
The Internet traffic archive, http://ita.ee.lbl.gov/html/contrib/DEC-PKT.html
Chakravarthy, S., Pajjuri, V.: Scheduling strategies and their evaluation in a data stream management system. In: Lecture Notes in Computer Science, vol. 4042. Springer, Berlin (2006)
LeBlanc, T.J.: Shared memory versus message passing in a tightly coupled multiprocessor: a case study. In: Proc. 1986 Int. Conf. Parallel Processing, pp. 463–466 (1986)
Babcock, B., et al.: Chain: operator scheduling for memory minimization in data stream systems. In: Proceedings of the ACM SIGMOD International Conference (2003)
Sharaf, M.A.: Preemptive rate-based operator scheduling in a data stream management system. In: IEEE/AICCSA (2005)
Soliman, M.S., Tan, G.: Operator-scheduling using dynamic chain for continuous-query processing. In: IEEE Int. Conference on Computer Science and Software Engineering (2008)
Sharaf, M.A., et al.: Scheduling continuous queries in data stream management systems. In: PVLDB (2008)
Carney, D., et al.: Operator scheduling in a data stream manager. In: Proceedings of the 29th International Conference on Very Large Data Bases, Germany, pp. 838–849 (2003)
Ghalambor, M., Safaei, A.A., Azgomi, M.A.: DSMS scheduling regarding complex QoS metrics. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 10–13 May (2009)
Babu, S., Srivastava, U., Widom, J.: Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. Technical report, November 2002
Graefe, G., et al.: Extensible query optimization and parallel execution in Volcano. In: Query Processing for Advanced Database Systems. Morgan Kaufmann, San Mateo (1994)
DeWitt, D.J., Gray, J.: Parallel database systems: the future of high performance database processing. Commun. ACM 35(6), 85–98 (1992)
Graefe, G.: Volcano—an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)
Apers, P.M.G., et al.: PRISMA/DB: a parallel, main memory relational DBMS. IEEE Trans. Knowl. Data Eng. 4(6), 541–554 (1992)
Graefe, G.: Query evaluation techniques for large databases. ACM Comput. Surv. 25, 73–170 (1993)
Abadi, D., et al.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003)
Deshpande, A.: An initial study of overheads of eddies. SIGMOD Rec. 33, 44–49 (2004)
Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceedings of the 29th VLDB (2003)
Osman, A., Ammar, H.: Dynamic load management for distributed continuous query systems. In: Proceedings of the ICDE (2005)
Zhou, Y., et al.: Efficient dynamic operator placement in a locally distributed continuous query system. In: Lecture Notes in Computer Science, vol. 4275 (2006)
Johnson, T., et al.: Query-aware partitioning for monitoring massive network data streams. In: Proceedings of the ACM SIGMOD (2008)
Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceedings of 29th VLDB Conference, September 2003, pp. 333–344 (2003) (ISBN 0-12-722442-4)
Gu, X., et al.: Online failure forecast for fault-tolerant data stream processing. In: Proceedings of the ICDE (2008)
Woodcock, J., Davies, J.: Using Z: Specification, Refinement, and Proof. Prentice-Hall International Series in Computer Science. Prentice-Hall, New York (1996). ISBN: 0-13-948472-8
Babu, S.: Adaptive query processing in data stream management systems. Ph.D. thesis, Stanford University (2005)
Babu, S., Motwani, R., Munagala, K., Nishizawa, I., Widom, J.: Adaptive ordering of pipelined stream filters. In: Proc. SIGMOD Conference, pp. 407–418 (2004)
Das, A., Gehrke, J., Riedewald, M.: Approximate join processing over data streams. In: Proc. SIGMOD Conference, pp. 40–51 (2003)
Additional information
Communicated by: Mohamed F. Mokbel.
About this article
Cite this article
Safaei, A.A., Sharifrazavian, A., Sharifi, M. et al. Dynamic routing of data stream tuples among parallel query plan running on multi-core processors. Distrib Parallel Databases 30, 145–176 (2012). https://doi.org/10.1007/s10619-012-7090-6
DOI: https://doi.org/10.1007/s10619-012-7090-6