Abstract
In the era of data explosion, large volumes of varied data are generated rapidly at every moment; if these data are not processed, the value of their latent information is lost. This is the main current challenge for most enterprises and Internet mega-companies (also known as the big data problem). Big data has three dimensions: Volume, Variety, and Velocity. Velocity refers to high speed, both in data arrival rate (e.g., streaming data) and in data processing (i.e., real-time processing). This paper is concerned with the velocity dimension of big data, so real-time processing of streaming big data is addressed in detail. For any real-time system, being fast is a necessary condition (although it is not sufficient, and other concerns such as real-time scheduling must be addressed, too). Fast processing is achieved by parallelism via the proposed deadline-aware dispatching method. For the other prerequisite of real-time processing (i.e., real-time scheduling of the tasks), a hybrid clustering multiprocessor real-time scheduling algorithm is proposed in which both the partitioning and global real-time scheduling approaches are employed to achieve better schedulability and resource utilization with a tolerable overhead. The other components required for real-time processing of streaming big data are also designed and proposed as the real-time streaming big data (RT-SBD) processing engine. Its prototype is implemented, experimentally evaluated, and compared with Storm, a well-known real-time streaming big data processing engine. Experimental results show that the proposed RT-SBD significantly outperforms the Storm engine in terms of proportional deadline miss ratio, tuple latency and system throughput.
Notes
Earliest deadline first.
Rate monotonic.
Real-time streaming big data.
For each edge (a,b) from an operator to its immediate successor in the graph, the weight is the total processing time of all operators running on the destination operator’s machine (i.e., the sum, over those operators, of the number of tuples waiting in each operator’s input queue multiplied by that operator’s execution time).
An algorithm is said to be work-conserving if it does not idle any processor when one or more jobs are pending, and non-work-conserving otherwise.
Data stream management system.
Proportional-integral-derivative.
Proportional-integral.
Single real-time disp.
Quick Real-time Stream processor.
Complex-event processing.
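The edge-weight note above can be illustrated with a minimal sketch. The `Operator` record, its field names, and the sample values are assumptions introduced here for illustration only; they are not taken from the RT-SBD implementation.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical operator record (names are illustrative assumptions)."""
    name: str
    machine: str        # machine the operator runs on
    exec_time: float    # execution time per tuple
    queued: int         # tuples waiting in the operator's input queue


def edge_weight(dest: Operator, operators: list[Operator]) -> float:
    """Weight of an edge (a, b) whose destination operator b is `dest`.

    Sums queued * exec_time over every operator placed on the same
    machine as the destination operator, as described in the note.
    """
    return sum(
        op.queued * op.exec_time
        for op in operators
        if op.machine == dest.machine
    )


# Sample data (assumed values): b and c share machine m2.
ops = [
    Operator("a", "m1", exec_time=2.0, queued=3),
    Operator("b", "m2", exec_time=1.5, queued=4),
    Operator("c", "m2", exec_time=0.5, queued=10),
]
weight_ab = edge_weight(ops[1], ops)  # 4 * 1.5 + 10 * 0.5
```

Under this reading, the weight depends only on the load of the destination machine, so every edge that ends on the same machine gets the same weight at a given moment.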
References
Abadi D et al (2003) Aurora: a new model and architecture for data stream management. VLDB J 12(2):120–139
Alemi M, Safaei AA, Haghjoo MS, Abdi F (2011) PDMRTS: multiprocessor real-time scheduling considering process distribution in data stream management system. In: International conference on digital information and communication technology and its applications, pp 166–179
Anderson J, Srinivasan A (2000) Early release fair scheduling. In: Proceedings of the euromicro conference on real-time systems. IEEE Computer Society Press, Stockholm, pp 35–43
Anderson J, Srinivasan A (2004) Mixed Pfair/ERfair scheduling of asynchronous periodic tasks. J Comput Syst Sci 68(1):157–204
Andersson B, Jonsson J (2003) The utilization bounds of partitioned and pfair static-priority scheduling on multiprocessors are 50 percent. In: 15th euromicro conference on real-time systems (ECRTS’03), Porto, Portugal, 02–04 July
Andersson B, Tovar E (2006) Multiprocessor scheduling with few preemptions. In: Proceedings of the international conference on embedded and real-time computing systems and applications (RTCSA)
Arasu A et al (2004) Linear road: a stream data management benchmark. In: Proceedings of the thirtieth international conference on very large data bases, vol 30. VLDB Endowment
Åsberg M et al (2012) Exsched: an external cpu scheduler framework for real-time systems. In: 2012 IEEE 18th international conference on embedded and real-time computing systems and applications (RTCSA). IEEE
Astrom KJ, Hagglund TH (1995) New tuning methods for PID controllers. In: Proceedings of the 3rd European control conference
Babcock B et al (2003) Models and issues in data stream systems. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems
Babcock B et al (2004) Load shedding for aggregation queries over data streams. In: International conference on data engineering (ICDE)
Babcock B et al (2004) Operator scheduling in data stream systems. VLDB J 13(4):333–353
Bans JM, Arenas A, Labarta J (2002) Efficient scheme to allocate soft-aperiodic tasks in multiprocessor hard real-time systems. In: PDPTA, pp 809–815
Baruah S et al (1996) Proportionate progress: a notion of fairness in resource allocation. Algorithmica 15:600–625
Baruah S, Gehrke J, Plaxton C (1995) Fast scheduling of periodic tasks on multiple resources. In: Proceedings of the 9th international parallel processing symposium, April 1995, pp 280–288
Beloglazov A, Abawajy J, Buyya R (2012) Energy-aware allocation heuristics for efficient management of data centers for cloud computing. Future Gener Comput Syst 28:755–768
Bestavros A, Nagy S (1996) An admission control paradigm for real-time databases. Technical Report BUCS-TR-96-902, Computer Science Department, Boston University, Boston
Bestavros A, Nagy S (1996) Value-cognizant admission control for RTDB systems. In: IEEE 16th real-time systems symposium, December 1996
Block A et al (2008) An adaptive framework for multiprocessor real-time system. In: Euromicro conference on real-time systems (ECRTS’08). IEEE
Bollella G, James G (2000) The real-time specification for Java. Computer 33(6):47–54
Bu Y et al (2010) HaLoop: efficient iterative data processing on large clusters. Proc VLDB Endow 3(1–2):285–296
Carpenter J et al (2004) A categorization of real-time multiprocessor scheduling problems and algorithms. In: Handbook on scheduling: algorithms, models and performance analysis
Chen Philip CL, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347
Condie T et al (2010) MapReduce online. In: NSDI, vol 10, no 4
Devi UC (2006) Soft real-time scheduling on multiprocessors. Ph.D. Thesis, University of North Carolina at Chapel Hill
Dhall S, Liu C (1978) On a real-time scheduling problem. Oper Res 26:127–140
Elliott GA, Ward BC, Anderson JH (2013) GPUSync: a framework for real-time GPU management. In: 2013 IEEE 34th real-time systems symposium (RTSS). IEEE
Golab L (2004) Querying sliding windows over on-line data streams. In: Proceedings of ICDE/EDBT Ph.D. workshop, March, pp 1–10
Golatowski F et al (2002) Framework for validation, test and analysis of real-time scheduling algorithms and scheduler implementations. In: Proceedings of the 13th IEEE international workshop on rapid system prototyping, 2002. IEEE
Graefe G et al (1994) Extensible query optimization and parallel execution in Volcano. Query processing for advanced database systems, Morgan Kaufmann
Holman P, Anderson J (2006) Group-based pfair scheduling. Real Time Syst 32(1–2):125–168
http://www.espertech.com/esper/. Accessed 20 May 2016
https://storm.apache.org/. Accessed 20 March 2015
Jamin S et al (1993) An admission control algorithm for predictive real-time service. LNCS 712(1993):347–356
Johnson T et al (2008) Query-aware partitioning for monitoring massive network data streams. In: SIGMOD
Kato S, Yamasaki N (2007) Real-time scheduling with task splitting on multiprocessors. In: Proceedings of the international conference on embedded and real-time computing systems and applications, pp 441–450
Kato S, Yamasaki N (2008) Portioned EDF-based scheduling on multiprocessors. In: Proceedings of the international conference on embedded software, pp 139–148
Kato S, Yamasaki N (2008) Scheduling aperiodic tasks using total bandwidth server on multiprocessors. EUC, vol 1, pp 82–89
Kleiminger W, Kalyvianaki E, Pietzuch P (2011) Balancing load in stream processing with the cloud. In: 2011 IEEE 27th international conference on data engineering workshops (ICDEW). IEEE
Kontaki M (2010) Continuous processing of preference queries in data streams. In: 36th international conference on current trends in theory and practice of computer science (SOFSEM)
Kramer J (2009) Continuous queries over data streams: semantics and implementation
Kulkarni S et al (2015) Twitter heron: stream processing at scale. In: Proceeding of the ACM SIGMOD’15, pp 239–250
Kwon J, Cho H, Ravindran B (2012) A framework accommodating categorized multiprocessor real-time scheduling in the RTSJ. In: Proceedings of the 10th international workshop on java technologies for real-time and embedded systems. ACM
Lakshmanan K et al (2009) Partitioned fixed-priority preemptive scheduling for multi-core processors. In: Proceedings of the euromicro conference on real-time systems, pp 239–248
Lam W et al (2012) Muppet: MapReduce-style processing of fast data. Proc VLDB Endow 5(12):1814–1825
Lehner W, Sattler K-U (2013) Web-scale data management for the cloud. Springer, Berlin
Leontyev H (2010) Compositional analysis techniques for multiprocessor soft real-time scheduling. Ph.D. Thesis, University of North Carolina at Chapel Hill
Li X, Wang HA (2007) Adaptive real-time query scheduling over data streams. VLDB ’07, 23–28 September, Vienna
Lopez J, Garcia M, Diaz J, Garcia D (2000) Worst-case utilization bound for EDF scheduling on real-time multiprocessor systems. In: Proceedings of the 12th euromicro conference on real-time systems, June, pp 25–33
Ma L et al (2009) Real-time scheduling for continuous queries with deadlines. SAC’09, Honolulu, HI
Marisol G-V, Tommaso C, Chenyang L (2014) Challenges in real-time virtualization and predictable cloud computing. J Syst Architect 60:726–740
Mohammadi S (2010) Continuous query response time improvement based on system conditions and stream features. M.Sc. Thesis, Iran University of Science and Technology
Neumeyer L et al (2010) S4: distributed stream computing platform. 2010 IEEE international conference on data mining workshops (ICDMW). IEEE
Regehr J, Stankovic JA (2001) HLS: a framework for composing soft real-time schedulers. In: Proceedings of the 22nd IEEE real-time systems symposium (RTSS 2001). IEEE
Safaei AA, Haghjoo MS, Abdi F (2011) PFGN: a hybrid multiprocessor real-time scheduling algorithm for data stream management systems. In: Proceeding of international conference on digital information and communication technology and its applications, pp 180–192
Safaei AA, Alemi M, Haghjoo MS, Mohammadi S (2011) Hybrid multiprocessor real-time scheduling approach. Int J Comput Sci Issues 8(2):171
Safaei AA, Sharif-Razavian A, Sharifi M, Haghjoo MS (2012) Dynamic routing of data stream tuples among parallel query plan running on multi-core processors. J Distrib Parallel Databases 30(2):145–176. doi:10.1007/s10619-012-7090-6
Safaei AA, Haghjoo MS (2010) Parallel processing of continuous queries over data streams. Distrib Parallel Databases 28(2–3):93–118. doi:10.1007/s10619-010-7066-3
Safaei AA, Haghjoo MS (2012) Dispatching of stream operators in parallel execution of continuous queries. J Supercomput 61(3):619–641. doi:10.1007/s11227-011-0621-5
Safaei AA, Haghjoo MS (2014) Parallel processing of data streams. J Comput Sci Eng 11(2):11–29
Srinivasan A (2003) Efficient and flexible fair scheduling of real-time tasks on multiprocessors. Ph.D. Thesis, University of North Carolina, Chapel Hill
Srinivasan A, Anderson JH (2004) Efficient scheduling of soft real-time applications on multiprocessors. J Embed Comput 1(3):1–14
Stankovic JA et al (1999) Misconceptions about real-time databases. J Comput 32(6):29–36
Stankovic JA, Ramamritham K (1990) What is predictability for real-time systems? Real Time Syst 2(4):247–254
Stonebraker M et al (2005) The 8 requirements of real-time stream processing. SIGMOD Rec 34(4):42–47
Tatbul N et al (2003) Load shedding in a data stream manager. In: Proceedings of VLDB, pp 309–320
The STREAM Group (2003) STREAM: the Stanford stream data manager. IEEE data engineering bulletin, March 2003
Valls MG, Lopez IR, Villar LF (2013) iLAND: an enhanced middleware for real-time reconfiguration of service oriented distributed real-time systems. IEEE Trans Ind Inform 9(1):228–236
Wei Y et al (2007) QoS management of real-time data stream queries in distributed environments. In: IEEE international symposium on object-oriented real-time distributed
Wei Y, Son SH, Stankovic JA (2006a) RTSTREAM: real-time query processing for data streams. In: 9th IEEE international symposium on object/component/service-oriented real-time distributed computing, pp 141–150
Wei Y, Prasad V, Son SH, Stankovic J (2006b) Prediction-based QoS management for real-time data stream. In: Proceedings of IEEE real-time systems symposium (RTSS’06), December
Yang H et al (2007) Map-reduce-merge: simplified relational data processing on large clusters. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data. ACM
Yang Q, Koutsopoulos HN (1996) A microscopic traffic simulator for evaluation of dynamic traffic management systems. Transp Res C 4(3):113–129
Appendix: performance evaluation for different configuration of the contributed system
Experiment 5
Measuring performance parameters for the contributed system with different configurations and components.
As stated in Sect. 4.2 (Experimental results), it may be hard to judge the performance of the compared alternatives from the time-varying value charts (Figs. 8, 9, 10, 11, 12). So, the average value of each parameter for each of the compared configurations and systems is computed and presented in Figs. 13, 14, 15, and 16. As a complementary experiment, the effect of the feedback control mechanism used in RT-SBD is examined; e.g., how the system performs when the feedback control mechanism is omitted. To evaluate the contributed system in such configurations, the parameters are measured in the cases where the deadline monitor unit, the admission control unit, and both of them are omitted (Fig. 20a–c, respectively), so that the closed-loop control becomes open-loop.
Cite this article
Safaei, A.A. Real-time processing of streaming big data. Real-Time Syst 53, 1–44 (2017). https://doi.org/10.1007/s11241-016-9257-0
DOI: https://doi.org/10.1007/s11241-016-9257-0