
Real-time processing of streaming big data

Real-Time Systems

Abstract

In the era of data explosion, large volumes of varied data are generated at every moment, and if these data are not processed, the value of their latent information is lost. This is the main challenge currently facing most enterprises and Internet mega-companies (also known as the big data problem). Big data has three dimensions: Volume, Variety, and Velocity. Velocity refers to high speed, both in data arrival rate (e.g., streaming data) and in data processing (i.e., real-time processing). This paper is concerned with the velocity dimension of big data, so real-time processing of streaming big data is addressed in detail. For any real-time system, being fast is a necessary condition, although it is not sufficient: other concerns, such as real-time scheduling, must be addressed too. Fast processing is achieved by parallelism via the proposed deadline-aware dispatching method. For the other prerequisite of real-time processing (i.e., real-time scheduling of the tasks), a hybrid clustering multiprocessor real-time scheduling algorithm is proposed, in which both the partitioning and global real-time scheduling approaches are employed to achieve better schedulability and resource utilization with a tolerable overhead. The other components required for real-time processing of streaming big data are also designed and proposed as the real-time streaming big data (RT-SBD) processing engine. Its prototype is implemented, experimentally evaluated, and compared with Storm, a well-known real-time streaming big data processing engine. Experimental results show that the proposed RT-SBD significantly outperforms the Storm engine in terms of proportional deadline miss ratio, tuple latency, and system throughput.
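The hybrid idea mentioned in the abstract, partitioning tasks into clusters and then scheduling globally inside each cluster, can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the paper: the first-fit partitioning rule, the simple capacity check, and all names (`Task`, `Cluster`, `partition_first_fit`, `pick_jobs_edf`) are assumptions made for illustration, and earliest deadline first (EDF) is assumed as the policy used within a cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    exec_time: float  # execution time per job (hypothetical units)
    period: float     # period, also treated as the relative deadline

    @property
    def utilization(self) -> float:
        return self.exec_time / self.period


@dataclass
class Cluster:
    processors: int
    tasks: list = field(default_factory=list)

    @property
    def load(self) -> float:
        return sum(t.utilization for t in self.tasks)


def partition_first_fit(tasks, clusters):
    """Partitioning step: place each task in the first cluster with spare capacity.

    The capacity check (total utilization <= number of processors) is only a
    necessary condition, not a schedulability guarantee.
    """
    for task in sorted(tasks, key=lambda t: t.utilization, reverse=True):
        for cluster in clusters:
            if cluster.load + task.utilization <= cluster.processors:
                cluster.tasks.append(task)
                break
        else:
            raise ValueError(f"no cluster can host task {task.name}")
    return clusters


def pick_jobs_edf(ready_jobs, processors):
    """Global step inside one cluster: run the jobs with the earliest deadlines.

    ready_jobs is a list of (job_name, absolute_deadline) pairs.
    """
    return sorted(ready_jobs, key=lambda job: job[1])[:processors]


tasks = [Task("a", 1, 2), Task("b", 1, 4), Task("c", 3, 4), Task("d", 1, 2)]
clusters = partition_first_fit(tasks, [Cluster(processors=1), Cluster(processors=1)])
running = pick_jobs_edf([("j1", 10), ("j2", 4), ("j3", 7)], processors=2)
```

Partitioning keeps migration confined to a cluster, while the global policy inside each cluster lets any free processor of that cluster pick up the most urgent job; the cluster size is the knob that trades overhead against utilization.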


Notes

  1. Earliest deadline first.

  2. Rate monotonic.

  3. Real-time streaming big data.

  4. For each edge (a,b) from an operator to its immediate successor in the graph, the weight is the total processing time of all operators running on the destination operator’s machine (i.e., the sum, over those operators, of the number of tuples waiting in each operator’s input queue multiplied by that operator’s execution time).

  5. An algorithm is said to be work-conserving if it does not idle any processor when one or more jobs are pending, and non-work-conserving otherwise.

  6. Data stream management system.

  7. Proportional-integral-derivative.

  8. Proportional-integral.

  9. Single real-time disp.

  10. Quick Real-time Stream processor.

  11. Complex-event processing.
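The edge weight defined in Note 4 can be written out as a short Python sketch. The data layout is an assumption for illustration only: `placement` maps each operator to its machine, `queue_len` gives the number of tuples waiting in each operator's input queue, and `exec_time` gives each operator's per-tuple execution time. None of these names come from the paper.

```python
def edge_weight(edge, placement, queue_len, exec_time):
    """Weight of edge (a, b): pending work on the machine hosting operator b.

    Sums queue length times execution time over every operator placed on
    the destination operator's machine, as described in Note 4.
    """
    _, dest = edge
    machine = placement[dest]
    return sum(
        queue_len[op] * exec_time[op]
        for op, host in placement.items()
        if host == machine
    )


# Hypothetical three-operator example: b and c share machine m2.
placement = {"a": "m1", "b": "m2", "c": "m2"}
queue_len = {"a": 5, "b": 3, "c": 10}
exec_time = {"a": 0.2, "b": 0.5, "c": 0.1}

weight_ab = edge_weight(("a", "b"), placement, queue_len, exec_time)
```

In this example the weight of edge (a,b) counts the backlog of both b and c, because both run on the destination machine.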


Author information

Correspondence to Ali A. Safaei.

Appendix: performance evaluation for different configurations of the contributed system

Experiment 5

Measuring performance parameters for the contributed system with different configurations and components.

As stated in Sect. 4.2 (Experimental results), it may be hard to judge the performance of the compared alternatives from the time-varying value charts (Figs. 8, 9, 10, 11, 12). So, the average value of each parameter for each of the compared configurations and systems is computed and presented in Figs. 13, 14, 15, and 16. As a complementary experiment, the effect of the feedback control mechanism used in RT-SBD is examined, e.g., how the system performs when the feedback control mechanism is omitted. To evaluate the contributed system in such configurations, the parameters are measured when the deadline monitor unit, the admission control unit, and both of them are omitted (Fig. 20a–c, respectively), so that the closed-loop control becomes open-loop.

Fig. 20

Parameters when (a) the deadline monitor unit, (b) the admission control unit, and (c) both units (the admission control and deadline monitor units) of RT-SBD are omitted
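The difference between closed-loop and open-loop operation discussed above can be sketched with a generic proportional-integral (PI) admission controller in Python. This excerpt does not specify the control law used in RT-SBD, so everything here is an assumption for illustration: the class name `PIAdmissionController`, the gains `kp` and `ki`, the target miss ratio, and the choice of the admitted fraction of tuples as the controlled quantity.

```python
class PIAdmissionController:
    """Hypothetical PI controller that sets the fraction of tuples admitted."""

    def __init__(self, target_miss_ratio, kp, ki, base_fraction=1.0, enabled=True):
        self.target = target_miss_ratio
        self.kp = kp
        self.ki = ki
        self.base_fraction = base_fraction
        self.enabled = enabled  # False models the open-loop configuration
        self.integral = 0.0

    def update(self, measured_miss_ratio):
        """Return the admitted fraction for the next control period."""
        if not self.enabled:
            # Open loop: the measurement is ignored and nothing is adjusted.
            return self.base_fraction
        error = self.target - measured_miss_ratio
        self.integral += error
        raw = self.base_fraction + self.kp * error + self.ki * self.integral
        # Clamp to a valid fraction between 0 and 1.
        return min(1.0, max(0.0, raw))


closed = PIAdmissionController(target_miss_ratio=0.05, kp=1.0, ki=0.5)
opened = PIAdmissionController(target_miss_ratio=0.05, kp=1.0, ki=0.5, enabled=False)

closed_fraction = closed.update(0.25)  # miss ratio above target: admit fewer tuples
open_fraction = opened.update(0.25)    # open loop: admission stays at the base value
```

In the closed-loop case a measured miss ratio above the target lowers the admitted fraction, whereas the open-loop variant keeps admitting at the base rate regardless of missed deadlines.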


Cite this article

Safaei, A.A. Real-time processing of streaming big data. Real-Time Syst 53, 1–44 (2017). https://doi.org/10.1007/s11241-016-9257-0


  • DOI: https://doi.org/10.1007/s11241-016-9257-0
