Toward optimal operator parallelism for stream processing topology with limited buffers

The Journal of Supercomputing

Abstract

Stream processing is an emerging in-memory computing paradigm for handling massive amounts of real-time data. To process streaming data efficiently, a system needs a mechanism that chooses a suitable degree of parallelism for each operator. Previous research has mostly focused on parallelism optimization under the assumption of infinite buffers; however, a topology’s quality of service is severely affected by its network buffers. In this paper, we therefore introduce an extended queueing network that models the relationship between operator parallelism and a tuple’s average sojourn time when buffers are limited. Based on this model, we propose greedy algorithms that calculate the optimal parallelism for both minimum latency and maximum throughput under resource constraints. To evaluate different models fairly, we also present a random parameter generator for streaming topologies. Experiments show that the extended queueing model can predict performance accurately. Compared to the state-of-the-art method, the proposed algorithms reduce the median total sojourn time by a factor of 3.74 and increase the average maximum sustainable throughput by a factor of 1.69.
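
The idea of pairing a finite-buffer queueing model with a greedy search can be illustrated with a minimal sketch. It is not the extended queueing network or the algorithms of the paper, whose details are not given in this abstract. It assumes each operator behaves as an independent textbook M/M/c/K queue (c parallel instances, capacity K = c + buffer size), that every operator sees the same arrival rate, and that blocking does not propagate upstream. The names `mmck_sojourn` and `greedy_parallelism` and all parameter values are hypothetical.

```python
from math import factorial


def mmck_sojourn(lam, mu, c, buffer_size):
    """Mean sojourn time of accepted tuples in an M/M/c/K queue.

    Assumption: K = c + buffer_size (tuples in service plus tuples waiting).
    lam is the arrival rate, mu the service rate of one operator instance.
    """
    K = c + buffer_size
    a = lam / mu
    # Unnormalised steady-state weights of having n tuples in the system.
    weights = []
    for n in range(K + 1):
        if n <= c:
            weights.append(a**n / factorial(n))
        else:
            weights.append(a**n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    probs = [w / total for w in weights]
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    # Tuples arriving to a full system are dropped; only the rest are served.
    accepted_rate = lam * (1.0 - probs[K])
    # Little's law applied to the accepted stream.
    return mean_in_system / accepted_rate


def greedy_parallelism(lam, mus, buffer_size, budget):
    """Greedily assign `budget` instances to operators (hypothetical helper).

    Starts with one instance per operator and repeatedly gives the next
    instance to the operator whose increment yields the lowest summed
    sojourn time. This is a generic greedy heuristic, not the paper's method.
    """
    par = [1] * len(mus)

    def total_sojourn(p):
        return sum(mmck_sojourn(lam, mu, c, buffer_size) for mu, c in zip(mus, p))

    while sum(par) < budget:
        best = min(
            range(len(mus)),
            key=lambda i: total_sojourn(par[:i] + [par[i] + 1] + par[i + 1:]),
        )
        par[best] += 1
    return par, total_sojourn(par)


if __name__ == "__main__":
    # Two operators: a slow one (mu = 1) and a fast one (mu = 10).
    allocation, latency = greedy_parallelism(lam=5.0, mus=[1.0, 10.0],
                                             buffer_size=5, budget=6)
    print(allocation, latency)
```

Under these assumptions the slow operator receives most of the instance budget, because each extra instance there shortens the summed sojourn time far more than an extra instance at the fast operator.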

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62171155.

Author information

Corresponding author

Correspondence to Hongwei Liu.

About this article

Cite this article

Li, W., Zhang, Z., Shu, Y. et al. Toward optimal operator parallelism for stream processing topology with limited buffers. J Supercomput 78, 13276–13297 (2022). https://doi.org/10.1007/s11227-022-04376-9

  • DOI: https://doi.org/10.1007/s11227-022-04376-9
