Abstract
Stream processing is an emerging in-memory computing paradigm for handling massive amounts of real-time data. Handling streaming data efficiently requires a mechanism that selects an appropriate parallelism for each operator. Previous research has mostly focused on parallelism optimization under the assumption of infinite buffers; however, a topology's quality of service is severely affected by its network buffers. Thus, in this paper, we introduce an extended queueing network that models the relationship between operator parallelism and a tuple's average sojourn time under limited buffers. Based on this model, we also propose greedy algorithms that calculate the optimal parallelism for both minimum latency and maximum throughput under resource constraints. To evaluate the performance of different models fairly, we present a random parameter generator for streaming topologies. Experiments show that the extended queueing model can forecast performance well. Compared to the state-of-the-art method, the proposed algorithms reduce the median total sojourn time by a factor of 3.74 and increase the average maximum sustainable throughput by a factor of 1.69.
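To make the idea of choosing parallelism under limited buffers concrete, the following is a minimal sketch and not the extended queueing network or the greedy algorithms of this paper. It assumes each operator behaves as an independent textbook M/M/c/K queue (c parallel instances, K = c + buffer slots), that the topology is a simple chain, and that the total sojourn time is the sum of per-operator sojourn times. The function names `mmck_sojourn` and `greedy_parallelism` and all parameter values are hypothetical.

```python
from math import factorial


def mmck_sojourn(lam, mu, c, buf):
    """Mean sojourn time of accepted tuples in an M/M/c/K queue.

    Assumption: K = c + buf, i.e. c tuples in service plus buf buffered.
    lam is the arrival rate, mu the service rate of one instance.
    """
    K = c + buf
    a = lam / mu  # offered load
    # Unnormalized stationary probabilities of having n tuples in the system.
    weights = []
    for n in range(K + 1):
        if n <= c:
            weights.append(a ** n / factorial(n))
        else:
            weights.append(a ** n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    probs = [w / total for w in weights]
    p_block = probs[K]  # probability that an arriving tuple finds the buffer full
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    lam_eff = lam * (1.0 - p_block)  # rate of tuples actually admitted
    return mean_in_system / lam_eff  # Little's law


def greedy_parallelism(lam, mus, buf, budget):
    """Greedily add one instance at a time where it reduces latency most.

    Assumption: every operator sees the same arrival rate lam and the
    objective is the sum of per-operator sojourn times.
    """
    par = [1] * len(mus)

    def total_time(p):
        return sum(mmck_sojourn(lam, mu, c, buf) for mu, c in zip(mus, p))

    current = total_time(par)
    while sum(par) < budget:
        best_i, best_t = None, current
        for i in range(len(par)):
            par[i] += 1  # tentatively add one instance to operator i
            t = total_time(par)
            par[i] -= 1
            if t < best_t:
                best_i, best_t = i, t
        if best_i is None:  # no single addition improves latency
            break
        par[best_i] += 1
        current = best_t
    return par, current


if __name__ == "__main__":
    # Hypothetical two-operator chain: a slow operator followed by a fast one.
    parallelism, latency = greedy_parallelism(lam=2.0, mus=[1.0, 10.0], buf=5, budget=4)
    print(parallelism, round(latency, 3))
```

In this toy setting the greedy loop spends the instance budget on the overloaded operator first, which is the qualitative behaviour one would expect from a latency-driven allocation; it ignores blocking propagation between operators, which a finite-buffer network model has to account for.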








Funding
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62171155.
Cite this article
Li, W., Zhang, Z., Shu, Y. et al. Toward optimal operator parallelism for stream processing topology with limited buffers. J Supercomput 78, 13276–13297 (2022). https://doi.org/10.1007/s11227-022-04376-9
DOI: https://doi.org/10.1007/s11227-022-04376-9