Toward optimal operator parallelism for stream processing topology with limited buffers

Li, Wenhao; Zhang, Zhan; Shu, Yanjun; Liu, Hongwei; Liu, Tianming

doi:10.1007/s11227-022-04376-9

Published: 16 March 2022

The Journal of Supercomputing, Volume 78, pages 13276–13297 (2022)

Wenhao Li¹, Zhan Zhang¹, Yanjun Shu¹, Hongwei Liu¹ (ORCID: orcid.org/0000-0002-9215-7173) & Tianming Liu¹

Abstract

Stream processing is an emerging in-memory computing paradigm for handling massive amounts of real-time data. To process streaming data efficiently, a mechanism is needed that proposes a suitable parallelism for each operator. Previous research has mostly focused on parallelism optimization with infinite buffers; however, a topology's quality of service is severely affected by network buffers. In this paper, we therefore introduce an extended queueing network that models the relationship between operator parallelism and a tuple's average sojourn time with limited buffers. Based on this model, we also propose greedy algorithms that calculate the optimal parallelism for both minimum latency and maximum throughput under resource constraints. To evaluate the performance of different models fairly, we present a random parameter generator for streaming topologies. Experiments show that the extended queueing model can forecast performance well. Compared with the state-of-the-art method, the proposed algorithms reduce the median total sojourn time by a factor of 3.74 and increase the average maximum sustainable throughput by a factor of 1.69.
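The idea of pairing a finite-buffer queueing model with a greedy search can be illustrated with a minimal sketch. The abstract does not give the details of the extended queueing network, so the code below substitutes a textbook M/M/c/K queue for each operator and treats operators independently with fixed arrival rates; this is a simplifying assumption, not the authors' model. The function names `mmck_sojourn_time` and `greedy_parallelism`, and the tuple layout `(lam, mu, buffer_slots)`, are hypothetical and chosen only for illustration.

```python
from math import factorial


def mmck_sojourn_time(lam, mu, servers, buffer_slots):
    """Mean sojourn time of accepted tuples in an M/M/c/K queue.

    lam: arrival rate, mu: service rate of one instance,
    servers: parallelism c, buffer_slots: waiting places, so K = c + buffer_slots.
    """
    capacity = servers + buffer_slots
    load = lam / mu
    # Unnormalized stationary probabilities of n tuples in the system.
    weights = []
    for n in range(capacity + 1):
        if n <= servers:
            weights.append(load ** n / factorial(n))
        else:
            weights.append(load ** n / (factorial(servers) * servers ** (n - servers)))
    total = sum(weights)
    probs = [w / total for w in weights]
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    # Tuples arriving to a full buffer are rejected; only accepted ones count.
    accepted_rate = lam * (1.0 - probs[-1])
    # Little's law: W = L / effective arrival rate.
    return mean_in_system / accepted_rate


def greedy_parallelism(operators, budget):
    """Assign `budget` instances to operators, one at a time.

    operators: list of (lam, mu, buffer_slots) tuples (assumed, independent queues).
    Each step gives one more instance to the operator whose increment
    yields the lowest summed sojourn time.
    """
    if budget < len(operators):
        raise ValueError("budget must allow at least one instance per operator")
    parallelism = [1] * len(operators)

    def total_time(assignment):
        return sum(
            mmck_sojourn_time(lam, mu, c, slots)
            for (lam, mu, slots), c in zip(operators, assignment)
        )

    for _ in range(budget - len(operators)):
        candidates = []
        for i in range(len(operators)):
            trial = list(parallelism)
            trial[i] += 1
            candidates.append((total_time(trial), i))
        _, best = min(candidates)
        parallelism[best] += 1
    return parallelism


if __name__ == "__main__":
    # Two operators: a heavily loaded one and a lightly loaded one.
    ops = [(5.0, 1.0, 2), (1.0, 2.0, 2)]
    print(greedy_parallelism(ops, budget=6))
```

Because the buffers are finite, each queue has a stationary distribution even when the arrival rate exceeds the total service rate, which is why the sketch needs no stability check. A fuller model would also propagate blocking and tuple loss between consecutive operators, which this sketch ignores.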

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62171155.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Wenhao Li, Zhan Zhang, Yanjun Shu, Hongwei Liu & Tianming Liu

Corresponding author

Correspondence to Hongwei Liu.

About this article

Cite this article

Li, W., Zhang, Z., Shu, Y. et al. Toward optimal operator parallelism for stream processing topology with limited buffers. J Supercomput 78, 13276–13297 (2022). https://doi.org/10.1007/s11227-022-04376-9

Accepted: 15 February 2022
Published: 16 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11227-022-04376-9
