Abstract
Existing Data Stream Processing (DSP) systems perform poorly under heavy workloads, particularly on clusters of heterogeneous computers. Elasticity and changes to the application's degree of parallelism can limit the performance degradation caused by varying workloads, which negatively impact the overall application response time. Elasticity can be achieved by operator scaling, i.e., by replicating and relocating operators at runtime. However, making scaling decisions at runtime is challenging: first, scaling increases the overall communication overhead between operators, and second, it changes the initial schedule, which can lead to a non-optimal scheduling plan. In this paper, we investigate the problem of elasticity and scaling decisions and propose a DSP system called ER-Storm. To curb communication overhead, we propose a new 3-step mechanism for replicating and relocating operators upon detecting a bottleneck operator that overutilizes a worker node. The other challenge is to select the proper worker nodes to host relocated operators. By discretizing the input workload, we model the relocation of operators between worker nodes at runtime as a scalable Markov Decision Process (MDP) and use a model-free reinforcement learning method (Q-Learning) to find optimal solutions. We have implemented our proposals on Apache Storm version 2.1.0. Our experimental results show that ER-Storm reduces the average topology response time by 20–60 percent, depending on the input workload rate (low or high), compared to the R-Storm scheduler and the Online-Scheduler of Storm.
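To make the Q-Learning idea in the abstract concrete, the following is a minimal sketch of tabular Q-learning over a discretized workload. It is not ER-Storm's implementation: the two workload levels, the three candidate worker nodes, the `COST` table, and the functions `q_learning` and `toy_step` are all hypothetical and chosen only for illustration. The state is the discretized workload level, the action is the worker node chosen to host a relocated operator, and the reward is the negated (made-up) response-time cost.

```python
import random

# Hypothetical response-time cost of hosting the relocated operator on each
# worker node (columns) under each discretized workload level (rows).
# These numbers are invented for illustration only.
COST = [
    [1.0, 3.0, 5.0],  # workload level 0 ("low"): node 0 is cheapest
    [6.0, 2.0, 4.0],  # workload level 1 ("high"): node 1 is cheapest
]


def toy_step(state, action, rng):
    """Toy environment: reward is the negated cost; the next workload level
    is drawn uniformly at random, independent of the chosen node."""
    reward = -COST[state][action]
    next_state = rng.randrange(len(COST))
    return next_state, reward


def q_learning(n_states, n_actions, step, steps=20000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Model-free tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random worker node.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the node with the highest current Q-value.
            action = max(range(n_actions), key=lambda a: q[state][a])
        next_state, reward = step(state, action, rng)
        # Standard Q-learning update toward the temporal-difference target.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


q_table = q_learning(n_states=2, n_actions=3, step=toy_step)
# Greedy policy: the preferred worker node for each workload level.
policy = [max(range(3), key=lambda a: q_table[s][a]) for s in range(2)]
print(policy)
```

In this toy setting the learned greedy policy should select the cheapest node for each workload level. A real scheduler would need a much richer state (for example, per-node utilization and inter-operator traffic) and a reward measured from the running topology.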
Cite this article
Hadian, H., Farrokh, M., Sharifi, M. et al. An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters. J Supercomput 79, 461–498 (2023). https://doi.org/10.1007/s11227-022-04669-z
DOI: https://doi.org/10.1007/s11227-022-04669-z