Abstract
Like other emerging fields, Stream Processing Engines (SPEs) pose several challenges to researchers, e.g., resource awareness, dynamic configurations, heterogeneous clusters, load balancing, and topology awareness. All of these aspects play a major role in the job scheduling process. Currently, SPEs ignore the topology's structure while scheduling. As a result, frequently communicating tasks may end up on different computing nodes, which makes it difficult to achieve maximum throughput. In this paper, TOP-Storm, a scheduler based on the topology's DAG (Directed Acyclic Graph), is proposed for Apache Storm (a popular open-source SPE); it optimizes resource usage for heterogeneous clusters. The aim is to improve efficiency through resource-aware task assignments that enhance throughput and optimize resource utilization. TOP-Storm is divided into two phases. In the first phase, executors are logically grouped with the help of the DAG to minimize inter-group communication. In the second phase, these groups are assigned to physical nodes, starting from the most powerful node. Results are generated using two benchmark topologies and compared with two state-of-the-art scheduling algorithms. Experimental results show up to 39% and 11% improvement in throughput compared to the default Apache Storm scheduler and R-Storm, respectively.
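The two phases described in the abstract can be illustrated with a minimal sketch. This is not the TOP-Storm implementation: the function names (`group_executors`, `assign_groups`), the per-edge traffic weights, the `max_group_size` cap, and the use of free executor slots as a stand-in for node power are all assumptions made for illustration only. The sketch greedily merges executors connected by the heaviest DAG edges, then places the resulting groups on nodes in descending order of capacity.

```python
from collections import defaultdict


def group_executors(edges, max_group_size):
    """Phase 1 (illustrative): group executors so heavy edges stay inside a group.

    edges: list of (src, dst, traffic) tuples from the topology DAG.
    max_group_size: hypothetical cap on executors per group.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    size = defaultdict(lambda: 1)
    # Visit the heaviest edges first so they are the least likely to be cut.
    for src, dst, _traffic in sorted(edges, key=lambda e: -e[2]):
        a, b = find(src), find(dst)
        if a != b and size[a] + size[b] <= max_group_size:
            parent[b] = a
            size[a] += size[b]

    groups = defaultdict(list)
    for executor in list(parent):
        groups[find(executor)].append(executor)
    return [sorted(members) for members in groups.values()]


def assign_groups(groups, node_capacity):
    """Phase 2 (illustrative): place groups starting from the most powerful node.

    node_capacity: dict of node name -> free executor slots (assumed proxy
    for node power).
    """
    nodes = sorted(node_capacity, key=lambda n: -node_capacity[n])
    free = dict(node_capacity)
    placement = {}
    for group in sorted(groups, key=len, reverse=True):
        for node in nodes:
            if free[node] >= len(group):
                free[node] -= len(group)
                for executor in group:
                    placement[executor] = node
                break
        else:
            raise ValueError("no node has enough free slots for group %r" % group)
    return placement


# Hypothetical word-count-like topology with made-up traffic weights.
edges = [("spout", "split", 10), ("split", "count", 8), ("count", "report", 1)]
groups = group_executors(edges, max_group_size=2)
placement = assign_groups(groups, {"big": 2, "small": 2})
```

In this toy run the spout and split executors, which share the heaviest edge, end up in the same group and therefore on the same node, while the remaining group spills over to the second node once the first one is full.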
Cite this article
Muhammad, A., Aleem, M. & Islam, M.A. TOP-Storm: A topology-based resource-aware scheduler for Stream Processing Engine. Cluster Comput 24, 417–431 (2021). https://doi.org/10.1007/s10586-020-03117-y