TOP-Storm: A topology-based resource-aware scheduler for Stream Processing Engine

Cluster Computing

Abstract

Like other emerging fields, Stream Processing Engines (SPEs) pose several challenges to researchers, e.g., resource awareness, dynamic configuration, heterogeneous clusters, load balancing, and topology awareness. All of these aspects play a major role in job scheduling. Currently, SPEs ignore the topology’s structure while scheduling. As a result, frequently communicating tasks may end up on different computing nodes, which makes maximum throughput harder to achieve. In this paper, TOP-Storm, a scheduler based on the topology’s DAG (Directed Acyclic Graph), is proposed for Apache Storm (a popular open-source SPE) to optimize resource usage on heterogeneous clusters. The aim is to improve efficiency through resource-aware task assignments that result in enhanced throughput and optimized resource utilization. TOP-Storm works in two phases. In the first phase, executors are logically grouped with the help of the DAG to minimize inter-group communication. In the second phase, these groups are assigned to physical nodes, starting from the most powerful node. The scheduler is evaluated on two benchmark topologies and compared with two state-of-the-art scheduling algorithms. Experimental results show up to 39% and 11% improvement in throughput compared to the default Apache Storm scheduler and R-Storm, respectively.
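The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the grouping heuristic, so the sketch assumes a simple one (walk the DAG from its sources and pack executors of adjacent components into fixed-size groups). The function names `group_executors` and `assign_groups`, the `group_size` parameter, the per-node power score, and the per-node executor capacity are all hypothetical and chosen only for illustration.

```python
from collections import deque


def group_executors(edges, executors_per_component, group_size):
    """Phase 1 (assumed heuristic): order components by a breadth-first,
    Kahn-style walk of the DAG, then pack their executors into groups so
    that directly connected components tend to share a group."""
    components = list(executors_per_component)
    indegree = {c: 0 for c in components}
    children = {c: [] for c in components}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Start from the sources (components with no incoming edge).
    queue = deque(c for c in components if indegree[c] == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in children[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # One label per executor, e.g. "split#2" is the third executor of "split".
    executors = [
        f"{c}#{i}" for c in order for i in range(executors_per_component[c])
    ]
    return [
        executors[i:i + group_size]
        for i in range(0, len(executors), group_size)
    ]


def assign_groups(groups, nodes):
    """Phase 2: place each group on the most powerful node that still has
    room. `nodes` maps a node name to (power_score, executor_capacity);
    both values are hypothetical inputs for this sketch."""
    ranked = sorted(nodes, key=lambda n: nodes[n][0], reverse=True)
    free = {n: nodes[n][1] for n in ranked}
    assignment = {n: [] for n in ranked}
    for group in groups:
        for node in ranked:
            if free[node] >= len(group):
                assignment[node].extend(group)
                free[node] -= len(group)
                break
        else:
            raise ValueError("cluster has no node with room for this group")
    return assignment


if __name__ == "__main__":
    # A word-count-like topology: spout -> split -> count.
    edges = [("spout", "split"), ("split", "count")]
    parallelism = {"spout": 2, "split": 4, "count": 2}
    groups = group_executors(edges, parallelism, group_size=4)
    cluster = {"small": (10.0, 4), "big": (40.0, 4)}
    for node, placed in assign_groups(groups, cluster).items():
        print(node, placed)
```

In this toy run the spout executors and the first two split executors form one group and land on the node with the higher power score, so the traffic between them stays on a single machine. A real scheduler would also have to account for measured resource demand per executor, which this sketch leaves out.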



Author information

Corresponding author

Correspondence to Muhammad Aleem.

About this article

Cite this article

Muhammad, A., Aleem, M. & Islam, M.A. TOP-Storm: A topology-based resource-aware scheduler for Stream Processing Engine. Cluster Comput 24, 417–431 (2021). https://doi.org/10.1007/s10586-020-03117-y

  • DOI: https://doi.org/10.1007/s10586-020-03117-y
