A cost-efficient scheduling algorithm for streaming processing applications on cloud

Cluster Computing

Abstract

Stream processing is a new in-memory computing paradigm that deals efficiently with dynamic data streams. Storm is one such stream processing framework, but its default scheduler has shortcomings. For example, it does not try to reduce cost in the cloud environment while still meeting performance requirements. In this paper, a cost-efficient scheduling algorithm for the Storm framework (CE-Storm) is proposed to reduce cost while satisfying deadline constraints. First, a new cost model for the Storm framework is built, covering resource usage cost, energy cost, and communication cost. Then, based on this cost model, a cost-efficient scheduling algorithm that integrates a resource monitoring module and a communication detection module is designed. The nodes in the cluster are prioritized according to their cost-efficiency information, and nodes with higher priority are assigned tasks first to minimize the total cost of the cluster. Furthermore, the algorithm reduces the communication cost between nodes and improves the cost effectiveness of the Storm cluster. We performed extensive experiments on Storm clusters in a cloud environment using HiBench workloads. The results show that the cost of Storm clusters in the cloud environment is reduced by 19.25% on average compared with the traditional scheduling algorithm. In other words, the proposed algorithm effectively improves the cost efficiency of Storm clusters in the cloud environment while satisfying the performance constraints.
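The priority-based placement described in the abstract can be illustrated with a minimal sketch. This is not the authors' CE-Storm implementation: the `Node` fields, the additive `unit_cost`, the flat `comm_cost_per_edge`, and all numeric values are hypothetical, and the deadline constraint is not modeled. The sketch only shows the general idea of ranking nodes by cost and filling the cheapest nodes first, which also tends to keep adjacent tasks on the same node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A worker node with hypothetical per-slot costs."""
    name: str
    slots: int                # number of task slots on this node (assumed unit)
    resource_cost: float      # resource usage cost per occupied slot (assumed)
    energy_cost: float        # energy cost per occupied slot (assumed)
    assigned: List[str] = field(default_factory=list)

    def unit_cost(self) -> float:
        # Assumption: the per-slot cost is simply the sum of both terms.
        return self.resource_cost + self.energy_cost


def schedule(tasks: List[str], nodes: List[Node]) -> Dict[str, str]:
    """Greedily place each task on the cheapest node that has a free slot."""
    # Lower unit cost means higher priority.
    ordered = sorted(nodes, key=lambda n: n.unit_cost())
    placement: Dict[str, str] = {}
    for task in tasks:
        for node in ordered:
            if len(node.assigned) < node.slots:
                node.assigned.append(task)
                placement[task] = node.name
                break
        else:
            raise RuntimeError(f"no free slot for task {task!r}")
    return placement


def total_cost(
    placement: Dict[str, str],
    nodes: List[Node],
    edges: List[Tuple[str, str]],
    comm_cost_per_edge: float,
) -> float:
    """Sum the per-slot node costs plus a flat cost for each cross-node edge."""
    by_name = {n.name: n for n in nodes}
    compute = sum(by_name[node_name].unit_cost() for node_name in placement.values())
    comm = sum(comm_cost_per_edge for a, b in edges if placement[a] != placement[b])
    return compute + comm


if __name__ == "__main__":
    cluster = [
        Node("n1", slots=2, resource_cost=0.10, energy_cost=0.05),
        Node("n2", slots=2, resource_cost=0.06, energy_cost=0.02),
        Node("n3", slots=2, resource_cost=0.20, energy_cost=0.10),
    ]
    topology = ["spout", "split", "count"]
    links = [("spout", "split"), ("split", "count")]
    plan = schedule(topology, cluster)
    print(plan)
    print(total_cost(plan, cluster, links, comm_cost_per_edge=0.01))
```

With these illustrative numbers, `spout` and `split` land on the cheapest node `n2` and `count` spills over to `n1`, so only one edge crosses nodes. A real scheduler would also need the monitoring data and deadline check that the abstract mentions.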

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Code availability

The code used during the current study is available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the Chongqing Science and Technology Commission Project (Grant No. cstc2018jcyjAX0525) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No. 2019YFG0107).

Funding

This work was supported by the Chongqing Science and Technology Commission Project (Grant No. cstc2018jcyjAX0525; Recipient: Hongjian Li) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No. 2019YFG0107; Recipient: Hongjian Li).

Author information

Contributions

HL: Proposed the idea, conducted experiments, wrote the manuscript. HF: Proposed the idea, conducted experiments, wrote the manuscript. HD: Conducted experiments, helped write several sections of the manuscript, proofreading. TZ: Helped write several sections of the manuscript, proofreading. WS: Helped write several sections of the manuscript, proofreading. JW: Proofreading, gave valuable comments to improve the manuscript quality. CX: Proofreading, gave valuable comments to improve the manuscript quality.

Corresponding author

Correspondence to Hongjian Li.

Ethics declarations

Conflict of interest

None. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, H., Fang, H., Dai, H. et al. A cost-efficient scheduling algorithm for streaming processing applications on cloud. Cluster Comput 25, 781–803 (2022). https://doi.org/10.1007/s10586-021-03462-6

  • DOI: https://doi.org/10.1007/s10586-021-03462-6
