Abstract
Stream processing is a new in-memory computing paradigm that handles dynamic data streams efficiently. Storm is one such stream processing framework, but its default scheduler has shortcomings. For example, it does not try to reduce cost in a cloud environment while still meeting performance requirements. In this paper, a cost-efficient scheduling algorithm for the Storm framework (CE-Storm) is proposed to reduce cost while satisfying deadline constraints. First, a new cost-efficiency model (covering resource usage cost, energy cost and communication cost) is built for the Storm framework. Then, based on this cost model, a cost-efficient scheduling algorithm that integrates a resource monitoring module and a communication detection module is designed. The nodes in the cluster are prioritized according to their cost-efficiency information, and nodes with higher priority are assigned tasks first to minimize the total cost of the cluster. Furthermore, the algorithm reduces the communication cost between nodes and improves the cost effectiveness of the Storm cluster. We have performed extensive experiments on Storm clusters using HiBench workloads in a cloud environment. The results show that the cost of Storm clusters in a cloud environment is reduced by 19.25% on average compared with the traditional scheduling algorithm. In other words, the proposed algorithm effectively improves the cost efficiency of a Storm cluster in the cloud environment while satisfying the performance constraints.
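The prioritize-then-assign idea described in the abstract can be illustrated with a minimal sketch. This is not the CE-Storm implementation: the `Node` fields, the plain sum used as the per-task cost, and the `assign_tasks` and `total_cost` helpers are all assumptions made for illustration, and the deadline constraint is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with hypothetical per-task cost figures (not from the paper)."""
    name: str
    free_slots: int
    resource_cost: float  # assumed resource usage cost per task
    energy_cost: float    # assumed energy cost per task
    comm_cost: float      # assumed communication cost per task
    tasks: list = field(default_factory=list)

    def unit_cost(self) -> float:
        # Assumption: the cost of one task is the plain sum of the three components.
        return self.resource_cost + self.energy_cost + self.comm_cost


def assign_tasks(tasks, nodes):
    """Greedily place tasks on the cheapest nodes first.

    Returns a dict mapping each task to a node name and raises ValueError
    when the cluster does not have enough free slots.
    """
    # Lower unit cost means higher priority.
    ranked = sorted(nodes, key=lambda n: n.unit_cost())
    pending = list(tasks)
    placement = {}
    for node in ranked:
        while pending and node.free_slots > 0:
            task = pending.pop(0)
            node.tasks.append(task)
            node.free_slots -= 1
            placement[task] = node.name
    if pending:
        raise ValueError(f"not enough slots for {len(pending)} task(s)")
    return placement


def total_cost(nodes):
    """Sum the assumed per-task cost over every placed task."""
    return sum(len(n.tasks) * n.unit_cost() for n in nodes)
```

With three hypothetical nodes whose unit costs are 5.0, 2.0 and 3.5, three tasks would fill the cheapest node first and spill over to the next cheapest, leaving the most expensive node idle. A real scheduler would also have to check that such a placement still meets the deadline.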
Data availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Code availability
The code generated during the current study is available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No: 2019YFG0107).
Funding
This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525; Recipient: Hongjian Li) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No: 2019YFG0107; Recipient: Hongjian Li).
Author information
Contributions
HL: Proposed the idea, ran experiments, wrote the manuscript. HF: Proposed the idea, ran experiments, wrote the manuscript. HD: Ran experiments, helped write several sections of the manuscript, proofreading. TZ: Helped write several sections of the manuscript, proofreading. WS: Helped write several sections of the manuscript, proofreading. JW: Proofreading, gave valuable comments to improve the manuscript quality. CX: Proofreading, gave valuable comments to improve the manuscript quality.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, H., Fang, H., Dai, H. et al. A cost-efficient scheduling algorithm for streaming processing applications on cloud. Cluster Comput 25, 781–803 (2022). https://doi.org/10.1007/s10586-021-03462-6
DOI: https://doi.org/10.1007/s10586-021-03462-6