Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms

Li, Hongjian; Luo, Wei; Xie, Wenbin; Ye, Huaqing; Duan, Xiaolin

doi:10.1007/s10723-024-09756-4

Research
Published: 09 March 2024

Volume 22, article number 39 (2024)

Journal of Grid Computing

Hongjian Li¹,
Wei Luo¹,
Wenbin Xie²,
Huaqing Ye¹ &
Xiaolin Duan¹

Abstract

Spark Streaming is one of the mainstream stream processing frameworks; it processes real-time stream data using a micro-batch approach. However, its default task scheduling process has shortcomings, such as a high cluster usage cost caused by an inappropriate executor placement strategy in heterogeneous cluster environments. Meanwhile, most current scheduling studies focus on improving the processing performance of clusters while ignoring cost efficiency and service quality assurance. In this paper, we propose a low-cost executor placement method for heterogeneous clusters, based on resource demand prediction using machine learning, called Cost-Efficient and Best-Fit Decrease (CEBFD). First, a cost-efficient model is constructed for the Spark Streaming framework. Then, the Sparrow Search Algorithm (SSA) and the eXtreme Gradient Boosting (XGBoost) algorithm are combined to predict the resources required by streaming tasks. Finally, an executor placement method for heterogeneous Spark Streaming clusters is designed based on the cost-efficient model and the resource demand prediction. Furthermore, the proposed method better satisfies the Service Level Agreement (SLA) objectives of cost minimization and job deadline guarantee for stream processing. Experimental results show that the proposed approach reduces the cluster usage cost by 6.89% to 52.24% and effectively optimizes the SLA compared to existing algorithms.
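
The abstract does not spell out the CEBFD placement rule, so the following is only a minimal sketch of a generic cost-aware best-fit-decreasing heuristic, not the authors' algorithm. It assumes the per-executor resource demands are already known (in the paper they would come from the SSA-XGBoost predictor), and every name in it (`Node`, `Executor`, `place_executors`, `hourly_cost`, the prices) is hypothetical. Executors are sorted by decreasing demand; each one goes to a node that is already in use if possible, otherwise to the cheapest node that fits, with the tightest fit as a tie-breaker.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with remaining capacity and a hypothetical hourly price."""
    name: str
    cores: int
    memory_gb: float
    cost_per_hour: float
    placed: list = field(default_factory=list)

    def fits(self, ex):
        return ex.cores <= self.cores and ex.memory_gb <= self.memory_gb

    def place(self, ex):
        # Reserve the executor's resources on this node.
        self.cores -= ex.cores
        self.memory_gb -= ex.memory_gb
        self.placed.append(ex.name)


@dataclass(frozen=True)
class Executor:
    """An executor with an (assumed, already predicted) resource demand."""
    name: str
    cores: int
    memory_gb: float


def place_executors(executors, nodes):
    """Greedy cost-aware best-fit-decreasing placement (illustrative only).

    Returns the names of executors that could not be placed.
    """
    unplaced = []
    # "Decreasing": handle the largest demands first.
    ordered = sorted(executors, key=lambda e: (e.cores, e.memory_gb), reverse=True)
    for ex in ordered:
        candidates = [n for n in nodes if n.fits(ex)]
        if not candidates:
            unplaced.append(ex.name)
            continue
        # Prefer nodes that are already paid for, then the cheapest node,
        # then the one that leaves the fewest cores idle ("best fit").
        best = min(
            candidates,
            key=lambda n: (not n.placed, n.cost_per_hour, n.cores - ex.cores),
        )
        best.place(ex)
    return unplaced


def hourly_cost(nodes):
    """Total hourly price of the nodes that host at least one executor."""
    return sum(n.cost_per_hour for n in nodes if n.placed)


if __name__ == "__main__":
    cluster = [
        Node("A", 8, 32.0, 0.40),
        Node("B", 4, 16.0, 0.15),
        Node("C", 4, 16.0, 0.25),
    ]
    demand = [Executor("e1", 2, 4.0), Executor("e2", 2, 4.0), Executor("e3", 4, 8.0)]
    print("unplaced:", place_executors(demand, cluster))
    for node in cluster:
        print(node.name, node.placed)
    print("hourly cost:", round(hourly_cost(cluster), 2))
```

In this toy cluster the large executor fills the cheapest node, and the two small executors share the next-cheapest node rather than activating the expensive one. That is the intuition behind cost-driven placement in a heterogeneous cluster; the paper's cost model and SLA handling are not reproduced here.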

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Hongjian Li, Wei Luo, Huaqing Ye & Xiaolin Duan
China Telecom Corporation Limited Chongqing Branch, Chongqing, China
Wenbin Xie

Contributions

Hongjian Li: proposed the idea, conducted the experiments, wrote the manuscript. Wei Luo: proposed the idea, conducted the experiments, wrote the manuscript. Wenbin Xie: helped write several sections of the manuscript, proofreading. Huaqing Ye: helped write several sections of the manuscript, proofreading. Xiaolin Duan: helped write several sections of the manuscript, proofreading.

Corresponding author

Correspondence to Hongjian Li.

Ethics declarations

Competing interests

None. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Luo, W., Xie, W. et al. Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms. J Grid Computing 22, 39 (2024). https://doi.org/10.1007/s10723-024-09756-4

Received: 08 September 2023
Accepted: 21 February 2024
Published: 09 March 2024
DOI: https://doi.org/10.1007/s10723-024-09756-4
