Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms

  • Research
  • Journal of Grid Computing

Abstract

Spark Streaming is one of the mainstream stream processing frameworks; it processes real-time stream data using a micro-batch approach. However, its default task scheduling has shortcomings, such as high cluster usage cost caused by an inappropriate executor placement strategy in heterogeneous cluster environments. Meanwhile, most current scheduling studies focus on improving the processing performance of clusters while ignoring their cost efficiency and service quality assurance. In this paper, we propose a low-cost executor placement method for heterogeneous clusters, based on machine-learning prediction of resource demand, called Cost-Efficient and Best-Fit Decrease (CEBFD). First, a cost-efficient model is constructed for the Spark Streaming framework. Then the Sparrow Search Algorithm (SSA) and the eXtreme Gradient Boosting (XGBoost) algorithm are combined to predict the resources required by streaming tasks. Finally, the executor placement method for heterogeneous Spark Streaming clusters is designed based on the cost-efficient model and the resource demand prediction. Furthermore, the proposed method improves the Service Level Agreement (SLA) in terms of cost minimization and job deadline guarantees for stream processing. Experimental results show that the proposed approach reduces cluster usage cost by 6.89% to 52.24% and effectively optimizes the SLA compared to existing algorithms.
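To make the placement idea concrete, the following is a minimal sketch of a generic cost-aware best-fit-decreasing heuristic. It is not the authors' CEBFD algorithm, whose details are not given in the abstract. The `Node` class, the `cost_per_hour` field, the tie-breaking order and the demand tuples (which stand in for predicted resource demand) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node in a heterogeneous cluster."""
    name: str
    cores: int
    memory_gb: float
    cost_per_hour: float  # assumed price of keeping the node active
    executors: list = field(default_factory=list)

    def fits(self, cores, memory_gb):
        return self.cores >= cores and self.memory_gb >= memory_gb

    def place(self, executor_id, cores, memory_gb):
        # Reserve the resources and record the executor on this node.
        self.cores -= cores
        self.memory_gb -= memory_gb
        self.executors.append(executor_id)


def place_executors(demands, nodes):
    """Greedy cost-aware best-fit-decreasing placement (illustrative only).

    demands maps executor_id -> (cores, memory_gb), e.g. a predicted demand.
    Returns executor_id -> node name; raises ValueError if nothing fits.
    """
    placement = {}
    # "Decreasing": handle the largest executors first.
    ordered = sorted(demands.items(), key=lambda kv: kv[1], reverse=True)
    for executor_id, (cores, mem) in ordered:
        candidates = [n for n in nodes if n.fits(cores, mem)]
        if not candidates:
            raise ValueError(f"no node can host executor {executor_id}")
        # Prefer nodes that are already active (no added cost), then
        # cheaper nodes, then the tightest fit ("best fit").
        best = min(
            candidates,
            key=lambda n: (
                0 if n.executors else n.cost_per_hour,
                n.cores - cores,
                n.memory_gb - mem,
            ),
        )
        best.place(executor_id, cores, mem)
        placement[executor_id] = best.name
    return placement


def hourly_cost(nodes):
    """Total hourly cost of the nodes that host at least one executor."""
    return sum(n.cost_per_hour for n in nodes if n.executors)
```

Calling `place_executors` with a dictionary of per-executor demands and a list of `Node` objects returns the chosen node for each executor, and `hourly_cost` then reports the cost of the nodes actually used. Ranking active nodes first is one simple way to avoid paying for additional machines; a real cost model would also account for batch deadlines and processing time.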

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on reasonable request.

Author information

Contributions

Hongjian Li: proposed the idea, conducted the experiments, wrote the manuscript. Wei Luo: proposed the idea, conducted the experiments, wrote the manuscript. Wenbin Xie: helped write several sections of the manuscript, proofreading. Huaqing Ye: helped write several sections of the manuscript, proofreading. Xiaolin Duan: helped write several sections of the manuscript, proofreading.

Corresponding author

Correspondence to Hongjian Li.

Ethics declarations

Competing interests

None. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Luo, W., Xie, W. et al. Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms. J Grid Computing 22, 39 (2024). https://doi.org/10.1007/s10723-024-09756-4

  • DOI: https://doi.org/10.1007/s10723-024-09756-4
