
Cost-aware real-time job scheduling for hybrid cloud using deep reinforcement learning

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Hybrid cloud computing lets enterprises combine the strengths of private and public cloud models. One of its primary benefits is reduced operational cost, but realizing this benefit requires that jobs be executed efficiently across the hybrid environment. Although many job scheduling methods have been proposed for the cloud over the past decade, most of them target batch jobs rather than real-time ones, and few consider real-time jobs in a hybrid cloud setting. Inspired by the recent success of deep reinforcement learning (DRL) in solving complex optimization problems, in this paper we propose a DRL-based approach for scheduling real-time jobs in hybrid cloud, with a focus on minimizing the monetary cost of job execution while maintaining high quality of service and low response time. Specifically, our method learns to select suitable virtual machines for incoming jobs in real time over the hybrid cloud, with the scheduling agent trained through the rewards it receives for its scheduling decisions. We give the detailed design of our approach, and our experimental results demonstrate that it is more cost-efficient than current approaches.
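To make the scheduling idea concrete, the sketch below shows how a DRL scheduler of this kind could be structured: the state encodes the incoming job together with the current load of the candidate VMs, the action picks a VM (private or public), and the reward penalizes monetary cost and deadline misses. This is a minimal illustrative sketch, not the authors' implementation; the state features, network sizes, and reward weights (COST_WEIGHT, MISS_PENALTY) are assumptions made for the example.

```python
# Minimal sketch of a DQN-style scheduling agent for hybrid-cloud VM selection.
# Illustrative only: state features, network sizes, and reward weights are assumptions.
import random
import torch
import torch.nn as nn

NUM_VMS = 6              # candidate VMs (e.g., some private, some public)
STATE_DIM = 2 + NUM_VMS  # job features (size, deadline) + per-VM queue load
COST_WEIGHT = 1.0        # assumed weight on monetary cost in the reward
MISS_PENALTY = 10.0      # assumed penalty when a job misses its deadline

class QNetwork(nn.Module):
    """Maps a scheduling state to a Q-value for each candidate VM."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, NUM_VMS),
        )

    def forward(self, state):
        return self.net(state)

def select_vm(q_net, state, epsilon=0.1):
    """Epsilon-greedy action selection over the candidate VMs."""
    if random.random() < epsilon:
        return random.randrange(NUM_VMS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def reward(monetary_cost, missed_deadline):
    """Cost-aware reward: cheaper executions that meet deadlines score higher."""
    return -COST_WEIGHT * monetary_cost - (MISS_PENALTY if missed_deadline else 0.0)

# Example of a single scheduling step (the environment/simulator is not shown here):
q_net = QNetwork()
state = torch.randn(STATE_DIM)  # stand-in for real job and VM-load features
vm = select_vm(q_net, state)
r = reward(monetary_cost=0.42, missed_deadline=False)
print(f"schedule job on VM {vm}, reward {r:.2f}")
```

In practice such an agent would be trained over many simulated job arrivals, typically with experience replay and a target network as in standard DQN, so that the learned policy favors VM choices that lower cost without violating response-time requirements.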

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (2021MS017), the National Natural Science Foundation of China under Grant 61902222, and the Taishan Scholars Program of Shandong Province under Grant tsqn201909109.

Author information

Corresponding author

Correspondence to Long Cheng.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cheng, L., Kalapgar, A., Jain, A. et al. Cost-aware real-time job scheduling for hybrid cloud using deep reinforcement learning. Neural Comput & Applic 34, 18579–18593 (2022). https://doi.org/10.1007/s00521-022-07477-x
