Deep Reinforcement Learning for Job Scheduling on Cluster

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12894)

Abstract

Job scheduling is a key function of cluster computing. Efficient job scheduling improves hardware resource utilization and the execution efficiency of jobs. Conventional schedulers are dominated by heuristic algorithms, whose scheduling efficiency is not optimal. In this paper, we improve a deep reinforcement learning algorithm for cluster scheduling and name the result DeepCM. Test results on simulated data show that DeepCM is capable of improving job scheduling performance on the cluster: the slowdown improves from 2.248 to 2.235 in an environment of 3 machines. Fusing an internal baseline with an external baseline reduces the variation in performance across different jobsets. The experimental results demonstrate that deep reinforcement learning achieves improved scheduling efficiency in cluster computing, and the performance advantage becomes more obvious as the load gets heavier.
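
To put the reported slowdown figures in context, the following is a minimal sketch of how an average job slowdown and the relative improvement between two schedulers might be computed. The abstract does not define slowdown, so the definition used here (time from arrival to completion divided by the job's ideal run time) is an assumption, and the function names and the toy job list are hypothetical rather than taken from the paper.

```python
def average_slowdown(jobs):
    """Mean slowdown over (arrival, finish, duration) tuples.

    Assumed definition: (finish - arrival) / duration, so a job that
    never waits has slowdown 1.0. This is not confirmed by the abstract.
    """
    slowdowns = [(finish - arrival) / duration
                 for arrival, finish, duration in jobs]
    return sum(slowdowns) / len(slowdowns)


def relative_improvement(before, after):
    """Fractional reduction in slowdown when moving from `before` to `after`."""
    return (before - after) / before


# Hypothetical toy schedule: three jobs with slowdowns 2.0, 1.0 and 2.0.
toy_jobs = [(0, 4, 2), (1, 4, 3), (2, 10, 4)]
print(f"toy average slowdown: {average_slowdown(toy_jobs):.3f}")

# Figures quoted in the abstract for the 3-machine environment.
gain = relative_improvement(2.248, 2.235)
print(f"relative improvement: {gain * 100:.2f}%")  # about 0.58%
```

Under this assumed definition, the change from 2.248 to 2.235 corresponds to a reduction of roughly 0.58% in average slowdown.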

Supported by the National Key R&D Program, High Performance Computing project (2017YFB0203501).

Author information

Corresponding author

Correspondence to Zhenjie Yao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, Z., Chen, L., Zhang, H. (2021). Deep Reinforcement Learning for Job Scheduling on Cluster. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_50

  • DOI: https://doi.org/10.1007/978-3-030-86380-7_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86379-1

  • Online ISBN: 978-3-030-86380-7

  • eBook Packages: Computer Science, Computer Science (R0)
