
Resource Management with Deep Reinforcement Learning

Published: 09 November 2016

Abstract

Resource management problems in systems and networking often manifest as difficult online decision making tasks where appropriate solutions depend on understanding the workload and environment. Inspired by recent advances in deep reinforcement learning for AI problems, we consider building systems that learn to manage resources directly from experience. We present DeepRM, an example solution that translates the problem of packing tasks with multiple resource demands into a learning problem. Our initial results show that DeepRM performs comparably to state-of-the-art heuristics, adapts to different conditions, converges quickly, and learns strategies that are sensible in hindsight.
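
The paper describes the full design; as a rough illustration of the framing, the sketch below casts a tiny two-resource packing problem as an episodic RL task and trains a softmax policy with a REINFORCE-style policy gradient. The environment, the reward (a penalty per unfinished job per timestep), the linear policy, and every name and constant here are illustrative assumptions made for this sketch, not DeepRM's actual state representation, objective, or network.

    # Illustrative sketch only -- not the paper's implementation.
    import numpy as np

    rng = np.random.default_rng(0)

    NUM_RES = 2        # resource types (say, CPU and memory), capacity normalized to 1.0
    NUM_SLOTS = 3      # how many waiting jobs the policy can see at once
    EP_LEN = 50        # maximum timesteps per episode

    def new_job():
        # A job is (cpu demand, memory demand, duration); all values are toy choices.
        return np.array([rng.uniform(0.1, 0.5), rng.uniform(0.1, 0.5),
                         float(rng.integers(1, 5))])

    def run_episode(W):
        """Roll out the softmax policy parameterized by W; return the trajectory."""
        waiting = [new_job() for _ in range(6)]   # jobs this episode must pack
        running = []                              # [time left, resource demands]
        free = np.ones(NUM_RES)
        states, acts, rews = [], [], []
        for _ in range(EP_LEN):
            # Observation: free capacity plus the first NUM_SLOTS waiting jobs (zero-padded).
            n = min(len(waiting), NUM_SLOTS)
            slots = waiting[:n] + [np.zeros(3)] * (NUM_SLOTS - n)
            s = np.concatenate([free] + slots)
            logits = W @ s
            p = np.exp(logits - logits.max())
            p /= p.sum()
            a = rng.choice(NUM_SLOTS + 1, p=p)    # slot index, or NUM_SLOTS = "wait"
            if a < n and np.all(waiting[a][:2] <= free):
                job = waiting.pop(a)              # start the chosen job if it fits
                free -= job[:2]
                running.append([job[2], job[:2]])
            for j in running:                     # advance time; release finished jobs
                j[0] -= 1
                if j[0] <= 0:
                    free += j[1]
            running = [j for j in running if j[0] > 0]
            states.append(s)
            acts.append(a)
            rews.append(-(len(waiting) + len(running)))  # penalize unfinished jobs
            if not waiting and not running:
                break
        return states, acts, rews

    def train(iters=500, lr=0.01):
        W = np.zeros((NUM_SLOTS + 1, NUM_RES + 3 * NUM_SLOTS))
        for _ in range(iters):
            S, A, R = run_episode(W)
            G = np.cumsum(R[::-1])[::-1]          # return-to-go at each step
            b = G.mean()                          # crude baseline to reduce variance
            for s, a, g in zip(S, A, G):
                logits = W @ s
                p = np.exp(logits - logits.max())
                p /= p.sum()
                grad = -np.outer(p, s)            # d log pi(a|s) / dW for a softmax policy
                grad[a] += s
                W += lr * (g - b) * grad          # REINFORCE update
        return W

    W = train()
    print("return of one episode under the learned policy:",
          sum(run_episode(W)[2]))

A linear policy keeps the sketch dependency-free beyond NumPy; replacing W @ s with a deep network and new_job() with a realistic workload model is the direction the paper takes.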




Published In

HotNets '16: Proceedings of the 15th ACM Workshop on Hot Topics in Networks
November 2016
217 pages
ISBN: 9781450346610
DOI: 10.1145/3005745


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 November 2016


Qualifiers

  • Research-article

Conference

HotNets-XV

Acceptance Rates

HotNets '16 Paper Acceptance Rate: 30 of 108 submissions, 28%
Overall Acceptance Rate: 110 of 460 submissions, 24%


