Crux: GPU-Efficient Communication Scheduling for Deep Learning Training
Published In
- Co-chairs: Aruna Seneviratne, Darryl Veitch
- Program Co-chairs: Vyas Sekar, Minlan Yu
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Honorable Mention
Qualifiers
- Research-article
Article Metrics
- Total Citations: 0
- Total Downloads: 2,191
- Downloads (last 12 months): 2,191
- Downloads (last 6 weeks): 363