
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads


Abstract:

Deep learning (DL) systems suffer from low resource utilization due to 1) a monolithic server model that tightly couples compute and memory; and 2) limited sharing between different inference applications, and across inference and training, because of strict service level objectives (SLOs). To address this problem, we present DistMind, a disaggregated DL system that enables efficient multiplexing of DL applications with near-optimal resource utilization. DistMind decouples compute from host memory, and exposes the abstractions of a GPU pool and a memory pool, each of which can be independently provisioned. The key challenge is to dynamically allocate GPU resources to different applications based on their real-time demands while meeting strict SLOs. We tackle this challenge by exploiting the power of high-speed 100 Gbps networks, and design three-stage pipelining, cache-aware load balancing, and DNN-aware sharding mechanisms based on the characteristics of DL workloads, to achieve millisecond-scale application loading overhead and improve system efficiency. We have implemented a prototype of DistMind and integrated it with PyTorch. Experimental results on AWS EC2 show that DistMind achieves near 100% resource utilization, and compared with NVIDIA MPS and Ray, DistMind improves the throughput by up to 279% and reduces the inference latency by up to 94%.
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 3, June 2024)
Page(s): 2422 - 2437
Date of Publication: 24 January 2024


I. Introduction

Deep learning (DL) applications are transforming every aspect of modern society. Breakthroughs in deep neural networks (DNNs) over the past few decades have solved many notoriously difficult problems in computer vision [1] and natural language processing [2]. DNNs are increasingly integrated into applications and power a wide range of Internet services we use every day, such as face recognition and language translation.
