
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads


Abstract:

Deep learning (DL) systems suffer from low resource utilization due to 1) a monolithic server model that tightly couples compute and memory; and 2) limited sharing between different inference applications, and across inference and training, because of strict service level objectives (SLOs). To address this problem, we present DistMind, a disaggregated DL system that enables efficient multiplexing of DL applications with near-optimal resource utilization. DistMind decouples compute from host memory, and exposes the abstractions of a GPU pool and a memory pool, each of which can be independently provisioned. The key challenge is to dynamically allocate GPU resources to different applications based on their real-time demands while meeting strict SLOs. We tackle this challenge by exploiting the power of high-speed 100 Gbps networks, and design three-stage pipelining, cache-aware load balancing, and DNN-aware sharding mechanisms based on the characteristics of DL workloads, to achieve millisecond-scale application loading overhead and improve system efficiency. We have implemented a prototype of DistMind and integrated it with PyTorch. Experimental results on AWS EC2 show that DistMind achieves near 100% resource utilization, and compared with NVIDIA MPS and Ray, DistMind improves the throughput by up to 279% and reduces the inference latency by up to 94%.
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 3, June 2024)
Page(s): 2422 - 2437
Date of Publication: 24 January 2024


I. Introduction

Deep learning (DL) applications are transforming every aspect of modern society. Breakthroughs in deep neural networks (DNNs) over the past few decades have solved many notoriously difficult problems in computer vision [1] and natural language processing [2]. DNNs are increasingly integrated into applications and power a wide range of Internet services we use every day, such as face recognition and language translation.
