Abstract:
Deep learning services based on cloud computing suffer from deficiencies in latency, privacy, and related concerns. To meet low-latency requirements, researchers have begun to consider deploying deep learning services at the edge, i.e., edge intelligence services. Deploying a deep learning model across multiple processors or devices, so that its computation can be conducted in parallel, is one possible way to improve the efficiency of edge intelligence services. In this article, we propose a novel latency-driven deep learning model placement method for efficient edge intelligence service. Model placement consists of two procedures: model partition and sub-model assignment. In our method, we first convert the model into an execution graph and propose a novel latency-driven multilevel graph partition of the model. The partitioned sub-models are then heuristically assigned to the available processors. To the best of our knowledge, this is the first work to propose latency-driven graph partition algorithms for model placement. Extensive experiments on several commonly used DNN (deep neural network) models and synthetic datasets show that our method achieves the lowest execution latency with low complexity compared with other state-of-the-art model placement methods.
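The two procedures named in the abstract, partitioning a model into sub-models and assigning them to processors, can be illustrated with a minimal sketch. The code below is an assumption for illustration only: it models the network as a simple linear chain of per-layer compute costs, splits it into contiguous sub-models by greedy cost balancing, and assigns the largest sub-models to the fastest processors so that the pipelined bottleneck stage (the slowest stage) bounds steady-state latency. The function names, cost values, and greedy heuristic are hypothetical and are not the authors' latency-driven multilevel graph partition algorithm, which operates on full execution graphs.

```python
# Hypothetical sketch of model placement as two steps: partition, then assign.
# All names, costs, and the greedy balancing heuristic are illustrative
# assumptions, not the paper's algorithm.

def partition_chain(layer_costs, k):
    """Split a linear chain of per-layer costs into k contiguous sub-models,
    greedily closing a sub-model once its cost reaches the balanced target."""
    target = sum(layer_costs) / k
    parts, current, acc = [], [], 0.0
    for cost in layer_costs:
        current.append(cost)
        acc += cost
        if acc >= target and len(parts) < k - 1:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts

def pipeline_latency(parts, proc_speeds):
    """Assign the costliest sub-models to the fastest processors; in a
    pipelined chain, throughput-limiting latency is the slowest stage."""
    speeds = sorted(proc_speeds, reverse=True)
    order = sorted(range(len(parts)), key=lambda i: -sum(parts[i]))
    stage_time = [0.0] * len(parts)
    for rank, i in enumerate(order):
        stage_time[i] = sum(parts[i]) / speeds[rank % len(speeds)]
    return max(stage_time)

# Per-layer compute costs in arbitrary units; processor speeds in units/sec.
layers = [4.0, 1.0, 3.0, 2.0, 5.0, 1.0]
parts = partition_chain(layers, 3)
print(parts)                                        # sub-model groupings
print(pipeline_latency(parts, [2.0, 1.0, 1.0]))     # bottleneck stage time
```

A real placement method would account for inter-device communication cost on the cut edges and for branching in the execution graph, which is precisely why a graph partition formulation (rather than a chain split) is needed.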
Published in: IEEE Transactions on Services Computing ( Volume: 15, Issue: 2, 01 March-April 2022)