Latency-Driven Model Placement for Efficient Edge Intelligence Service


Abstract:

Deep learning services based on cloud computing suffer from deficiencies such as high latency and privacy risks. To meet low-latency requirements, researchers have begun to deploy deep learning services at the edge, i.e., edge intelligence services. Deploying a deep learning model across multiple processors or devices, so that its computation can be conducted in parallel, is a possible way to improve the efficiency of edge intelligence services. In this article, we propose a novel latency-driven deep learning model placement method for efficient edge intelligence service. Model placement comprises two procedures: model partitioning and sub-model assignment. In our method, we first convert the model into an execution graph and propose a novel latency-driven multilevel graph partitioning algorithm for the model. The partitioned sub-models are then heuristically assigned to the available processors. To the best of our knowledge, this is the first work to propose latency-driven graph partitioning algorithms for model placement. Extensive experiments on several commonly used DNN (deep neural network) models and synthetic datasets show that our method achieves the lowest execution latency with low complexity compared with other state-of-the-art model placement methods.
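The two placement procedures named in the abstract can be illustrated with a minimal sketch. Note that this is not the paper's actual algorithm: the contiguous chain partitioning, the greedy longest-processing-time (LPT) assignment, and all cost numbers below are illustrative assumptions for a simplified setting where the model is a linear chain of layers with known per-layer latencies.

```python
# Hypothetical sketch of the two model-placement steps: (1) partition a
# model's execution graph (here simplified to a chain of layer latencies)
# into sub-models, and (2) heuristically assign the sub-models to
# processors. The greedy LPT heuristic stands in for the paper's
# assignment procedure; communication cost between sub-models is ignored.
import heapq

def partition_chain(layer_costs, k):
    """Split a chain of per-layer latencies into k contiguous sub-models,
    greedily closing a block once it reaches the average block cost."""
    target = sum(layer_costs) / k
    blocks, current, acc = [], [], 0.0
    for cost in layer_costs:
        current.append(cost)
        acc += cost
        if acc >= target and len(blocks) < k - 1:
            blocks.append(current)
            current, acc = [], 0.0
    if current:
        blocks.append(current)
    return blocks

def assign(sub_model_costs, num_procs):
    """Greedy LPT assignment: place the heaviest remaining sub-model on
    the currently least-loaded processor; returns the placement map and
    the resulting makespan (maximum per-processor load)."""
    loads = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(loads)
    placement = {}
    for i, cost in sorted(enumerate(sub_model_costs), key=lambda x: -x[1]):
        load, p = heapq.heappop(loads)
        placement[i] = p
        heapq.heappush(loads, (load + cost, p))
    makespan = max(load for load, _ in loads)
    return placement, makespan

# Example: a 6-layer chain split into 3 sub-models, then assigned to 2
# processors by total cost of each sub-model.
blocks = partition_chain([3, 1, 2, 4, 2, 3], 3)
placement, makespan = assign([sum(b) for b in blocks], 2)
```

A real edge deployment would additionally account for inter-processor communication latency and heterogeneous device speeds, which is what makes the latency-driven graph partitioning in the article more involved than this balanced-cost sketch.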
Published in: IEEE Transactions on Services Computing (Volume: 15, Issue: 2, March-April 2022)
Page(s): 591 - 601
Date of Publication: 01 September 2021
