DOI: 10.1145/3589334.3645340 · Research article · ACM Web Conference Proceedings

Unlocking the Non-deterministic Computing Power with Memory-Elastic Multi-Exit Neural Networks

Published: 13 May 2024

Abstract

With the growing demand for the Web of Things (WoT) and edge computing, efficiently utilizing the limited computing power of edge devices has become a crucial challenge. Traditional neural networks (NNs) deployed as web services rely on deterministic computational resources; on non-deterministic computing power, which may be preempted at any time, they can fail to produce any result, degrading task performance significantly. Multi-exit NNs with multiple branches have been proposed as a solution, but the accuracy of their intermediate results can be unsatisfactory. In this paper, we propose MEEdge, a system that automatically transforms classic single-exit models into heterogeneous, dynamic multi-exit models that enable Memory-Elastic inference at the Edge under non-deterministic computing power. To build heterogeneous multi-exit models, MEEdge uses efficient convolutions to form a branch zoo and a High Priority First (HPF) branch placement method for branch growth. To adapt models to dynamically varying computational resources, we employ a novel on-device scheduler for collaboration. Furthermore, to reduce the memory overhead introduced by dynamic branches, we propose neuron-level weight sharing and few-shot knowledge distillation (KD) retraining. Our experimental results show that models generated by MEEdge achieve up to 27.31% better performance than existing multi-exit NNs.
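For background on the mechanism the abstract builds on: a multi-exit NN attaches classifier branches at intermediate depths and returns the earliest result whose confidence clears a threshold; when computing power is preempted mid-inference, the last completed exit can still be returned instead of failing outright. A minimal sketch of this early-exit control flow in plain Python (the exit functions, threshold, and `budget` parameter are illustrative stand-ins, not MEEdge's actual implementation):

```python
def multi_exit_infer(exits, x, confidence_threshold=0.9, budget=None):
    """Run exit branches in order of depth. Return the first prediction
    whose confidence clears the threshold, or the last completed one if
    the compute budget (number of exits we may run) is exhausted."""
    last = None
    for i, exit_fn in enumerate(exits):
        if budget is not None and i >= budget:
            break  # computing power preempted: fall back to last result
        label, confidence = exit_fn(x)
        last = (i, label, confidence)
        if confidence >= confidence_threshold:
            break  # early exit: result is already confident enough
    return last

# Illustrative exits: deeper branches are costlier but more confident.
exits = [
    lambda x: ("cat", 0.60),  # shallow branch: cheap but uncertain
    lambda x: ("cat", 0.85),  # middle branch
    lambda x: ("dog", 0.97),  # deep branch: accurate but expensive
]
```

With full resources, inference runs to the deep exit; with `budget=2` (preemption after two branches), the middle branch's result is returned rather than no result at all, which is the failure mode of single-exit models the abstract describes.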

Supplemental Material

MP4 File (rfp0154.MP4): supplemental video.



    Published In

    WWW '24: Proceedings of the ACM Web Conference 2024
    May 2024
    4826 pages
    ISBN:9798400701719
    DOI:10.1145/3589334

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. edge computing
    2. memory-elastic inference
    3. multi-exit neural networks
    4. non-deterministic computing power


    Conference

    WWW '24
    Sponsor:
    WWW '24: The ACM Web Conference 2024
    May 13 - 17, 2024
    Singapore, Singapore

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

