ABSTRACT
While a number of recent efforts have explored the use of "cloud offload" to enable deep learning on IoT devices, these efforts have not considered duty-cycled radios such as BLE. We argue that radio duty-cycling significantly diminishes the performance of existing cloud-offload methods. We tackle this problem by leveraging a previously unexplored opportunity: early-exit offload enhanced with prioritized communication, dynamic pooling, and dynamic fusion of features. We show that our system, FLEET, achieves significant gains in accuracy, latency, and compute budget compared to state-of-the-art local early-exit, remote-processing, and model-partitioning schemes across a range of DNN models, datasets, and IoT platforms.
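To make the core idea concrete, below is a minimal sketch of the confidence-gated decision that early-exit offload builds on: run the on-device portion of the network, exit locally if the early prediction is confident, and otherwise ship the intermediate features over the radio for remote completion. This is not FLEET's implementation; it assumes a PyTorch model split at an intermediate exit head, and the names backbone_head, exit_classifier, and send_to_cloud are hypothetical.

    import torch
    import torch.nn.functional as F

    def early_exit_or_offload(x, backbone_head, exit_classifier,
                              entropy_threshold=0.5, send_to_cloud=None):
        """Run the on-device stage; exit locally if the early prediction is
        confident (low entropy), else offload the features to the cloud.
        Assumes batch size 1 (entropy.item() below)."""
        feats = backbone_head(x)                     # on-device compute
        logits = exit_classifier(feats)              # early-exit head
        probs = F.softmax(logits, dim=-1)
        # Prediction entropy as the confidence measure (low = confident).
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:
            return probs.argmax(dim=-1), "local"     # confident: exit early
        # Not confident: send intermediate features over the (duty-cycled)
        # radio so the remote half of the model can finish the inference.
        return send_to_cloud(feats), "offload"

    if __name__ == "__main__":
        # Toy stand-ins for the two on-device stages; send_to_cloud is stubbed.
        backbone = torch.nn.Sequential(torch.nn.Flatten(),
                                       torch.nn.Linear(3 * 32 * 32, 64),
                                       torch.nn.ReLU())
        exit_head = torch.nn.Linear(64, 10)
        x = torch.randn(1, 3, 32, 32)
        pred, where = early_exit_or_offload(x, backbone, exit_head,
                                            send_to_cloud=lambda f: f)
        print(where)

On a duty-cycled BLE link, the offload branch dominates latency and energy, which is why the fraction of inputs that can exit locally (and how the remaining features are prioritized, pooled, and fused) matters so much.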