ABSTRACT
While a number of recent efforts have explored the use of "cloud offload" to enable deep learning on IoT devices, these efforts have not considered duty-cycled radios such as BLE. We argue that radio duty-cycling significantly diminishes the performance of existing cloud-offload methods. We tackle this problem by leveraging a previously unexplored opportunity: early-exit offload enhanced with prioritized communication, dynamic pooling, and dynamic fusion of features. We show that our system, FLEET, achieves significant gains in accuracy, latency, and compute budget compared to state-of-the-art local early-exit, remote-processing, and model-partitioning schemes across a range of DNN models, datasets, and IoT platforms.
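To make the core idea concrete, below is a minimal sketch of the confidence-gated decision that early-exit offload builds on: run the on-device portion of the network, exit locally if the early prediction is confident, and otherwise ship the intermediate features over the radio for remote completion. This is not FLEET's implementation; it assumes a PyTorch model split at an intermediate exit head, and the names backbone_head, exit_classifier, and send_to_cloud are hypothetical.

    import torch
    import torch.nn.functional as F

    def early_exit_or_offload(x, backbone_head, exit_classifier,
                              entropy_threshold=0.5, send_to_cloud=None):
        """Run the on-device stage; exit locally if the early prediction is
        confident (low entropy), else offload the features to the cloud.
        Assumes batch size 1 (entropy.item() below)."""
        feats = backbone_head(x)                     # on-device compute
        logits = exit_classifier(feats)              # early-exit head
        probs = F.softmax(logits, dim=-1)
        # Prediction entropy as the confidence measure (low = confident).
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:
            return probs.argmax(dim=-1), "local"     # confident: exit early
        # Not confident: send intermediate features over the (duty-cycled)
        # radio so the remote half of the model can finish the inference.
        return send_to_cloud(feats), "offload"

    if __name__ == "__main__":
        # Toy stand-ins for the two on-device stages; send_to_cloud is stubbed.
        backbone = torch.nn.Sequential(torch.nn.Flatten(),
                                       torch.nn.Linear(3 * 32 * 32, 64),
                                       torch.nn.ReLU())
        exit_head = torch.nn.Linear(64, 10)
        x = torch.randn(1, 3, 32, 32)
        pred, where = early_exit_or_offload(x, backbone, exit_head,
                                            send_to_cloud=lambda f: f)
        print(where)

On a duty-cycled BLE link, the offload branch dominates latency and energy, which is why the fraction of inputs that can exit locally (and how the remaining features are prioritized, pooled, and fused) matters so much.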