research-article

Space-Efficient TREC for Enabling Deep Learning on Microcontrollers

Authors:
Jiesong Liu

Renmin University of China, China

Renmin University of China, China
View Profile

,
Feng Zhang

Renmin University of China, China

Renmin University of China, China
View Profile

,
Jiawei Guan

Renmin University of China, China

Renmin University of China, China
View Profile

,
Hsin-Hsuan Sung

North Carolina State University, USA

North Carolina State University, USA
View Profile

,
Xiaoguang Guo

Renmin University of China, China

Renmin University of China, China
View Profile

,
Xiaoyong Du

Renmin University of China, China

Renmin University of China, China
View Profile

,
Xipeng Shen

North Carolina State University, USA

North Carolina State University, USA
View Profile

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3March 2023Pages 644–659https://doi.org/10.1145/3582016.3582062

Published:25 March 2023Publication History

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3

Pages 644–659

ABSTRACT

Deploying deep neural networks (DNNs) for a resource-constrained environment and achieving satisfactory performance is challenging. It is especially so on microcontrollers for their stringent space and computing power. This paper focuses on new ways to make TREC, an optimization recently proposed to enable computation reuse in DNNs, space and time efficient on Microcontrollers. The solution maximizes the performance benefits while keeping the DNN accuracy stable. Experiments show that the solution eliminates over 96% computations in DNNs and makes them fit well into microcontrollers, producing 3.4-5× speedups with only marginal accuracy loss.

References

2020. CifarNet. http://places.csail.mit.edu/deepscene/small-projects/TRN-pytorch-pose/model_zoo/models/slim/nets/cifarnet.py Google Scholar
Peter Bajcsy and Michael Majurski. 2021. Baseline Pruning-Based Approach to Trojan Detection in Neural Networks. arXiv preprint arXiv:2101.12016. Google Scholar
Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul Whatmough. 2021. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers. Proceedings of Machine Learning and Systems, 3 (2021), 517–532. Google Scholar
Jesús Benito-Picazo, Enrique Domínguez, Esteban J Palomo, Ezequiel López-Rubio, and Juan Miguel Ortiz-de Lazcano-Lobato. 2018. Deep learning-based anomalous object detection system powered by microcontroller for PTZ cameras. In 2018 International Joint Conference on Neural Networks (IJCNN). 1–7. Google ScholarCross Ref
Neel Bhave, Aniket Dhagavkar, Kalpesh Dhande, Monis Bana, and Jyoti Joshi. 2019. Smart Signal–Adaptive Traffic Signal Control using Reinforcement Learning and Object Detection. In 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC). 624–628. Google ScholarCross Ref
Dimosthenis E Bolanakis. 2019. A survey of research in microcontroller education. IEEE Revista Iberoamericana de Tecnologias del Aprendizaje, 14, 2 (2019), 50–57. Google ScholarCross Ref
Gianmarco Cerutti, Renzo Andri, Lukas Cavigelli, Elisabetta Farella, Michele Magno, and Luca Benini. 2020. Sound event detection with binary neural networks on tightly power-constrained IoT devices. In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design. 19–24. Google ScholarDigital Library
Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, and Christopher Re. 2021. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. In International Conference on Learning Representations. https://openreview.net/forum?id=wWK7yXkULyh Google Scholar
Arm Company. 2010. Cortex®-M4 Technical Reference Manual. https://users.ece.utexas.edu/~valvano/EE345L/Labs/Fall2011/CortexM4_TRM_r0p1.pdf Google Scholar
Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, and Tiezhen Wang. 2021. TensorFlow lite micro: Embedded machine learning for tinyml systems. Proceedings of Machine Learning and Systems, 3 (2021), 800–811. Google Scholar
Amir Erfan Eshratifar, Amirhossein Esmaili, and Massoud Pedram. 2019. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services. In 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). 1–6. Google ScholarCross Ref
Derek Farren, Thai Pham, and Marco Alban-Hidalgo. 2016. Low latency anomaly detection and Bayesian network prediction of anomaly likelihood. arXiv preprint arXiv:1611.03898. Google Scholar
Igor Fedorov, Ryan P Adams, Matthew Mattina, and Paul Whatmough. 2019. Sparse: Sparse architecture search for cnns on resource-constrained microcontrollers. Advances in Neural Information Processing Systems, 32 (2019). Google Scholar
Igor Fedorov, Ryan P Adams, Matthew Mattina, and Paul Whatmough. 2019. Sparse: Sparse architecture search for cnns on resource-constrained microcontrollers. Advances in Neural Information Processing Systems, 32 (2019). Google Scholar
Benjamin Graham, Martin Engelcke, and Laurens Van Der Maaten. 2018. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 9224–9232. Google ScholarCross Ref
Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, and Xipeng Shen. 2022. TREC: Transient Redundancy Elimination-based Convolution. In Neural Information Processing Systems 35 (Neurips 2022). Google Scholar
Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, and Prateek Jain. 2017. Protonn: Compressed and accurate knn for resource-scarce devices. In International Conference on Machine Learning. 1331–1340. Google ScholarDigital Library
Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Google Scholar
Bian Haoqiong, Sha Tiannan, and Anastasia Ailamaki. 2023. Using Cloud Functions as Accelerator for Elastic Data Analytics. In SIGMOD. Google Scholar
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network (2015). arXiv preprint arXiv:1503.02531, 2 (2015). Google Scholar
Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360. Google Scholar
Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360. Google Scholar
Sunil Jacob, Varun G Menon, Fadi Al-Turjman, PG Vinoj, and Leonardo Mostarda. 2019. Artificial muscle intelligence system with deep learning for post-stroke assistance and rehabilitation. Ieee Access, 7 (2019), 133463–133473. Google ScholarCross Ref
Jari Kaivo-oja. 2012. Weak signals analysis, knowledge management theory and systemic socio-cultural transitions. Futures, 44, 3 (2012), 206–217. Google ScholarCross Ref
Kuljeet Kaur, Sahil Garg, Gagangeet Singh Aujla, Neeraj Kumar, Joel JPC Rodrigues, and Mohsen Guizani. 2018. Edge computing in the industrial internet of things environment: Software-defined-networks-based edge-cloud interplay. IEEE communications magazine, 56, 2 (2018), 44–51. Google Scholar
Dongyeon Kim, Kyuhong Park, Yongjin Park, and Jae-Hyeon Ahn. 2019. Willingness to provide personal information: Perspective of privacy calculus in IoT services. Computers in Human Behavior, 92 (2019), 273–281. Google ScholarDigital Library
Aliaksei Kolesau and Dmitrij Šešok. 2020. Voice activation systems for embedded devices: Systematic literature review. Informatica, 31, 1 (2020), 65–88. Google ScholarDigital Library
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Google Scholar
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Google Scholar
Ashish Kumar, Saurabh Goyal, and Manik Varma. 2017. Resource-efficient machine learning in 2 KB RAM for the internet of things. In International Conference on Machine Learning. 1935–1944. Google Scholar
Liangzhen Lai and Naveen Suda. 2018. Enabling deep learning at the LoT Edge. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–6. Google ScholarDigital Library
Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2017. Deep convolutional neural network inference with floating-point weights and fixed-point activations. arXiv preprint arXiv:1703.03073. Google Scholar
Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2018. Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus. arXiv preprint arXiv:1801.06601. Google Scholar
Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2018. Not all ops are created equal!. arXiv preprint arXiv:1801.04326. Google Scholar
Xuesong Li, Jose Guivant, Ngaiming Kwok, Yongzhi Xu, Ruowei Li, and Hongkun Wu. 2019. Three-dimensional backbone network for 3d object detection in traffic scenes. arXiv preprint arXiv:1901.08373. Google Scholar
Andrea Massa, Davide Marcantonio, Xudong Chen, Maokun Li, and Marco Salucci. 2019. DNNs as applied to electromagnetics, antennas, and propagation—A review. IEEE Antennas and Wireless Propagation Letters, 18, 11 (2019), 2225–2229. Google ScholarCross Ref
Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, and Gerhard Rigoll. 2020. Small-footprint keyword spotting on raw audio data with sinc-convolutions. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 7454–7458. Google ScholarCross Ref
Mao V Ngo, Hakima Chaouchi, Tie Luo, and Tony QS Quek. 2020. Adaptive anomaly detection for IoT data in hierarchical edge computing. arXiv preprint arXiv:2001.03314. Google Scholar
Lin Ning and Xipeng Shen. 2019. Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse. In Proceedings of the ACM International Conference on Supercomputing. 438–448. Google ScholarDigital Library
Lin Ning and Xipeng Shen. 2019. Deep Reuse: streamline CNN inference on the fly via coarse-grained computation reuse. In Proceedings of the ACM International Conference on Supercomputing. 438–448. Google ScholarDigital Library
Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-Based Weight Pruning. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’20). Association for Computing Machinery, New York, NY, USA. 907–922. isbn:9781450371025 https://doi.org/10.1145/3373376.3378534 Google ScholarDigital Library
Nefy Puteri Novani, Mohammad Hafiz Hersyah, and Ryon Hamdanu. 2020. Electrical Household Appliances Control using Voice Command Based on Microcontroller. In 2020 International Conference on Information Technology Systems and Innovation (ICITSI). 288–293. Google ScholarCross Ref
Michela Paganini and Jessica Forde. 2020. Streamlining tensor and network pruning in pytorch. arXiv preprint arXiv:2004.13770. Google Scholar
Zheng Qin, Zhaoning Zhang, Xiaotao Chen, Changjian Wang, and Yuxing Peng. 2018. Fd-mobilenet: Improved mobilenet with a fast downsampling strategy. In 2018 25th IEEE International Conference on Image Processing (ICIP). 1363–1367. Google ScholarCross Ref
Marc Riera, Jose-Maria Arnau, and Antonio Gonzalez. 2018. Computation Reuse in DNNs by Exploiting Input Similarity. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 57–68. https://doi.org/10.1109/ISCA.2018.00016 Google ScholarDigital Library
Manuele Rusci, Alessandro Capotondi, and Luca Benini. 2020. Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers. Proceedings of Machine Learning and Systems, 2 (2020), 326–335. Google Scholar
Falk Salewski and Stefan Kowalewski. 2008. Hardware/software design considerations for automotive embedded systems. IEEE Transactions on Industrial Informatics, 4, 3 (2008), 156–163. Google ScholarCross Ref
Jiawei Shao and Jun Zhang. 2020. Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops). 1–6. Google ScholarCross Ref
Prerna Sharma and Deepali Kamthania. 2019. Intelligent object detection and avoidance system. In International Conference on Transforming IDEAS (Inter-Disciplinary Exchanges, Analysis, and Search) into Viable Solutions. 342–351. Google Scholar
Stanislava Soro. 2021. Tinyml for ubiquitous edge ai. arXiv preprint arXiv:2102.01255. Google Scholar
Srinivasa R Sridhara. 2011. Ultra-low power microcontrollers for portable, wearable, and implantable medical electronics. In 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011). 556–560. Google ScholarCross Ref
Hidetoshi Teraoka, Fumiharu Nakahara, and Kenichi Kurosawa. 2017. Incremental update method for vehicle microcontrollers. In 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE). 1–2. Google ScholarCross Ref
Ching-Biau Tzeng. 2018. Vibration detection and analysis of wind turbine based on a wireless embedded microcontroller system. In 2018 IEEE International Conference on Applied System Invention (ICASI). 133–136. Google ScholarCross Ref
Jiayi Wang, Chengliang Chai, Nan Tang, Jiabin Liu, and Guoliang Li. 2022. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. Proc. VLDB Endow., 16, 1 (2022), 64–76. https://www.vldb.org/pvldb/vol16/p64-wang.pdf Google ScholarDigital Library
Ruofan Wu, Feng Zhang, Jiawei Guan, Zhen Zheng, Xiaoyong Du, and Xipeng Shen. 2022. Drew: Efficient winograd cnn inference with deep reuse. In Proceedings of the ACM Web Conference 2022. 1807–1816. Google ScholarDigital Library
Ruofan Wu, Feng Zhang, Zhen Zheng, Xiaoyong Du, and Xipeng Shen. 2021. Exploring deep reuse in winograd CNN inference. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 483–484. Google ScholarDigital Library
Yan Yan, Yuxing Mao, and Bo Li. 2018. Second: Sparsely embedded convolutional detection. Sensors, 18, 10 (2018), 3337. Google ScholarCross Ref
Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. 2018. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 645–661. Google Scholar
JZ Yi, YK Tan, ZR Ang, and SK Panda. 2007. Microcontroller based voice-activated powered wheelchair control. In Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting. 67–72. Google ScholarDigital Library
Yunkai Yu, Zhihong Yang, Yuyang You, and Wenjing Shan. 2021. FASSNet: fast apnea syndrome screening neural network based on single-lead electrocardiogram for wearable devices. Physiological Measurement, 42, 8 (2021), 085005. Google ScholarCross Ref
Jian Yuan, Kok Kiong Tan, Tong Heng Lee, and Gerald Choon Huat Koh. 2014. Power-efficient interrupt-driven algorithms for fall detection and classification of activities of daily living. IEEE Sensors Journal, 15, 3 (2014), 1377–1387. Google ScholarCross Ref
Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision. 818–833. Google ScholarCross Ref
Feng Zhang, Jidong Zhai, Bingsheng He, Shuhao Zhang, and Wenguang Chen. 2016. Understanding co-running behaviors on integrated CPU/GPU architectures. IEEE Transactions on Parallel and Distributed Systems, 28, 3 (2016), 905–918. Google ScholarDigital Library
Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2022. POCLib: a high-performance framework for enabling near orthogonal processing on compression. IEEE Transactions on Parallel and Distributed Systems, 33, 2 (2022), 459–475. Google ScholarDigital Library
Yundong Zhang, Naveen Suda, Liangzhen Lai, and Vikas Chandra. 2017. Hello edge: Keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128. Google Scholar

Index Terms

Space-Efficient TREC for Enabling Deep Learning on Microcontrollers
1. Computer systems organization
  1. Real-time systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Advanced Deep Learning with Keras: Apply deep learning techniques, autoencoders, GANs, variational autoencoders, deep reinforcement learning, policy gradients, and more
Read More
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. ...
Read More
Deep learning on microcontrollers: a study on deployment costs and challenges
EuroMLSys '22: Proceedings of the 2nd European Workshop on Machine Learning and Systems

Microcontrollers are an attractive deployment target due to their low cost, modest power usage and abundance in the wild. However, deploying models to such hardware is non-trivial due to a small amount of on-chip RAM (often < 512KB) and limited compute ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023
820 pages
ISBN:9781450399180
DOI:10.1145/3582016
General Chair:
Tor M. Aamodt
University of British Columbia, Canada
,
Program Chairs:
Natalie Enright Jerger
University of Toronto, Canada
,
Michael Swift
University of Wisconsin-Madison, USA
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 March 2023
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
Author Tags
compiler optimization
real-time machine learning
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate535of2,713submissions,20%
Upcoming Conference
ASPLOS '24

Sponsor:

sigarch

sigarch

sigarch

29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 27 - May 1, 2024

La Jolla , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 508
  Total Downloads
- Downloads (Last 12 months)438
- Downloads (Last 6 weeks)39
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Space-Efficient TREC for Enabling Deep Learning on Microcontrollers

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3

ABSTRACT

References

Cited By

Index Terms

Recommendations

Advanced Deep Learning with Keras: Apply deep learning techniques, autoencoders, GANs, variational autoencoders, deep reinforcement learning, policy gradients, and more

Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions

Deep learning on microcontrollers: a study on deployment costs and challenges