ABSTRACT
Deploying deep neural networks (DNNs) for a resource-constrained environment and achieving satisfactory performance is challenging. It is especially so on microcontrollers for their stringent space and computing power. This paper focuses on new ways to make TREC, an optimization recently proposed to enable computation reuse in DNNs, space and time efficient on Microcontrollers. The solution maximizes the performance benefits while keeping the DNN accuracy stable. Experiments show that the solution eliminates over 96% computations in DNNs and makes them fit well into microcontrollers, producing 3.4-5× speedups with only marginal accuracy loss.
- 2020. CifarNet. http://places.csail.mit.edu/deepscene/small-projects/TRN-pytorch-pose/model_zoo/models/slim/nets/cifarnet.py Google Scholar
- Peter Bajcsy and Michael Majurski. 2021. Baseline Pruning-Based Approach to Trojan Detection in Neural Networks. arXiv preprint arXiv:2101.12016. Google Scholar
- Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, and Paul Whatmough. 2021. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers. Proceedings of Machine Learning and Systems, 3 (2021), 517–532. Google Scholar
- Jesús Benito-Picazo, Enrique Domínguez, Esteban J Palomo, Ezequiel López-Rubio, and Juan Miguel Ortiz-de Lazcano-Lobato. 2018. Deep learning-based anomalous object detection system powered by microcontroller for PTZ cameras. In 2018 International Joint Conference on Neural Networks (IJCNN). 1–7. Google ScholarCross Ref
- Neel Bhave, Aniket Dhagavkar, Kalpesh Dhande, Monis Bana, and Jyoti Joshi. 2019. Smart Signal–Adaptive Traffic Signal Control using Reinforcement Learning and Object Detection. In 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC). 624–628. Google ScholarCross Ref
- Dimosthenis E Bolanakis. 2019. A survey of research in microcontroller education. IEEE Revista Iberoamericana de Tecnologias del Aprendizaje, 14, 2 (2019), 50–57. Google ScholarCross Ref
- Gianmarco Cerutti, Renzo Andri, Lukas Cavigelli, Elisabetta Farella, Michele Magno, and Luca Benini. 2020. Sound event detection with binary neural networks on tightly power-constrained IoT devices. In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design. 19–24. Google ScholarDigital Library
- Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, and Christopher Re. 2021. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. In International Conference on Learning Representations. https://openreview.net/forum?id=wWK7yXkULyh Google Scholar
- Arm Company. 2010. Cortex®-M4 Technical Reference Manual. https://users.ece.utexas.edu/~valvano/EE345L/Labs/Fall2011/CortexM4_TRM_r0p1.pdf Google Scholar
- Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, and Tiezhen Wang. 2021. TensorFlow lite micro: Embedded machine learning for tinyml systems. Proceedings of Machine Learning and Systems, 3 (2021), 800–811. Google Scholar
- Amir Erfan Eshratifar, Amirhossein Esmaili, and Massoud Pedram. 2019. Bottlenet: A deep learning architecture for intelligent mobile cloud computing services. In 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). 1–6. Google ScholarCross Ref
- Derek Farren, Thai Pham, and Marco Alban-Hidalgo. 2016. Low latency anomaly detection and Bayesian network prediction of anomaly likelihood. arXiv preprint arXiv:1611.03898. Google Scholar
- Igor Fedorov, Ryan P Adams, Matthew Mattina, and Paul Whatmough. 2019. Sparse: Sparse architecture search for cnns on resource-constrained microcontrollers. Advances in Neural Information Processing Systems, 32 (2019). Google Scholar
- Igor Fedorov, Ryan P Adams, Matthew Mattina, and Paul Whatmough. 2019. Sparse: Sparse architecture search for cnns on resource-constrained microcontrollers. Advances in Neural Information Processing Systems, 32 (2019). Google Scholar
- Benjamin Graham, Martin Engelcke, and Laurens Van Der Maaten. 2018. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 9224–9232. Google ScholarCross Ref
- Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, and Xipeng Shen. 2022. TREC: Transient Redundancy Elimination-based Convolution. In Neural Information Processing Systems 35 (Neurips 2022). Google Scholar
- Chirag Gupta, Arun Sai Suggala, Ankit Goyal, Harsha Vardhan Simhadri, Bhargavi Paranjape, Ashish Kumar, Saurabh Goyal, Raghavendra Udupa, Manik Varma, and Prateek Jain. 2017. Protonn: Compressed and accurate knn for resource-scarce devices. In International Conference on Machine Learning. 1331–1340. Google ScholarDigital Library
- Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Google Scholar
- Bian Haoqiong, Sha Tiannan, and Anastasia Ailamaki. 2023. Using Cloud Functions as Accelerator for Elastic Data Analytics. In SIGMOD. Google Scholar
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network (2015). arXiv preprint arXiv:1503.02531, 2 (2015). Google Scholar
- Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360. Google Scholar
- Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360. Google Scholar
- Sunil Jacob, Varun G Menon, Fadi Al-Turjman, PG Vinoj, and Leonardo Mostarda. 2019. Artificial muscle intelligence system with deep learning for post-stroke assistance and rehabilitation. Ieee Access, 7 (2019), 133463–133473. Google ScholarCross Ref
- Jari Kaivo-oja. 2012. Weak signals analysis, knowledge management theory and systemic socio-cultural transitions. Futures, 44, 3 (2012), 206–217. Google ScholarCross Ref
- Kuljeet Kaur, Sahil Garg, Gagangeet Singh Aujla, Neeraj Kumar, Joel JPC Rodrigues, and Mohsen Guizani. 2018. Edge computing in the industrial internet of things environment: Software-defined-networks-based edge-cloud interplay. IEEE communications magazine, 56, 2 (2018), 44–51. Google Scholar
- Dongyeon Kim, Kyuhong Park, Yongjin Park, and Jae-Hyeon Ahn. 2019. Willingness to provide personal information: Perspective of privacy calculus in IoT services. Computers in Human Behavior, 92 (2019), 273–281. Google ScholarDigital Library
- Aliaksei Kolesau and Dmitrij Šešok. 2020. Voice activation systems for embedded devices: Systematic literature review. Informatica, 31, 1 (2020), 65–88. Google ScholarDigital Library
- Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Google Scholar
- Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Google Scholar
- Ashish Kumar, Saurabh Goyal, and Manik Varma. 2017. Resource-efficient machine learning in 2 KB RAM for the internet of things. In International Conference on Machine Learning. 1935–1944. Google Scholar
- Liangzhen Lai and Naveen Suda. 2018. Enabling deep learning at the LoT Edge. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–6. Google ScholarDigital Library
- Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2017. Deep convolutional neural network inference with floating-point weights and fixed-point activations. arXiv preprint arXiv:1703.03073. Google Scholar
- Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2018. Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus. arXiv preprint arXiv:1801.06601. Google Scholar
- Liangzhen Lai, Naveen Suda, and Vikas Chandra. 2018. Not all ops are created equal!. arXiv preprint arXiv:1801.04326. Google Scholar
- Xuesong Li, Jose Guivant, Ngaiming Kwok, Yongzhi Xu, Ruowei Li, and Hongkun Wu. 2019. Three-dimensional backbone network for 3d object detection in traffic scenes. arXiv preprint arXiv:1901.08373. Google Scholar
- Andrea Massa, Davide Marcantonio, Xudong Chen, Maokun Li, and Marco Salucci. 2019. DNNs as applied to electromagnetics, antennas, and propagation—A review. IEEE Antennas and Wireless Propagation Letters, 18, 11 (2019), 2225–2229. Google ScholarCross Ref
- Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, and Gerhard Rigoll. 2020. Small-footprint keyword spotting on raw audio data with sinc-convolutions. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 7454–7458. Google ScholarCross Ref
- Mao V Ngo, Hakima Chaouchi, Tie Luo, and Tony QS Quek. 2020. Adaptive anomaly detection for IoT data in hierarchical edge computing. arXiv preprint arXiv:2001.03314. Google Scholar
- Lin Ning and Xipeng Shen. 2019. Deep reuse: streamline CNN inference on the fly via coarse-grained computation reuse. In Proceedings of the ACM International Conference on Supercomputing. 438–448. Google ScholarDigital Library
- Lin Ning and Xipeng Shen. 2019. Deep Reuse: streamline CNN inference on the fly via coarse-grained computation reuse. In Proceedings of the ACM International Conference on Supercomputing. 438–448. Google ScholarDigital Library
- Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-Based Weight Pruning. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’20). Association for Computing Machinery, New York, NY, USA. 907–922. isbn:9781450371025 https://doi.org/10.1145/3373376.3378534 Google ScholarDigital Library
- Nefy Puteri Novani, Mohammad Hafiz Hersyah, and Ryon Hamdanu. 2020. Electrical Household Appliances Control using Voice Command Based on Microcontroller. In 2020 International Conference on Information Technology Systems and Innovation (ICITSI). 288–293. Google ScholarCross Ref
- Michela Paganini and Jessica Forde. 2020. Streamlining tensor and network pruning in pytorch. arXiv preprint arXiv:2004.13770. Google Scholar
- Zheng Qin, Zhaoning Zhang, Xiaotao Chen, Changjian Wang, and Yuxing Peng. 2018. Fd-mobilenet: Improved mobilenet with a fast downsampling strategy. In 2018 25th IEEE International Conference on Image Processing (ICIP). 1363–1367. Google ScholarCross Ref
- Marc Riera, Jose-Maria Arnau, and Antonio Gonzalez. 2018. Computation Reuse in DNNs by Exploiting Input Similarity. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 57–68. https://doi.org/10.1109/ISCA.2018.00016 Google ScholarDigital Library
- Manuele Rusci, Alessandro Capotondi, and Luca Benini. 2020. Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers. Proceedings of Machine Learning and Systems, 2 (2020), 326–335. Google Scholar
- Falk Salewski and Stefan Kowalewski. 2008. Hardware/software design considerations for automotive embedded systems. IEEE Transactions on Industrial Informatics, 4, 3 (2008), 156–163. Google ScholarCross Ref
- Jiawei Shao and Jun Zhang. 2020. Bottlenet++: An end-to-end approach for feature compression in device-edge co-inference systems. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops). 1–6. Google ScholarCross Ref
- Prerna Sharma and Deepali Kamthania. 2019. Intelligent object detection and avoidance system. In International Conference on Transforming IDEAS (Inter-Disciplinary Exchanges, Analysis, and Search) into Viable Solutions. 342–351. Google Scholar
- Stanislava Soro. 2021. Tinyml for ubiquitous edge ai. arXiv preprint arXiv:2102.01255. Google Scholar
- Srinivasa R Sridhara. 2011. Ultra-low power microcontrollers for portable, wearable, and implantable medical electronics. In 16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011). 556–560. Google ScholarCross Ref
- Hidetoshi Teraoka, Fumiharu Nakahara, and Kenichi Kurosawa. 2017. Incremental update method for vehicle microcontrollers. In 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE). 1–2. Google ScholarCross Ref
- Ching-Biau Tzeng. 2018. Vibration detection and analysis of wind turbine based on a wireless embedded microcontroller system. In 2018 IEEE International Conference on Applied System Invention (ICASI). 133–136. Google ScholarCross Ref
- Jiayi Wang, Chengliang Chai, Nan Tang, Jiabin Liu, and Guoliang Li. 2022. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. Proc. VLDB Endow., 16, 1 (2022), 64–76. https://www.vldb.org/pvldb/vol16/p64-wang.pdf Google ScholarDigital Library
- Ruofan Wu, Feng Zhang, Jiawei Guan, Zhen Zheng, Xiaoyong Du, and Xipeng Shen. 2022. Drew: Efficient winograd cnn inference with deep reuse. In Proceedings of the ACM Web Conference 2022. 1807–1816. Google ScholarDigital Library
- Ruofan Wu, Feng Zhang, Zhen Zheng, Xiaoyong Du, and Xipeng Shen. 2021. Exploring deep reuse in winograd CNN inference. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 483–484. Google ScholarDigital Library
- Yan Yan, Yuxing Mao, and Bo Li. 2018. Second: Sparsely embedded convolutional detection. Sensors, 18, 10 (2018), 3337. Google ScholarCross Ref
- Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. 2018. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 645–661. Google Scholar
- JZ Yi, YK Tan, ZR Ang, and SK Panda. 2007. Microcontroller based voice-activated powered wheelchair control. In Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting. 67–72. Google ScholarDigital Library
- Yunkai Yu, Zhihong Yang, Yuyang You, and Wenjing Shan. 2021. FASSNet: fast apnea syndrome screening neural network based on single-lead electrocardiogram for wearable devices. Physiological Measurement, 42, 8 (2021), 085005. Google ScholarCross Ref
- Jian Yuan, Kok Kiong Tan, Tong Heng Lee, and Gerald Choon Huat Koh. 2014. Power-efficient interrupt-driven algorithms for fall detection and classification of activities of daily living. IEEE Sensors Journal, 15, 3 (2014), 1377–1387. Google ScholarCross Ref
- Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European conference on computer vision. 818–833. Google ScholarCross Ref
- Feng Zhang, Jidong Zhai, Bingsheng He, Shuhao Zhang, and Wenguang Chen. 2016. Understanding co-running behaviors on integrated CPU/GPU architectures. IEEE Transactions on Parallel and Distributed Systems, 28, 3 (2016), 905–918. Google ScholarDigital Library
- Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2022. POCLib: a high-performance framework for enabling near orthogonal processing on compression. IEEE Transactions on Parallel and Distributed Systems, 33, 2 (2022), 459–475. Google ScholarDigital Library
- Yundong Zhang, Naveen Suda, Liangzhen Lai, and Vikas Chandra. 2017. Hello edge: Keyword spotting on microcontrollers. arXiv preprint arXiv:1711.07128. Google Scholar
Index Terms
- Space-Efficient TREC for Enabling Deep Learning on Microcontrollers
Recommendations
Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. ...
Deep learning on microcontrollers: a study on deployment costs and challenges
EuroMLSys '22: Proceedings of the 2nd European Workshop on Machine Learning and SystemsMicrocontrollers are an attractive deployment target due to their low cost, modest power usage and abundance in the wild. However, deploying models to such hardware is non-trivial due to a small amount of on-chip RAM (often < 512KB) and limited compute ...
Comments