DOI: 10.1145/3581791.3596831
research-article

ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU

Published: 18 June 2023

Abstract

Many activation values of Convolutional Neural Networks (CNNs) are zeros due to ReLU (Rectified Linear Unit), one of the most common activation functions in modern neural networks. Since ReLU outputs zero for all negative inputs, existing CNN acceleration approaches estimate zero outputs to skip redundant computation, which sacrifices accuracy for efficiency and leads to difficult trade-offs and cumbersome configuration. In this paper, we introduce ConvReLU++, a lossless acceleration method for CNN inference on mobile devices that accurately detects and skips zero outputs for speedup without misprediction. The key to early negative detection is a reference-based upper-bound calculation: as soon as the intermediate upper bound becomes negative, the final result is guaranteed to be negative. Upon detection, the remaining computation can be skipped and the subsequent ReLU output can simply be set to zero. We rigorously prove the losslessness of ConvReLU++, analyze its theoretical FLOPs reduction, and show that our method is compatible with vector-level parallelism on mobile platforms. We implement ConvReLU++ in popular mobile inference frameworks and evaluate it on common deep vision tasks. The results demonstrate that ConvReLU++ achieves a 2.90% to 8.91% latency reduction over the original inference frameworks on edge devices without sacrificing accuracy. Our code can be found at https://github.com/monster119120/conv_relu_plus_plus.
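The core idea of a reference-based upper bound can be sketched for a single Conv+ReLU output. The following toy Python sketch is not the paper's implementation; the function name, the variable names, and the specific Cauchy-Schwarz form of the bound are our own assumptions for illustration. It shows why the skip is lossless: if an upper bound on the pre-activation is already negative, the exact result must also be negative, so the ReLU output is exactly zero.

```python
import numpy as np

def conv_relu_output(w, b, x, r, wr):
    """One Conv+ReLU output with a reference-based early-negative check (sketch).

    w  : flattened convolution kernel
    b  : bias
    x  : current input patch (flattened)
    r  : a reference patch (e.g. a nearby, similar patch)
    wr : precomputed dot product w @ r for the reference patch
    """
    # Cauchy-Schwarz bound: w @ x = w @ r + w @ (x - r)
    #                             <= w @ r + ||w|| * ||x - r||
    upper_bound = wr + np.linalg.norm(w) * np.linalg.norm(x - r)
    if upper_bound + b < 0:
        # Pre-activation is provably negative: skip the full dot product,
        # the ReLU output is exactly zero (lossless skip).
        return 0.0
    # Otherwise fall back to the exact computation.
    return max(0.0, float(w @ x) + b)
```

The closer the reference patch `r` is to the current patch `x`, the tighter the bound and the more often the skip fires, which is why the choice of references matters for the achievable speedup.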




      Published In

      MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services
      June 2023
      651 pages
      ISBN:9798400701108
      DOI:10.1145/3581791

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. CNN inference
      2. lossless acceleration
      3. early negative detection
      4. mobile CPU

      Qualifiers

      • Research-article

      Conference

      MobiSys '23
      Acceptance Rates

      MobiSys '23 Paper Acceptance Rate 41 of 198 submissions, 21%;
      Overall Acceptance Rate 274 of 1,679 submissions, 16%

      Article Metrics

      • Downloads (Last 12 months)222
      • Downloads (Last 6 weeks)28
      Reflects downloads up to 27 Feb 2025

      Cited By

      • (2024) Artificial Intelligence of Things: A Survey. ACM Transactions on Sensor Networks 21:1, 1-75. DOI: 10.1145/3690639. Online publication date: 30-Aug-2024.
      • (2024) AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 295-308. DOI: 10.1145/3666025.3699339. Online publication date: 4-Nov-2024.
      • (2024) F2Zip: Finetuning-Free Model Compression for Scenario-Adaptive Embedded Vision. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 15-27. DOI: 10.1145/3666025.3699319. Online publication date: 4-Nov-2024.
      • (2024) Mobile Foundation Model as Firmware. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 279-295. DOI: 10.1145/3636534.3649361. Online publication date: 29-May-2024.
