DOI: 10.1145/3299874.3319492
Research Article

ADMM-based Weight Pruning for Real-Time Deep Learning Acceleration on Mobile Devices

Published: 13 May 2019

Abstract

Deep learning solutions are increasingly being deployed in mobile applications, at least for the inference phase. Due to the large model sizes and computational requirements, model compression for deep neural networks (DNNs) becomes necessary, especially given the real-time requirements of embedded systems. In this paper, we extend prior work on systematic DNN weight pruning using ADMM (Alternating Direction Method of Multipliers). We integrate ADMM regularization with masked mapping/retraining, thereby guaranteeing solution feasibility and providing high solution quality. Besides superior performance on representative DNN benchmarks (e.g., AlexNet, ResNet), we focus on two new applications, facial emotion detection and eye tracking, and develop a top-down framework of DNN training, model compression, and acceleration on mobile devices. Experimental results show that, with negligible accuracy degradation, the proposed method achieves significant storage/memory reduction and speedup on mobile devices.
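The two-stage pipeline the abstract describes, ADMM regularization followed by masked mapping/retraining, can be sketched on a toy quadratic loss. This is a minimal illustration, not the paper's implementation: all hyperparameters (`rho`, `lr`, iteration counts) and function names are made up for the example, and the "loss" is a simple quadratic rather than a DNN objective.

```python
import numpy as np

def project_sparse(w, k):
    """Euclidean projection onto {w : ||w||_0 <= k}: keep the k largest magnitudes."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(w0, grad_loss, k, rho=1.0, lr=0.1, admm_iters=30, inner_steps=50):
    """ADMM weight pruning sketch (illustrative hyperparameters).

    W-step: minimize loss + (rho/2)||W - Z + U||^2 by gradient descent.
    Z-step: project W + U onto the k-sparse constraint set.
    U-step: accumulate the residual (scaled dual update).
    Finally, hard-prune to the mask and retrain the surviving weights
    (the "masked mapping/retraining" stage).
    """
    w = w0.copy()
    z = project_sparse(w, k)
    u = np.zeros_like(w)
    for _ in range(admm_iters):
        for _ in range(inner_steps):
            g = grad_loss(w) + rho * (w - z + u)
            w -= lr * g
        z = project_sparse(w + u, k)
        u += w - z
    mask = (z != 0).astype(w.dtype)
    w *= mask                       # masked mapping: zero out pruned weights
    for _ in range(inner_steps):    # masked retraining: update survivors only
        w -= lr * grad_loss(w) * mask
    return w, mask

# Toy "loss" 0.5*||w - t||^2 with a dense target t; its gradient is w - t.
t = np.array([3.0, -0.1, 2.0, 0.05, -1.5])
w, mask = admm_prune(np.zeros_like(t), lambda w: w - t, k=3)
```

In this toy setting the three large-magnitude coordinates survive and recover their target values, while the two small ones are pruned exactly to zero; in the paper's setting the quadratic loss is replaced by the DNN training loss and the projection is applied layer-wise.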




Published In

GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
May 2019
562 pages
ISBN: 9781450362528
DOI: 10.1145/3299874

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. acceleration
  2. mobile devices
  3. neural networks
  4. real-time

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation Awards

Conference

GLSVLSI '19: Great Lakes Symposium on VLSI 2019
May 9-11, 2019
Tysons Corner, VA, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)



Cited By

  • (2024) Pruning Deep Neural Networks for Green Energy-Efficient Models: A Survey. Cognitive Computation. DOI: 10.1007/s12559-024-10313-0. Published 5 Jul 2024.
  • (2023) Self-Attentive Pooling for Efficient Deep Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3963-3972. DOI: 10.1109/WACV56688.2023.00396. Published Jan 2023.
  • (2023) Convolutional neural network pruning based on multi-objective feature map selection for image classification. Applied Soft Computing 139:C. DOI: 10.1016/j.asoc.2023.110229. Published 1 May 2023.
  • (2023) Accelerated Stochastic Peaceman–Rachford Method for Empirical Risk Minimization. Journal of the Operations Research Society of China 11(4), 783-807. DOI: 10.1007/s40305-023-00470-8. Published 31 Mar 2023.
  • (2022) Diagnosis of Lumbar Spondylolisthesis Using a Pruned CNN Model. Computational and Mathematical Methods in Medicine 2022, 1-10. DOI: 10.1155/2022/2722315. Published 10 May 2022.
  • (2022) Towards Sparsification of Graph Neural Networks. 2022 IEEE 40th International Conference on Computer Design (ICCD), 272-279. DOI: 10.1109/ICCD56317.2022.00048. Published Oct 2022.
  • (2022) Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study. 2022 25th Euromicro Conference on Digital System Design (DSD), 205-213. DOI: 10.1109/DSD57027.2022.00036. Published Aug 2022.
  • (2022) Methods for Pruning Deep Neural Networks. IEEE Access 10, 63280-63300. DOI: 10.1109/ACCESS.2022.3182659. Published 2022.
  • (2022) Designing efficient convolutional neural network structure. Neurocomputing 489:C, 139-156. DOI: 10.1016/j.neucom.2021.08.158. Published 7 Jun 2022.
  • (2021) RT-mDL. Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 1-14. DOI: 10.1145/3485730.3485938. Published 15 Nov 2021.
