DOI: 10.1145/3299874.3319492
Research Article

ADMM-based Weight Pruning for Real-Time Deep Learning Acceleration on Mobile Devices

Published: 13 May 2019

Abstract

Deep learning solutions are increasingly being deployed in mobile applications, at least for the inference phase. Due to the large model sizes and computational requirements, model compression for deep neural networks (DNNs) becomes necessary, especially given the real-time requirements of embedded systems. In this paper, we extend prior work on systematic DNN weight pruning using ADMM (Alternating Direction Method of Multipliers). We integrate ADMM regularization with masked mapping/retraining, thereby guaranteeing solution feasibility and providing high solution quality. Besides superior performance on representative DNN benchmarks (e.g., AlexNet, ResNet), we focus on two new applications, facial emotion detection and eye tracking, and develop a top-down framework of DNN training, model compression, and acceleration on mobile devices. Experimental results show that, with negligible accuracy degradation, the proposed method achieves significant storage/memory reduction and speedup on mobile devices.
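The two-stage pipeline the abstract describes, ADMM regularization followed by masked mapping/retraining, can be sketched on a toy quadratic loss. This is a minimal illustration, not the paper's implementation: all hyperparameters (`rho`, `lr`, iteration counts) and function names are made up for the example, and the "loss" is a simple quadratic rather than a DNN objective.

```python
import numpy as np

def project_sparse(w, k):
    """Euclidean projection onto {w : ||w||_0 <= k}: keep the k largest magnitudes."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(w0, grad_loss, k, rho=1.0, lr=0.1, admm_iters=30, inner_steps=50):
    """ADMM weight pruning sketch (illustrative hyperparameters).

    W-step: minimize loss + (rho/2)||W - Z + U||^2 by gradient descent.
    Z-step: project W + U onto the k-sparse constraint set.
    U-step: accumulate the residual (scaled dual update).
    Finally, hard-prune to the mask and retrain the surviving weights
    (the "masked mapping/retraining" stage).
    """
    w = w0.copy()
    z = project_sparse(w, k)
    u = np.zeros_like(w)
    for _ in range(admm_iters):
        for _ in range(inner_steps):
            g = grad_loss(w) + rho * (w - z + u)
            w -= lr * g
        z = project_sparse(w + u, k)
        u += w - z
    mask = (z != 0).astype(w.dtype)
    w *= mask                       # masked mapping: zero out pruned weights
    for _ in range(inner_steps):    # masked retraining: update survivors only
        w -= lr * grad_loss(w) * mask
    return w, mask

# Toy "loss" 0.5*||w - t||^2 with a dense target t; its gradient is w - t.
t = np.array([3.0, -0.1, 2.0, 0.05, -1.5])
w, mask = admm_prune(np.zeros_like(t), lambda w: w - t, k=3)
```

In this toy setting the three large-magnitude coordinates survive and recover their target values, while the two small ones are pruned exactly to zero; in the paper's setting the quadratic loss is replaced by the DNN training loss and the projection is applied layer-wise.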




Published In

GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
May 2019
562 pages
ISBN: 9781450362528
DOI: 10.1145/3299874

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. acceleration
  2. mobile devices
  3. neural networks
  4. real-time

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation Awards

Conference

GLSVLSI '19: Great Lakes Symposium on VLSI 2019
May 9-11, 2019
Tysons Corner, VA, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)



Cited By

  • (2024) Pruning Deep Neural Networks for Green Energy-Efficient Models: A Survey. Cognitive Computation. DOI: 10.1007/s12559-024-10313-0. Published 5 Jul 2024.
  • (2023) Self-Attentive Pooling for Efficient Deep Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3963-3972. DOI: 10.1109/WACV56688.2023.00396. Published Jan 2023.
  • (2023) Convolutional neural network pruning based on multi-objective feature map selection for image classification. Applied Soft Computing 139:C. DOI: 10.1016/j.asoc.2023.110229. Published 1 May 2023.
  • (2023) Accelerated Stochastic Peaceman–Rachford Method for Empirical Risk Minimization. Journal of the Operations Research Society of China 11(4), 783-807. DOI: 10.1007/s40305-023-00470-8. Published 31 Mar 2023.
  • (2022) Diagnosis of Lumbar Spondylolisthesis Using a Pruned CNN Model. Computational and Mathematical Methods in Medicine 2022, 1-10. DOI: 10.1155/2022/2722315. Published 10 May 2022.
  • (2022) Towards Sparsification of Graph Neural Networks. 2022 IEEE 40th International Conference on Computer Design (ICCD), 272-279. DOI: 10.1109/ICCD56317.2022.00048. Published Oct 2022.
  • (2022) Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study. 2022 25th Euromicro Conference on Digital System Design (DSD), 205-213. DOI: 10.1109/DSD57027.2022.00036. Published Aug 2022.
  • (2022) Methods for Pruning Deep Neural Networks. IEEE Access 10, 63280-63300. DOI: 10.1109/ACCESS.2022.3182659. Published 2022.
  • (2022) Designing efficient convolutional neural network structure. Neurocomputing 489:C, 139-156. DOI: 10.1016/j.neucom.2021.08.158. Published 7 Jun 2022.
  • (2021) RT-mDL. Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 1-14. DOI: 10.1145/3485730.3485938. Published 15 Nov 2021.
