
Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization in DNNs

Published: 08 October 2022

Abstract

Deep neural networks typically contain a large number of parameters and require extensive computation. Pruning and quantization are widely used to reduce the complexity of deep models, and combining the two can yield substantially higher compression ratios. However, separate optimization processes and the difficulty of choosing suitable hyperparameters limit their joint application. In this study, we propose a novel compression framework, termed quantized sparse training, that prunes and quantizes networks jointly in a unified training process. We integrate pruning and quantization into a gradient-based optimization procedure built on the straight-through estimator, which allows a network to be trained, pruned, and quantized simultaneously from scratch. Empirical results validate the superiority of the proposed method over recent state-of-the-art baselines with respect to both model size and accuracy. In particular, quantized sparse training compresses VGG16 to a 135 KB model without any accuracy degradation, 40% of the model size achievable with the state-of-the-art joint pruning and quantization approach.
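The abstract describes folding pruning and quantization into one gradient-based optimization loop via the straight-through estimator (STE). The sketch below is not the paper's algorithm, only a minimal PyTorch illustration of that general idea: a latent full-precision weight tensor is masked (pruned) and uniformly quantized in the forward pass, while the STE passes gradients straight through the non-differentiable round and mask operations, so the network can be trained, pruned, and quantized from scratch in a single process. The threshold, step size, and bit width are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact formulation): joint pruning and
# quantization of a weight tensor with a straight-through estimator (STE).
import torch
import torch.nn as nn


class PruneQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, threshold, step, num_bits):
        mask = (w.abs() > threshold).float()                 # magnitude-based pruning mask
        qmax = 2 ** (num_bits - 1) - 1
        w_q = torch.clamp(torch.round(w / step), -qmax, qmax) * step  # uniform quantization
        return w_q * mask                                    # pruned + quantized weights

    @staticmethod
    def backward(ctx, grad_out):
        # STE: pass the gradient straight through round() and the mask,
        # so the latent full-precision weights keep receiving dense gradients.
        return grad_out, None, None, None


class QSLinear(nn.Linear):
    """Linear layer whose weights are pruned and quantized on the fly (illustrative)."""

    def __init__(self, in_f, out_f, threshold=1e-2, step=2 ** -6, num_bits=4):
        super().__init__(in_f, out_f)
        self.threshold, self.step, self.num_bits = threshold, step, num_bits

    def forward(self, x):
        w = PruneQuantSTE.apply(self.weight, self.threshold, self.step, self.num_bits)
        return nn.functional.linear(x, w, self.bias)


if __name__ == "__main__":
    layer = QSLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()                                     # gradients reach the latent weights
    print(layer.weight.grad.shape)
```

In a full framework the pruning threshold and quantization step size would themselves be learned or scheduled during training rather than fixed as they are in this toy example.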



Published In

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5
September 2022, 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 08 October 2022
Online AM: 15 July 2022
Accepted: 03 March 2022
Revised: 25 February 2022
Received: 01 June 2021
Published in TECS Volume 21, Issue 5


Author Tags

  1. Deep learning
  2. model compression
  3. joint pruning and quantization
  4. neural network

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Research Foundation of Korea (NRF)
  • Institute of Information & communications Technology Planning & Evaluation (IITP)



Article Metrics

  • Downloads (Last 12 months): 405
  • Downloads (Last 6 weeks): 38
Reflects downloads up to 15 Feb 2025


Cited By

  • (2024) Intelligent Measurement on Edge Devices Using Hardware Memory-Aware Joint Compression Enabled Neural Networks. IEEE Transactions on Instrumentation and Measurement 73, 1–13. DOI: 10.1109/TIM.2023.3341126. Online publication date: 2024.
  • (2024) AQA: An Adaptive Post-Training Quantization Method for Activations of CNNs. IEEE Transactions on Computers 73, 8, 2025–2035. DOI: 10.1109/TC.2024.3398503. Online publication date: 1-Aug-2024.
  • (2024) Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), 10290–10297. DOI: 10.1109/ICRA57147.2024.10611456. Online publication date: 13-May-2024.
  • (2024) Optimized Convolutional Neural Network at the IoT edge for image detection using pruning and quantization. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20523-1. Online publication date: 26-Dec-2024.
  • (2023) A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification. ACM Transactions on Intelligent Systems and Technology 14, 6, 1–50. DOI: 10.1145/3623402. Online publication date: 14-Nov-2023.
  • (2023) Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 2, 506–519. DOI: 10.1109/TCAD.2023.3317789. Online publication date: 20-Sep-2023.
  • (2023) Cross-Domain Few-Shot Sparse-Quantization Aware Learning for Lymphoblast Detection in Blood Smear Images. In Pattern Recognition, 213–226. DOI: 10.1007/978-3-031-47665-5_18. Online publication date: 5-Nov-2023.
