ABSTRACT
Convolutional neural networks have become ubiquitous in image classification: state-of-the-art image classification models use convolutional layers in one form or another. There is a growing need to deploy deep learning models, especially real-time vision models, on edge devices to reduce latency. But deploying such models on edge devices is becoming difficult as networks grow deeper and denser, and an overparameterized network is not necessary in many of these deployment scenarios. This has led researchers to develop techniques for optimizing smaller and shallower networks, network architecture search techniques, and deep learning model compression techniques. In this research, we propose a framework that uses deep deterministic policy gradient (DDPG), a class of deep reinforcement learning algorithms, to learn the best set of filters to retain, taking into account the intrinsic dimensionality of the dataset, the features of each layer, and the criterion by which the filters of a convolutional layer are ranked. By learning this relationship, we can prune unnecessary filters, reducing both the computational and the memory requirements of the model without losing much accuracy. In our experiments, the method pruned 66% of the filters overall.
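The paper does not include code, but the pruning step it describes can be illustrated with a minimal sketch. The example below, assuming PyTorch and an L1-norm ranking criterion, ranks the filters of a convolutional layer and keeps only the top fraction; in the full framework, the per-layer keep ratio would be the continuous action produced by the DDPG agent rather than the fixed value used here.

```python
# Minimal sketch (not the authors' implementation) of ranked filter pruning.
# Assumes PyTorch; the L1-norm criterion and fixed keep ratio are placeholders
# for the ranking criterion and the DDPG agent's learned action.
import torch
import torch.nn as nn

def rank_filters_l1(conv: nn.Conv2d) -> torch.Tensor:
    # One score per output channel: L1 norm of each filter's weights.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def prune_conv(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    # Build a new Conv2d containing only the highest-ranked filters.
    scores = rank_filters_l1(conv)
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep_idx = torch.topk(scores, n_keep).indices
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep_idx])
    return pruned

# Usage: a DDPG actor would map a layer-state vector (e.g. layer index,
# filter count, an intrinsic-dimension estimate of the data) to keep_ratio;
# here a fixed ratio stands in, removing roughly 66% of the filters.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
smaller = prune_conv(conv, keep_ratio=0.34)
print(conv.out_channels, "->", smaller.out_channels)  # 64 -> 21
```

Note that removing output channels from one layer also shrinks the input expected by the next layer, so a complete implementation must slice the subsequent layer's input channels accordingly; that bookkeeping is omitted here for brevity.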