
An Empirical Study on Model Pruning and Quantization

  • Conference paper
Broadband Communications, Networks, and Systems (BROADNETS 2023)

Abstract

In machine learning, model compression is vital for resource-constrained Internet of Things (IoT) devices, such as unmanned aerial vehicles (UAVs) and smartphones. Several state-of-the-art (SOTA) compression methods exist, but little work has been done to evaluate these techniques across different models and datasets. In this paper, we present an in-depth study of two SOTA model compression methods, pruning and quantization. We apply these methods to AlexNet, ResNet18, VGG16BN and VGG19BN on three well-known datasets: Fashion-MNIST, CIFAR-10, and UCI-HAR. From our study, we conclude that pruning followed by retraining preserves performance (on average less than \(0.5\%\) degradation) while reducing model size (at a \(10\times \) compression rate) on spatial-domain datasets (e.g., pictures); on temporal-domain datasets (e.g., motion-sensor data), performance degrades more (about \(5.0\%\) on average); and the performance of quantization depends on the pruning rate and the network architecture. We also compare different clustering methods and reveal their impact on model accuracy and quantization ratio. Finally, we suggest some interesting directions for future research.
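The pipeline the abstract describes, magnitude pruning followed by clustering-based weight quantization, can be sketched in plain NumPy. This is an illustrative sketch only, not the authors' implementation (their code is linked in the notes below); the helper names `magnitude_prune` and `kmeans_quantize` are hypothetical, and the linear centroid initialisation is just one of the clustering initialisations a study like this might compare.

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the fraction `rate` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * rate)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value acts as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)

def kmeans_quantize(weights, n_clusters=16, n_iter=20):
    """Cluster the surviving (non-zero) weights with 1-D k-means and
    replace each weight by its nearest centroid, so the layer only needs
    to store n_clusters distinct values plus per-weight cluster indices."""
    nz = weights != 0
    vals = weights[nz]
    # linear initialisation: centroids spread evenly over the weight range
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = vals[assign == c]
            if members.size:
                centroids[c] = members.mean()
    quantized = weights.copy()
    idx = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
    quantized[nz] = centroids[idx]
    return quantized
```

In a full experiment each pruning step would be followed by retraining on the remaining weights (the step the abstract credits for keeping accuracy), which this standalone sketch omits.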


Notes

  1. https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html
  2. https://github.com/paul-tian/broadnets2022-compression
  3. https://github.com/zalandoresearch/fashion-mnist
  4. https://www.cs.toronto.edu/~kriz/cifar.html
  5. https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones
  6. https://github.com/paul-tian/broadnets2022-compression


Acknowledgements

This work is in part supported by an Australian Research Council (ARC) Discovery Project (DP210102447), an ARC Linkage Project (LP190100676), and a DATA61 project (Data61 CRP C020996).

Author information

Correspondence to Xi Zheng.


Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Tian, Y., Luan, T.H., Zheng, X. (2023). An Empirical Study on Model Pruning and Quantization. In: Wang, W., Wu, J. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-40467-2_7


  • DOI: https://doi.org/10.1007/978-3-031-40467-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40466-5

  • Online ISBN: 978-3-031-40467-2

  • eBook Packages: Computer Science, Computer Science (R0)
