Abstract
Deep learning has become the default solution for a wide range of problems. One drawback of deep learning-based solutions, however, is that the models are large and computationally expensive, which makes them difficult to deploy on small or embedded devices and to transmit over the web. To address this problem, this paper presents a novel method for converting large neural networks into lightweight, compressed models. Our method uses Principal Component Analysis, a dimensionality reduction algorithm, to decompose the network weights into smaller matrices, yielding a new, compressed architecture. The compressed model is then retrained to recover the accuracy lost to the lossy compression, and its parameters are finally stored after quantization. Experiments on benchmark datasets with standard models show that we achieve high compression, with compression ratios between 5x and 35x depending on model complexity, and little to no loss in accuracy. Comparison with other state-of-the-art methods shows that our compression method performs similarly, and in certain cases better. This is the first work in which dimensionality reduction and quantization are combined to create a new, compressed model.
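To give a concrete flavor of the two ingredients the abstract names, below is a minimal NumPy sketch: a weight matrix is factorized into two smaller matrices via PCA, and the factors are then uniformly quantized to 8 bits. The layer size (512 x 512), rank k = 32, and the uint8 quantizer are illustrative assumptions; this is not the paper's exact algorithm, which also retrains the compressed model to recover accuracy.

```python
# Hypothetical sketch of PCA-based weight factorization plus 8-bit quantization.
# Shapes, rank, and quantizer are illustrative assumptions, not the paper's method.
import numpy as np

def pca_factorize(W, k):
    """Approximate W (m x n) as A @ B + mean, with A (m x k) and B (k x n)."""
    mean = W.mean(axis=0, keepdims=True)            # center rows, as PCA requires
    U, S, Vt = np.linalg.svd(W - mean, full_matrices=False)
    A = U[:, :k] * S[:k]                            # coordinates in the PCA basis (m x k)
    B = Vt[:k]                                      # top-k principal directions (k x n)
    return A, B, mean

def quantize_uint8(M):
    """Uniform 8-bit quantization; returns codes plus scale/offset to dequantize."""
    lo, hi = M.min(), M.max()
    scale = (hi - lo) / 255.0 or 1.0                # guard against a constant matrix
    codes = np.round((M - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

# Example: compress one hypothetical 512 x 512 dense layer to rank 32.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
A, B, mean = pca_factorize(W, k=32)
qA, sA, loA = quantize_uint8(A)
qB, sB, loB = quantize_uint8(B)
W_hat = dequantize(qA, sA, loA) @ dequantize(qB, sB, loB) + mean
orig_bytes = W.nbytes
comp_bytes = qA.nbytes + qB.nbytes + mean.nbytes
print(f"compression ratio ~{orig_bytes / comp_bytes:.1f}x, "
      f"relative error {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

The storage saving comes from the factorization itself: an m x n layer with mn parameters becomes k(m + n) parameters, which is smaller whenever k < mn / (m + n), and the 8-bit codes shrink each stored value by a further factor of four relative to float32.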
Cite this article
Hasan, M.S., Alam, R. & Adnan, M.A. Compressed neural architecture utilizing dimensionality reduction and quantization. Appl Intell 53, 1271–1286 (2023). https://doi.org/10.1007/s10489-022-03221-z