Abstract
Low-rank decomposition, which identifies and eliminates linear dependency within a tensor, is often used as a structured model-pruning method for deep convolutional neural networks. However, model accuracy declines rapidly once the compression ratio exceeds a threshold. We observed that adding a small number of sparse elements can significantly recover the accuracy of highly compressed CNN models. Based on this observation, we developed a novel method, called LPSD (Low-rank Plus Sparse Decomposition), that decomposes a CNN weight tensor into the combination of a low-rank component and a sparse component, which better maintains accuracy at high compression ratios. For a pretrained model, the network structure of each layer is split into two branches: one for the low-rank part and one for the sparse part. LPSD adapts an alternating approximation algorithm that minimizes the global error and the local error alternately. An exhaustive search method with pruning is designed to find the optimal group number, ranks, and sparsity. Experimental results demonstrate that, in most scenarios, LPSD achieves better accuracy than state-of-the-art methods when the model is highly compressed.
References
Cai, J.F., Li, J., Xia, D.: Generalized low-rank plus sparse tensor estimation by fast Riemannian optimization (2022)
Chu, B.S., Lee, C.R.: Low-rank tensor decomposition for compression of convolutional neural networks using funnel regularization (2021)
Guo, K., Xie, X., Xu, X., Xing, X.: Compressing by learning in a low-rank and sparse decomposition form. IEEE Access 7, 150823–150832 (2019). https://doi.org/10.1109/ACCESS.2019.2947846
Han, S., et al.: DSD: Dense-sparse-dense training for deep neural networks (2017)
Hawkins, C., Yang, H., Li, M., Lai, L., Chandra, V.: Low-rank+sparse tensor compression for neural networks (2021)
Huang, W., et al.: Deep low-rank plus sparse network for dynamic MR imaging (2021)
Idelbayev, Y., Carreira-Perpinan, M.A.: Low-rank compression of neural nets: learning the rank of each layer. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8046–8056 (2020). https://doi.org/10.1109/CVPR42600.2020.00807
Kaloshin, P.: Convolutional neural networks compression with low rank and sparse tensor decompositions (2020)
Kim, Y.D., Park, E., Yoo, S., Choi, T., Yang, L., Shin, D.: Compression of deep convolutional neural networks for fast and low power mobile applications (2016)
Liang, C.C., Lee, C.R.: Automatic selection of tensor decomposition for compressing convolutional neural networks a case study on VGG-type networks. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 770–778 (2021). https://doi.org/10.1109/IPDPSW52791.2021.00115
Liebenwein, L., Maalouf, A., Gal, O., Feldman, D., Rus, D.: Compressing neural networks: Towards determining the optimal layer-wise decomposition (2021). CoRR abs/2107.11442, https://arxiv.org/abs/2107.11442
Lin, T., Stich, S.U., Barba, L., Dmitriev, D., Jaggi, M.: Dynamic model pruning with feedback (2020)
Otazo, R., Candès, E., Sodickson, D.: Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components. Magn. Reson. Med. 73, 1125–1136 (2014). https://doi.org/10.1002/mrm.25240
Yin, M., Phan, H., Zang, X., Liao, S., Yuan, B.: BATUDE: budget-aware neural network compression based on tucker decomposition. Proc. AAAI Conf. Artif. Intell. 36, 8874–8882 (2022). https://doi.org/10.1609/aaai.v36i8.20869
Yu, X., Liu, T., Wang, X., Tao, D.: On compressing deep models by low rank and sparse decomposition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 67–76 (2017). https://doi.org/10.1109/CVPR.2017.15
Zhang, X., Wang, L., Gu, Q.: A unified framework for low-rank plus sparse matrix recovery (2018)
Appendix
1.1 A. Optimality of Sparsity Selection
The optimality of the sparsity selection method can be proven by the Eckart-Young-Mirsky theorem [11]. Let A be an \(m\times n\) matrix, and let \(\textrm{nnz}(A)\) be the number of nonzero elements of A. The norm used is the Frobenius norm, defined as \(\Vert A\Vert _F = \sqrt{\sum _{i=1}^m \sum _{j=1}^n A_{i,j}^2}, \) where \(A_{i,j}\) is the (i, j)th element of A. The following lemma shows how to find the optimal sparse matrix S that minimizes \(\Vert A-S\Vert _F\).
Lemma 1
Let A be an \(m\times n\) matrix. The solution that minimizes \(\Vert A-S\Vert _F\) subject to \(\textrm{nnz}(S)=s\) is the matrix T that keeps the s largest-magnitude elements \(|A_{i,j}|\) of A at their original indices, with all other elements zero.
The proof is straightforward, since \(\Vert A-S\Vert _F^2 = \sum _{i=1}^m\sum _{j=1}^n (A_{i,j}-S_{i,j})^2.\) Its minimal value is obtained by canceling the terms with the s largest \(|A_{i,j}|\), which is achieved by setting \(S_{i,j}=A_{i,j}\), so that \((A_{i,j}-S_{i,j})=0\), for those s elements of largest magnitude.
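As a minimal sketch of Lemma 1 (the function name and shapes are ours, not from the paper), the optimal sparse approximation simply keeps the s largest-magnitude entries of A and zeros out the rest:

```python
import numpy as np

def top_s_sparse(A, s):
    """Optimal S with nnz(S) = s minimizing ||A - S||_F (Lemma 1):
    keep the s largest-magnitude entries of A, zero elsewhere."""
    S = np.zeros_like(A)
    if s > 0:
        flat_abs = np.abs(A).ravel()
        # Indices of the s largest |A_ij| (order among them is irrelevant)
        idx = np.argpartition(flat_abs, -s)[-s:]
        S.ravel()[idx] = A.ravel()[idx]
    return S
```

Any other matrix with the same number of nonzeros leaves at least one of the s largest squared entries uncancelled, so its residual norm can only be larger.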
1.2 B. Error Estimation
Theorem 1
If a collection of data of size n follows a normal distribution with mean 0 and variance \(\sigma ^2\), then the top-k squares sum can be estimated by the formula:
$$ \sum _{\text {top-}k} x_i^2 \approx \sqrt{\frac{2}{\pi }}\, n\sigma t\, e^{-\frac{t^2}{2\sigma ^2}} + k\sigma ^2, $$
where \(t = F_X^{-1}\bigl (1 - \frac{k}{2n}\bigr )\), and \(F_{X}(t)\) is the cumulative distribution function of \(f_{X}(t) = \frac{1}{\sigma \sqrt{2\pi }} e^{-\frac{t^2}{2\sigma ^2}}\), the probability density function of the normal distribution with mean 0.
Proof
Let \(X \sim N(\mu =0, \sigma ^2)\) be the random variable of the data; its probability density function is \(f_{X}(t) = \frac{1}{\sigma \sqrt{2\pi }} e^{-\frac{t^2}{2\sigma ^2}}\). Now consider the random variable \(Y=X^2\). We can find the probability density function of Y by differentiating its cumulative distribution function:
$$ F_Y(y) = P(X^2 \le y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \qquad f_Y(y) = \frac{1}{2\sqrt{y}}\bigl (f_X(\sqrt{y}) + f_X(-\sqrt{y})\bigr ), \quad y > 0. $$
Since \(f_X\) is symmetric about 0, we can rewrite \(f_Y(y)\) as:
$$ f_Y(y) = \frac{1}{\sqrt{y}}\, f_X(\sqrt{y}) = \frac{1}{\sigma \sqrt{2\pi y}}\, e^{-\frac{y}{2\sigma ^2}}, \quad y > 0. $$
After obtaining the probability density function of \(Y=X^2\), the kth largest square value in the data can be found. Assume that value is \(t^2 \ (t>0)\); then it satisfies
$$ P(Y > t^2) = P(|X| > t) = 2\bigl (1 - F_X(t)\bigr ) = \frac{k}{n}, $$
which gives \(t = F_X^{-1}\bigl (1 - \frac{k}{2n}\bigr )\).
After obtaining the kth largest square value \(t^2\), the average of the top-k squares can be found as a conditional expected value:
$$ E[Y \mid Y > t^2] = \frac{\int _{t^2}^{\infty } y\, f_Y(y)\, dy}{P(Y > t^2)} = \frac{n}{k} \int _{t^2}^{\infty } y\, f_Y(y)\, dy. $$
Focus on the integral part. Substituting \(y = x^2\) and integrating by parts, using \(x f_X(x) = -\sigma ^2 f_X'(x)\):
$$ \int _{t^2}^{\infty } y\, f_Y(y)\, dy = 2\int _{t}^{\infty } x^2 f_X(x)\, dx = 2\sigma ^2 t f_X(t) + 2\sigma ^2 \bigl (1 - F_X(t)\bigr ) = \sqrt{\frac{2}{\pi }}\, \sigma t\, e^{-\frac{t^2}{2\sigma ^2}} + \frac{k}{n}\sigma ^2. $$
The top-k squares sum can then be estimated by multiplying the average by k:
$$ k \cdot E[Y \mid Y > t^2] = n \int _{t^2}^{\infty } y\, f_Y(y)\, dy = \sqrt{\frac{2}{\pi }}\, n\sigma t\, e^{-\frac{t^2}{2\sigma ^2}} + k\sigma ^2. $$
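The estimate of Theorem 1 can be checked numerically. Below is a small sketch (function name and sample sizes are ours): the closed-form estimate is compared against the empirical sum of the k largest squares of n normal samples, using the stdlib `statistics.NormalDist` for \(F_X^{-1}\).

```python
import numpy as np
from statistics import NormalDist

def topk_square_sum_estimate(n, k, sigma):
    """Theorem 1: estimated sum of the k largest squares of n
    samples drawn from N(0, sigma^2)."""
    # t = F_X^{-1}(1 - k/(2n)) for X ~ N(0, sigma^2)
    t = NormalDist(0.0, sigma).inv_cdf(1 - k / (2 * n))
    return (np.sqrt(2 / np.pi) * n * sigma * t * np.exp(-t**2 / (2 * sigma**2))
            + k * sigma**2)

rng = np.random.default_rng(0)
n, k, sigma = 100_000, 1_000, 1.0
x = rng.normal(0.0, sigma, n)
empirical = np.sort(x**2)[-k:].sum()  # true top-k squares sum of the sample
```

For these sizes the closed-form estimate and the empirical sum agree to within a few percent.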
Corollary 1
If the values of an \(a\times b\) matrix W follow a normal distribution with mean 0, we can estimate the top-k squares sum by Theorem 1 with \(n=ab\), where \(\sigma \) is estimated from the Frobenius norm divided by the square root of the matrix size:
$$ \sigma \approx \frac{\Vert W\Vert _F}{\sqrt{ab}}. $$
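Combining the corollary with Theorem 1 gives a direct estimator for a weight matrix. A minimal sketch (the function name is ours; the normality of W's entries is the corollary's assumption, not a guarantee):

```python
import numpy as np
from statistics import NormalDist

def estimate_topk_square_sum(W, k):
    """Corollary 1: estimate the top-k squares sum of matrix W,
    assuming its entries are roughly N(0, sigma^2)."""
    a, b = W.shape
    n = a * b
    sigma = np.linalg.norm(W) / np.sqrt(n)  # ||W||_F / sqrt(ab)
    t = NormalDist(0.0, sigma).inv_cdf(1 - k / (2 * n))
    return (np.sqrt(2 / np.pi) * n * sigma * t * np.exp(-t**2 / (2 * sigma**2))
            + k * sigma**2)
```

This avoids sorting all \(ab\) squared entries when only an estimate of the top-k energy is needed, e.g. when searching over sparsity levels.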
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Huang, KH., Sie, CY., Lin, JE., Lee, CR. (2024). LPSD: Low-Rank Plus Sparse Decomposition for Highly Compressed CNN Models. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science(), vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_28
Print ISBN: 978-981-97-2241-9
Online ISBN: 978-981-97-2242-6