
LPSD: Low-Rank Plus Sparse Decomposition for Highly Compressed CNN Models

Conference paper

Advances in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14645)


Abstract

Low-rank decomposition, which explores and eliminates the linear dependency within a tensor, is often used as a structured model pruning method for deep convolutional neural networks. However, the model accuracy declines rapidly once the compression ratio exceeds a threshold. We have observed that with a small number of sparse elements, the accuracy of highly compressed CNN models can be recovered significantly. Based on this premise, we developed a novel method, called LPSD (Low-rank Plus Sparse Decomposition), that decomposes a CNN weight tensor into a combination of a low-rank component and a sparse component, which better maintains accuracy at high compression ratios. For a pretrained model, the network structure of each layer is split into two branches: one for the low-rank part and one for the sparse part. LPSD adapts an alternating approximation algorithm to minimize the global error and the local error alternately. An exhaustive search method with pruning is designed to find the optimal group number, ranks, and sparsity. Experimental results demonstrate that in most scenarios, LPSD achieves better accuracy than state-of-the-art methods when the model is highly compressed.
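
To make the high-level idea concrete, the following NumPy sketch splits a weight matrix into a truncated-SVD low-rank part plus a sparse residual that keeps only the largest-magnitude entries. It illustrates the general low-rank plus sparse decomposition, not the authors' LPSD algorithm (which additionally uses grouping, alternating global/local approximation, and a search over group number, ranks, and sparsity); the function name and the `rank`/`nnz` parameters are illustrative assumptions.

```python
import numpy as np

def low_rank_plus_sparse(W, rank, nnz):
    """Split W into a rank-`rank` matrix L plus a sparse matrix S with
    `nnz` nonzeros, so that W is approximated by L + S (illustration only)."""
    # Best rank-`rank` approximation via truncated SVD (Eckart-Young-Mirsky).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # Keep the `nnz` largest-magnitude entries of the residual as the sparse part
    # (the optimal choice in the Frobenius norm; see Lemma 1 in the appendix).
    R = W - L
    S = np.zeros_like(R)
    keep = np.unravel_index(np.argsort(np.abs(R), axis=None)[-nnz:], R.shape)
    S[keep] = R[keep]
    return L, S

W = np.random.randn(64, 3 * 3 * 64)        # e.g. a reshaped 3x3 conv kernel
L, S = low_rank_plus_sparse(W, rank=8, nnz=500)
print(np.linalg.norm(W - (L + S)) / np.linalg.norm(W))   # relative residual
```

In a CNN, `W` would typically be a convolution kernel reshaped into a matrix before the split, and the two components would be mapped back onto a low-rank branch and a sparse branch of the layer.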


References

1. Cai, J.F., Li, J., Xia, D.: Generalized low-rank plus sparse tensor estimation by fast Riemannian optimization (2022)

2. Chu, B.S., Lee, C.R.: Low-rank tensor decomposition for compression of convolutional neural networks using funnel regularization (2021)

3. Guo, K., Xie, X., Xu, X., Xing, X.: Compressing by learning in a low-rank and sparse decomposition form. IEEE Access 7, 150823–150832 (2019). https://doi.org/10.1109/ACCESS.2019.2947846

4. Han, S., et al.: DSD: Dense-sparse-dense training for deep neural networks (2017)

5. Hawkins, C., Yang, H., Li, M., Lai, L., Chandra, V.: Low-rank+sparse tensor compression for neural networks (2021)

6. Huang, W., et al.: Deep low-rank plus sparse network for dynamic MR imaging (2021)

7. Idelbayev, Y., Carreira-Perpinan, M.A.: Low-rank compression of neural nets: learning the rank of each layer. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8046–8056 (2020). https://doi.org/10.1109/CVPR42600.2020.00807

8. Kaloshin, P.: Convolutional neural networks compression with low rank and sparse tensor decompositions (2020)

9. Kim, Y.D., Park, E., Yoo, S., Choi, T., Yang, L., Shin, D.: Compression of deep convolutional neural networks for fast and low power mobile applications (2016)

10. Liang, C.C., Lee, C.R.: Automatic selection of tensor decomposition for compressing convolutional neural networks: a case study on VGG-type networks. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 770–778 (2021). https://doi.org/10.1109/IPDPSW52791.2021.00115

11. Liebenwein, L., Maalouf, A., Gal, O., Feldman, D., Rus, D.: Compressing neural networks: towards determining the optimal layer-wise decomposition. CoRR abs/2107.11442 (2021). https://arxiv.org/abs/2107.11442

12. Lin, T., Stich, S.U., Barba, L., Dmitriev, D., Jaggi, M.: Dynamic model pruning with feedback (2020)

13. Otazo, R., Candès, E., Sodickson, D.: Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components. Magn. Reson. Med. 73, 1125–1136 (2014). https://doi.org/10.1002/mrm.25240

14. Yin, M., Phan, H., Zang, X., Liao, S., Yuan, B.: BATUDE: budget-aware neural network compression based on Tucker decomposition. Proc. AAAI Conf. Artif. Intell. 36, 8874–8882 (2022). https://doi.org/10.1609/aaai.v36i8.20869

15. Yu, X., Liu, T., Wang, X., Tao, D.: On compressing deep models by low rank and sparse decomposition. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 67–76 (2017). https://doi.org/10.1109/CVPR.2017.15

16. Zhang, X., Wang, L., Gu, Q.: A unified framework for low-rank plus sparse matrix recovery (2018)


Author information


Corresponding author

Correspondence to Che-Rung Lee.


Appendix

1.1 A. Optimality of Sparsity Selection

The optimality of the sparsity selection method can be proven by the Eckart-Young-Mirsky theorem [11]. Let A be an \(m\times n\) matrix, and let \(\textrm{nnz}(A)\) be the number of nonzero elements of A. The norm used is the Frobenius norm, defined as \(\Vert A\Vert _F = \sqrt{\sum _{i=1}^m \sum _{j=1}^n A_{i,j}^2}, \) where \(A_{i,j}\) is the (i, j)th element of A. The following lemma shows how to find the optimal sparse matrix S that minimizes \(\Vert A-S\Vert _F\).

Lemma 1

Let A be an \(m\times n\) matrix. The solution that minimizes \(\Vert A-S\Vert _F\) subject to \(\textrm{nnz}(S)=s\) is the matrix T whose entries equal \(A_{i,j}\) at the indices of the s largest \(|A_{i,j}|\) and are zero elsewhere.

The proof is straightforward, since \(\Vert A-S\Vert _F^2 = \sum _{i=1}^m\sum _{j=1}^n (A_{i,j}-S_{i,j})^2.\) Its minimal value is obtained by cancelling the terms corresponding to the s largest \(|A_{i,j}|\), which is equivalent to making \((A_{i,j}-S_{i,j})=0\) for those elements largest in magnitude.
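
As a sanity check of Lemma 1, the short NumPy sketch below (an illustrative assumption, not code from the paper) compares the residual of keeping the s largest-magnitude entries against the residual of an arbitrary random support of the same size; the former is never larger.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32))
s = 50                                    # allowed number of nonzeros in S

# Lemma 1: keep the s largest-magnitude entries of A, zero out the rest.
order = np.argsort(np.abs(A), axis=None)
S_opt = np.zeros_like(A)
top = np.unravel_index(order[-s:], A.shape)
S_opt[top] = A[top]

# Any other support of size s (here a random one) yields a larger residual.
S_rand = np.zeros_like(A)
rand = np.unravel_index(rng.choice(A.size, size=s, replace=False), A.shape)
S_rand[rand] = A[rand]

print(np.linalg.norm(A - S_opt), "<=", np.linalg.norm(A - S_rand))
```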

1.2 B. Error Estimation

Theorem 1

If a collection of n data points follows a normal distribution with mean 0, then the sum of the top-k squares can be estimated by the formula:

$$ n \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } + \sigma \sqrt{2 \pi } F_X(t) \Biggr ], $$

where \(t = F_X^{-1}\bigl (1 - \frac{k}{2n}\bigr )\) and \(F_{X}(t)\) is the cumulative distribution function corresponding to \(f_{X}(t) = \frac{1}{\sigma \sqrt{2\pi }} e^{\frac{-t^2}{2\sigma ^2}}\), the probability density function of the normal distribution with mean 0 and variance \(\sigma ^2\).

Proof

Let \(X \sim N(\mu =0, \sigma ^2)\) be the random variable of the data, whose probability density function is \(f_{X}(t) = \frac{1}{\sigma \sqrt{2\pi }} e^{\frac{-t^2}{2\sigma ^2}}\). Now consider another random variable \(Y=X^2\). The probability density function of Y can be derived as:

$$ f_{Y}(y) = \frac{d}{dy}Pr(Y\le y) = \frac{d}{dy}Pr(-\sqrt{y} \le X \le \sqrt{y}) = \frac{d}{dy} \int _{-\sqrt{y}}^{\sqrt{y}} f_{X}(x) \ dx $$

We can rewrite \(f_Y(y)\) as:

$$\begin{aligned} &f_{Y}(y) = \frac{d}{dy} \int _{-\sqrt{y}}^{\sqrt{y}} f_{X}(x) \ dx\ = \frac{d}{dy} F_X(x)\Biggr |_{-\sqrt{y}}^{\sqrt{y}} \\ &= \frac{d}{dy} \biggl (F_X(\sqrt{y}) - F_X(-\sqrt{y})\biggl ) = f_X(\sqrt{y})\frac{1}{2\sqrt{y}} + f_X(-\sqrt{y})\frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{y}} f_X(\sqrt{y}) \end{aligned}$$

After obtaining the probability density function of \(Y=X^2\), the kth largest squared value in the data can be found. Assume that this value is \(t^2 \ (t>0)\); then

$$\begin{aligned} &Pr(Y\le t^2) = 1 - \frac{k}{n} \\ &\Rightarrow Pr(-t \le X \le t) = 1 - \frac{k}{n} \\ &\Rightarrow Pr(X > t) = \frac{k}{2n} \Rightarrow Pr(X \le t) = 1 - \frac{k}{2n} \\ &\Rightarrow t = F_X^{-1}\biggl (1 - \frac{k}{2n}\biggl ) \end{aligned}$$

After obtaining the kth largest squared value, \(t^2\), the average of the top-k squares can be found from the conditional expected value:

$$ E[Y | Y\ge t^2] = \frac{\int _{t^2}^{\infty } yf_Y(y) \ dy}{\int _{t^2}^{\infty } f_Y(y) \ dy} = \frac{1}{\frac{k}{n}} \int _{t^2}^{\infty } \frac{y}{\sqrt{y}} f_X(\sqrt{y}) \ dy = \frac{1}{\frac{k}{n}} \int _{t^2}^{\infty } \sqrt{y} \frac{1}{\sigma \sqrt{2 \pi }} e^{\frac{-y}{2\sigma ^2}} \ dy $$

Focus on the integral part:

$$\begin{aligned} &\int _{t^2}^{\infty } \sqrt{y} \frac{1}{\sigma \sqrt{2 \pi }} e^{\frac{-y}{2\sigma ^2}} \ dy = \int _{t^2}^{\infty } \sqrt{y} \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} e^{\frac{-y}{2\sigma ^2}} \ d\biggl (\frac{-y}{2\sigma ^2}\biggl ) = \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \int _{t^2}^{\infty } \sqrt{y} e^{\frac{-y}{2\sigma ^2}} \ d\biggl (\frac{-y}{2\sigma ^2}\biggl ) \\ &= \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ \sqrt{y} e^{\frac{-y}{2\sigma ^2}} \Biggr |^{\infty }_{t^2} -\int _{t^2}^{\infty } e^{\frac{-y}{2\sigma ^2}} \ d(\sqrt{y}) \Biggr ] = \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } F_X(\sqrt{y})\Biggr |^{\infty }_{t^2} \Biggr ] \\ &= \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } + \sigma \sqrt{2 \pi } F_X(t) \Biggr ] \\ \end{aligned}$$

The sum of the top-k squares can then be estimated as:

$$\begin{aligned} &k E[Y \ | \ Y\ge t^2] = k \frac{1}{\frac{k}{n}} \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } + \sigma \sqrt{2 \pi } F_X(t) \Biggr ] \\ &= n \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } + \sigma \sqrt{2 \pi } F_X(t) \Biggr ] \end{aligned}$$

Corollary 1

If the values of an \(a\times b\) matrix W follow a normal distribution with mean 0, the sum of the top-k squares can be estimated by Theorem 1:

$$ n \frac{-2\sigma ^ 2}{\sigma \sqrt{2 \pi }} \Biggr [ -t e^{\frac{-t^2}{2 \sigma ^ 2}} - \sigma \sqrt{2 \pi } + \sigma \sqrt{2 \pi } F_X(t) \Biggr ] $$

\(\sigma ^2\) can be estimated by the squared Frobenius norm divided by the number of matrix elements:

$$\begin{aligned} \sigma ^2 = E[X^2] = \frac{\Vert W\Vert _F^2}{n}, \end{aligned}$$

where \(n=ab\).
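
Theorem 1 and Corollary 1 can be checked numerically. The sketch below is an illustrative implementation (the helper name `topk_squares_estimate` and the use of SciPy's normal CDF/quantile are assumptions, not part of the paper): it estimates \(\sigma\) from the Frobenius norm of a random matrix and compares the estimated top-k squares sum with the exact value.

```python
import numpy as np
from scipy.stats import norm

def topk_squares_estimate(n, k, sigma):
    """Theorem 1: estimated sum of the k largest squared values among
    n zero-mean normal samples with standard deviation sigma."""
    t = norm.ppf(1 - k / (2 * n), scale=sigma)           # t = F_X^{-1}(1 - k/(2n))
    bracket = (-t * np.exp(-t**2 / (2 * sigma**2))
               - sigma * np.sqrt(2 * np.pi) * (1 - norm.cdf(t, scale=sigma)))
    return n * (-2 * sigma**2) / (sigma * np.sqrt(2 * np.pi)) * bracket

# Corollary 1: apply the estimate to an a x b matrix W with zero-mean entries,
# estimating sigma^2 as ||W||_F^2 / (a*b).
rng = np.random.default_rng(0)
a, b, k = 256, 256, 500
W = 0.05 * rng.standard_normal((a, b))    # synthetic "weight" matrix
n = a * b
sigma = np.sqrt(np.sum(W**2) / n)

empirical = np.sort((W**2).ravel())[-k:].sum()
print(empirical, topk_squares_estimate(n, k, sigma))
```

Since the matrix entries here are drawn from a zero-mean normal distribution, the estimate closely tracks the empirical sum; for real weight tensors the normality assumption holds only approximately.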


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Huang, KH., Sie, CY., Lin, JE., Lee, CR. (2024). LPSD: Low-Rank Plus Sparse Decomposition for Highly Compressed CNN Models. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14645. Springer, Singapore. https://doi.org/10.1007/978-981-97-2242-6_28

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-2242-6_28


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2241-9

  • Online ISBN: 978-981-97-2242-6

  • eBook Packages: Computer Science, Computer Science (R0)
