
Proximal linearization methods for Schatten p-quasi-norm minimization


Abstract

Schatten p-quasi-norm minimization has advantages over nuclear norm minimization in recovering low-rank matrices. However, Schatten p-quasi-norm minimization is much more difficult to solve, especially for generic linear matrix equations. In this paper, we first extend the lower bound theory of \(\ell _p\) minimization to Schatten p-quasi-norm minimization: we prove that the positive singular values of local minimizers are bounded from below by a constant. Motivated by this property, we propose a proximal linearization method whose subproblems can be solved efficiently by the (linearized) alternating direction method of multipliers. The convergence analysis of the proposed method involves the nonsmooth analysis of singular value functions. We give a necessary and sufficient condition for a singular value function to be a Kurdyka–Łojasiewicz function, and we compute the subdifferentials of related singular value functions. The global convergence of the proposed method is established under some assumptions. Experiments on matrix completion, the Sylvester equation and image deblurring show the effectiveness of the algorithm.
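For context, we recall the Schatten p-quasi-norm and the generic recovery model that this line of work addresses; this is the standard formulation, and the constrained or regularized variants actually treated in the paper may differ in details. For \(X\in \mathbb {R}^{m\times n}\) with singular values \(\sigma _1(X)\ge \cdots \ge \sigma _{\min \{m,n\}}(X)\ge 0\) and \(0<p<1\),

$$\begin{aligned} \Vert X\Vert _{S_p}:=\Big (\sum _{i=1}^{\min \{m,n\}}\sigma _i(X)^p\Big )^{1/p}, \qquad \min _{X\in \mathbb {R}^{m\times n}} \Vert X\Vert _{S_p}^p \quad \text {s.t.}\quad \mathscr {A}(X)=\varvec{b}, \end{aligned}$$

where \(\mathscr {A}\) is a linear map encoding, for example, matrix completion or a Sylvester equation. Note that \(\Vert X\Vert _{S_p}^p=\sum _i \sigma _i(X)^p\) tends to \({\text {rank}}(X)\) as \(p\rightarrow 0^+\) and equals the nuclear norm at \(p=1\).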


Notes

  1. The code is available at https://zhouchenlin.github.io/.

  2. Given a matrix X, we set the numerical rank as the number of singular values \(\sigma _r(X)\) satisfying \(\sigma _r(X)/\Vert X\Vert _F\ge 10^{-4}\).
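As a quick illustration of this convention, here is a minimal sketch in Python/NumPy (our own code, not code from the paper; the function name is ours):

```python
import numpy as np

def numerical_rank(X: np.ndarray, tol: float = 1e-4) -> int:
    """Numerical rank as in Note 2: the number of singular values
    sigma_r(X) with sigma_r(X) / ||X||_F >= tol."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    fro = np.linalg.norm(X, "fro")          # ||X||_F
    if fro == 0.0:
        return 0                            # the zero matrix has rank 0
    return int(np.sum(s / fro >= tol))

# A rank-2 matrix perturbed by tiny noise is still reported as rank 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
print(numerical_rank(A + 1e-8 * rng.standard_normal(A.shape)))  # -> 2
```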

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  4. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)

  5. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  6. Boţ, R.I., Nguyen, D.-K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)

  7. Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

  8. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

  9. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  10. Candès, E.J., Tao, T.: The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)

  11. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)

  12. Chan, R.H., Tao, M., Yuan, X.: Constrained total variation deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J. Imag. Sci. 6(1), 680–697 (2013)

  13. Chen, C., He, B., Yuan, X.: Matrix completion via an alternating direction method. IMA J. Numer. Anal. 32(1), 227–245 (2012)

  14. Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz \(\ell _p\)-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)

  15. Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of \(\ell _2\)-\(\ell _p\) minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)

  16. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)

  17. Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1(3), 211–218 (1936)

  18. El Ghaoui, L., Gahinet, P.: Rank minimization under LMI constraints: a framework for output feedback problems. In: Proceedings of the European Control Conference, pp. 1176–1179 (1993)

  19. Fazel, M., Hindi, H., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the 2001 American Control Conference, vol. 6, pp. 4734–4739. IEEE (2001)

  20. Fornasier, M., Rauhut, H., Ward, R.: Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optim. 21(4), 1614–1640 (2011)

  21. Gazzola, S., Meng, C., Nagy, J.G.: Krylov methods for low-rank regularization. SIAM J. Matrix Anal. Appl. 41(4), 1477–1504 (2020)

  22. Gu, S., Xie, Q., Meng, D., Zuo, W., Feng, X., Zhang, L.: Weighted nuclear norm minimization and its applications to low level vision. Int. J. Comput. Vis. 121(2), 183–208 (2017)

  23. Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2862–2869 (2014)

  24. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012)

  25. Hosseini, S., Luke, D.R., Uschmajew, A.: Tangent and normal cones for low-rank matrices. In: Nonsmooth Optimization and its Applications, pp. 45–53 (2019)

  26. Lai, M.-J., Liu, Y., Li, S., Wang, H.: On the Schatten \(p\)-quasi-norm minimization for low-rank matrix recovery. Appl. Comput. Harmon. Anal. 51, 157–170 (2021)

  27. Lai, M.-J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)

  28. Lai, M.-J., Yin, W.: Augmented \(\ell _1\) and nuclear-norm models with a globally linearly convergent algorithm. SIAM J. Imag. Sci. 6(2), 1059–1091 (2013)

  29. Larsen, R.M.: PROPACK: software for large and sparse SVD calculations. http://sun.stanford.edu/~rmunk/PROPACK/

  30. Lee, K., Elman, H.C.: A preconditioned low-rank projection method with a rank-reduction scheme for stochastic partial differential equations. SIAM J. Sci. Comput. 39(5), S828–S850 (2017)

  31. Lewis, A.S., Sendov, H.S.: Nonsmooth analysis of singular values. Part I: theory. Set-Valued Anal. 13(3), 213–241 (2005)

  32. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  33. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1199–1232 (2018)

  34. Lin, Z.: Some software packages for partial SVD computation. arXiv preprint arXiv:1108.1548 (2011)

  35. Liu, Z., Wu, C., Zhao, Y.: A new globally convergent algorithm for non-Lipschitz \(\ell _p\)-\(\ell _q\) minimization. Adv. Comput. Math. 45(3), 1369–1399 (2019)

  36. Lu, C., Lin, Z., Yan, S.: Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization. IEEE Trans. Image Process. 24(2), 646–654 (2014)

  37. Markovsky, I.: Structured low-rank approximation and its applications. Automatica 44(4), 891–909 (2008)

  38. Mohan, K., Fazel, M.: Iterative reweighted algorithms for matrix rank minimization. J. Mach. Learn. Res. 13(1), 3441–3473 (2012)

  39. Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. Multiscale Model. Simul. 4(3), 960–991 (2005)

  40. Pong, T.K., Tseng, P., Ji, S., Ye, J.: Trace norm regularization: reformulations, algorithms, and multi-task learning. SIAM J. Optim. 20(6), 3465–3489 (2010)

  41. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  42. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, vol. 317. Springer (2009)

  43. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)

  44. Simoncini, V.: Computational methods for linear matrix equations. SIAM Rev. 58(3), 377–441 (2016)

  45. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996)

  46. Vandereycken, B.: Low-rank matrix completion by Riemannian optimization. SIAM J. Optim. 23(2), 1214–1236 (2013)

  47. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imag. Sci. 1(3), 248–272 (2008)

  48. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  49. Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4(4), 333–361 (2012)

  50. Yang, J., Yuan, X.: Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Math. Comput. 82(281), 301–329 (2013)

  51. Yu, P., Li, G., Pong, T.K.: Kurdyka–Łojasiewicz exponent via inf-projection. Found. Comput. Math. 22, 1–47 (2021)

  52. Zeng, C., Wu, C.: On the edge recovery property of nonconvex nonsmooth regularization in image restoration. SIAM J. Numer. Anal. 56(2), 1168–1182 (2018)

  53. Zeng, C., Wu, C.: On the discontinuity of images recovered by nonconvex nonsmooth regularized isotropic models with box constraints. Adv. Comput. Math. 45(2), 589–610 (2019)

  54. Zeng, C., Wu, C., Jia, R.: Non-Lipschitz models for image restoration with impulse noise removal. SIAM J. Imag. Sci. 12(1), 420–458 (2019)

  55. Zhang, X., Bai, M., Ng, M.K.: Nonconvex-TV based image restoration with impulse noise removal. SIAM J. Imag. Sci. 10(3), 1627–1667 (2017)

  56. Zheng, Z., Ng, M., Wu, C.: A globally convergent algorithm for a class of gradient compounded non-Lipschitz models applied to non-additive noise removal. Inverse Probl. 36(12), 125017 (2020)


Acknowledgements

The author is extremely grateful to the editor and the two anonymous referees for their valuable feedback, which improved this paper significantly. The author is also grateful to Dr. Xianshun Nian and Dr. Guomin Liu for helpful discussions. This work was partially supported by the National Natural Science Foundation of China (12201319).

Author information

Corresponding author

Correspondence to Chao Zeng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix summarizes some important results on Kurdyka–Łojasiewicz (KL) theory and gives some examples. The following definition is taken from Attouch et al. [2, Definition 4.1].

Definition 7.1

(o-minimal structure on \(\mathbb {R}\)) Let \(\mathscr {O} = \{\mathscr {O}_n\}_{n \in \mathbb {N}}\) be such that each \(\mathscr {O}_n\) is a collection of subsets of \(\mathbb {R}^n\). The family \(\mathscr {O}\) is an o-minimal structure on \(\mathbb {R}\) if it satisfies the following axioms:

  (i) Each \(\mathscr {O}_n\) is a boolean algebra, namely, \(\emptyset \in \mathscr {O}_n\), and for each \(\mathscr {A},\mathscr {B}\) in \(\mathscr {O}_n\), the sets \(\mathscr {A} \cup \mathscr {B}\), \(\mathscr {A} \cap \mathscr {B}\) and \(\mathbb {R}^n \setminus \mathscr {A}\) belong to \(\mathscr {O}_n\).

  (ii) For all \(\mathscr {A}\) in \(\mathscr {O}_n\), \(\mathscr {A} \times \mathbb {R}\) and \(\mathbb {R} \times \mathscr {A}\) belong to \(\mathscr {O}_{n+1}\).

  (iii) For all \(\mathscr {A}\) in \(\mathscr {O}_{n+1}\), the projection \(\{(x_1,\ldots ,x_n)\in \mathbb {R}^n: (x_1,\ldots ,x_n,x_{n+1}) \in \mathscr {A} \text { for some } x_{n+1}\in \mathbb {R}\}\) belongs to \(\mathscr {O}_{n}\).

  (iv) For all \(i \ne j\) in \(\{1,2,\ldots ,n\}\), the set \(\{(x_1,\ldots ,x_n) \in \mathbb {R}^n: x_i = x_j\}\) belongs to \(\mathscr {O}_n\).

  (v) The set \(\{(x_1,x_2) \in \mathbb {R}^2: x_1<x_2\}\) belongs to \(\mathscr {O}_2\).

  (vi) The elements of \(\mathscr {O}_1\) are exactly the finite unions of intervals.

Let \(\mathscr {O}\) be an o-minimal structure on \(\mathbb {R}\). We say that a set \(\mathscr {A} \subseteq \mathbb {R}^n\) is definable (on \(\mathscr {O}\)) if \(\mathscr {A} \in \mathscr {O}_n\). A function \(f: \mathbb {R}^n \rightarrow (-\infty , +\infty ]\) is definable if its graph \(\{(\varvec{x},y) \in \mathbb {R}^n \times (-\infty ,+\infty ]: y = f(\varvec{x})\}\) is definable on \(\mathscr {O}\). We list some known elementary properties of definable functions below.

Property 7.2

(See [2]) Finite sums of definable functions are definable; indicator functions of definable sets are definable; compositions of definable functions or mappings are definable.

It is known that any proper lower semicontinuous function that is definable is a KL function; see [2, Theorem 4.1].
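For the reader's convenience, we recall what the KL property asserts in its standard form (see [2] and the references therein): a proper lower semicontinuous function \(f\) has the KL property at \(\bar{\varvec{x}}\in {\text {dom}}\,\partial f\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood \(\mathscr {U}\) of \(\bar{\varvec{x}}\) and a concave function \(\varphi :[0,\eta )\rightarrow [0,+\infty )\), continuous at 0 and continuously differentiable on \((0,\eta )\), with \(\varphi (0)=0\) and \(\varphi '>0\), such that

$$\begin{aligned} \varphi '\big (f(\varvec{x})-f(\bar{\varvec{x}})\big )\,{\text {dist}}\big (\varvec{0},\partial f(\varvec{x})\big )\ge 1 \quad \text {for all } \varvec{x}\in \mathscr {U} \text { with } f(\bar{\varvec{x}})<f(\varvec{x})<f(\bar{\varvec{x}})+\eta . \end{aligned}$$

A KL function is a function that has the KL property at every point of \({\text {dom}}\,\partial f\).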

Example 7.3

An example of an o-minimal structure is the log-exp structure [45, Example 2.5]. In this structure, the following functions are all definable:

  1. semi-algebraic functions (see Definition 7.4 below);

  2. the function \(\mathbb {R} \rightarrow \mathbb {R}\) given by

     $$\begin{aligned} x \mapsto {\left\{ \begin{array}{ll} x^r, &{} x >0, \\ 0, &{} x \le 0, \end{array}\right. } \end{aligned}$$

     where \(r \in \mathbb {R}\);

  3. the exponential function \(\mathbb {R} \rightarrow \mathbb {R}\) given by \(x \mapsto e^x\) and the logarithm function \((0,\infty ) \rightarrow \mathbb {R}\) given by \(x \mapsto \log (x)\).

Definition 7.4

(See [5, Definition 5]) A subset \(\mathscr {S}\) of \(\mathbb {R}^d\) is a real semi-algebraic set if there exist finitely many real polynomial functions \(f_{ij},g_{ij}: \mathbb {R}^d \rightarrow \mathbb {R}\) such that

$$\begin{aligned} \mathscr {S}=\bigcup _{j=1}^s\bigcap _{i=1}^t \left\{ \varvec{x}\in \mathbb {R}^{d}: f_{ij}(\varvec{x})=0 \text { and } g_{ij}(\varvec{x})<0 \right\} . \end{aligned}$$

A function \(f: \mathbb {R}^{d} \rightarrow (-\infty ,+\infty ]\) is called semi-algebraic if its graph

$$\begin{aligned} \left\{ (\varvec{x},y)\in \mathbb {R}^{d+1}: f(\varvec{x})=y \right\} \end{aligned}$$

is a semi-algebraic subset of \(\mathbb {R}^{d+1}\).

The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products.
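As a simple illustration of Definition 7.4 (our example, not one from the paper), the closed unit ball of \(\mathbb {R}^d\) has the required form with \(s=2\) and \(t=1\):

$$\begin{aligned} \{\varvec{x}\in \mathbb {R}^{d}: \Vert \varvec{x}\Vert _2\le 1\} =\{\varvec{x}: \Vert \varvec{x}\Vert _2^2-1<0\}\cup \{\varvec{x}: \Vert \varvec{x}\Vert _2^2-1=0\}, \end{aligned}$$

where the first branch takes \(f_{11}\equiv 0\), \(g_{11}(\varvec{x})=\Vert \varvec{x}\Vert _2^2-1\) and the second takes \(f_{12}(\varvec{x})=\Vert \varvec{x}\Vert _2^2-1\), \(g_{12}\equiv -1\). Note that \(\Vert \varvec{x}\Vert _2^2=\sum _i x_i^2\) is a polynomial.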

Example 7.5

[5, Example 2] There is a broad class of semi-algebraic sets and functions arising in optimization, including the following:

  1. Real polynomial functions.

  2. Indicator functions of semi-algebraic sets.

  3. Finite sums and products of semi-algebraic functions.

  4. Compositions of semi-algebraic functions.

  5. In matrix theory, all of the following are semi-algebraic sets: the cone of positive semidefinite matrices, Stiefel manifolds and matrices of constant rank (the Stiefel case is spelled out below).
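For instance, the Stiefel manifold is the solution set of polynomial equations in the matrix entries, hence semi-algebraic:

$$\begin{aligned} {\text {St}}(n,k)=\{X\in \mathbb {R}^{n\times k}: X^{\top }X=I_k\}, \end{aligned}$$

since each entry of \(X^{\top }X-I_k\) is a polynomial in the entries of \(X\) (take \(g_{ij}\equiv -1\) in Definition 7.4).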

Example 7.6

Define \(f: \mathbb {R}^{n}\rightarrow \mathbb {R}\) by \(f(\varvec{x})=\sum _{i=1}^{n}|x_i|^p\). We prove that f is definable. First, consider the function \(g(t)=|t|\). The graph of g is

$$\begin{aligned} \{(t,y)\in \mathbb {R}^2: y = t,\ t>0 \}\cup \{(t,y)\in \mathbb {R}^2: y = -t,\ t<0 \} \cup \{(t,y)\in \mathbb {R}^2: y = t,\ t=0 \}. \end{aligned}$$

Hence, g is a semi-algebraic function. From Property 7.2 and Example 7.3, we know that \(f(\varvec{x})\) is definable.
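In more detail, let \(\varphi \) denote the function of Example 7.3(2) with \(r=p\). Then each summand of f is the composition

$$\begin{aligned} |x_i|^p=\varphi (g(x_i)), \qquad \varphi (t)={\left\{ \begin{array}{ll} t^p, &{} t>0, \\ 0, &{} t\le 0, \end{array}\right. } \end{aligned}$$

of definable functions, and f is a finite sum of such compositions; Property 7.2 then yields the definability of f.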

Example 7.7

We prove that the function \(f(\varvec{x})\) defined in (47) is a semi-algebraic function. From Example 7.5, we see that it suffices to prove that the functions \(f_i(\varvec{x}):=b_i \underline{x}_i\), \(i=1,\ldots ,r\), are semi-algebraic. We only prove that \(f_1(\varvec{x})=b_1 \underline{x}_1\) is semi-algebraic; the other cases are similar. Define

$$\begin{aligned} \mathscr {T}_j:=\{\varvec{x}\in \mathbb {R}^h: |x_j|=\underline{x}_1\}, \quad j=1,\ldots ,h. \end{aligned}$$

By the definition of \(\underline{\varvec{x}}\), we have

$$\begin{aligned} \mathscr {T}_j=\bigcap _{k=1}^{h}\{\varvec{x}\in \mathbb {R}^h: |x_j|\ge |x_k|\}, \end{aligned}$$

and each set \(\{\varvec{x}\in \mathbb {R}^h: |x_j|\ge |x_k|\}=\{\varvec{x}\in \mathbb {R}^h: x_j^2-x_k^2\ge 0\}\) is semi-algebraic. Hence, \(\mathscr {T}_j\) is semi-algebraic. The graph of \(f_1(\varvec{x})\) is

$$\begin{aligned} \bigcup _{j=1}^h \left( \left\{ (\varvec{x},y)\in \mathbb {R}^{h+1}:y = b_1 |x_j| \right\} \cap (\mathscr {T}_j\times \mathbb {R}) \right) , \end{aligned}$$

which is a semi-algebraic set.
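As a concrete instance of this construction (our illustration), take \(h=2\). Then \(\mathscr {T}_1=\{\varvec{x}: |x_1|\ge |x_2|\}\), \(\mathscr {T}_2=\{\varvec{x}: |x_2|\ge |x_1|\}\), and the graph of \(f_1\) is

$$\begin{aligned} \big (\{(\varvec{x},y): y=b_1|x_1|\}\cap (\mathscr {T}_1\times \mathbb {R})\big )\cup \big (\{(\varvec{x},y): y=b_1|x_2|\}\cap (\mathscr {T}_2\times \mathbb {R})\big ). \end{aligned}$$

On the overlap \(|x_1|=|x_2|\) the two branches agree, so this union is exactly the graph of \(f_1(\varvec{x})=b_1\max \{|x_1|,|x_2|\}\).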

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zeng, C. Proximal linearization methods for Schatten p-quasi-norm minimization. Numer. Math. 153, 213–248 (2023). https://doi.org/10.1007/s00211-022-01335-7

