Dictionary learning with the \({{\ell }_{1/2}}\)-regularizer and the coherence penalty and its convergence analysis

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

The \({{\ell }_{1/2}}\)-regularizer has been studied widely in compressed sensing, but few studies have applied it to dictionary learning. Dictionary learning with the \({{\ell }_{1/2}}\)-regularizer requires solving a very challenging nonconvex and nonsmooth optimization problem. In addition, low mutual coherence of a dictionary is an important property that ensures the optimality of the sparse representation in the dictionary. In this paper, we address a dictionary learning problem involving both the \({{\ell }_{1/2}}\)-regularizer and the coherence penalty, which is difficult to solve quickly and efficiently. We employ a decomposition scheme and alternating optimization, which transform the overall problem into a set of single-vector-variable subproblems. Although the subproblems are nonsmooth and even nonconvex, we solve them with proximal operators, which leads to a rapid and efficient dictionary learning algorithm. In a theoretical analysis, we establish the algorithm’s global convergence. Experiments were performed for dictionary learning using both synthetic data and real-world data. For the synthetic data, our algorithm performed better than state-of-the-art algorithms. For the real-world data, the learned dictionaries were more efficient than those learned by algorithms using the \({{\ell }_{1}}\)-norm for sparsity.
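The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Algorithm 1. The function names `mutual_coherence` and `half_threshold` are hypothetical. The coherence follows the standard definition (largest absolute inner product between distinct normalized atoms). The thresholding rule is the closed-form half-thresholding operator from [25], written here for the scalar problem \(\min_x (x-t)^2+\lambda |x|^{1/2}\); the scaling of the paper's own subproblems (Sect. 3) is not shown in this excerpt and may differ.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-normalized columns of D."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # normalize each atom
    G = np.abs(Dn.T @ Dn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return G.max()

def half_threshold(t, lam):
    """Element-wise minimizer of (x - t)**2 + lam * sqrt(|x|), following [25].

    Assumption: this scaling of the scalar problem; rescale lam if the
    quadratic term carries a different constant.
    """
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    # Inputs at or below this magnitude are set exactly to zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(t) > thresh
    tb = t[big]
    phi = np.arccos((lam / 8.0) * (np.abs(tb) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * tb * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out
```

For example, `half_threshold(np.array([0.3, 2.0]), 1.0)` sets the first entry to zero and shrinks the second slightly toward zero, which is the sparsifying behaviour a proximal step for the \({{\ell }_{1/2}}\) term is expected to have.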

References

  1. Elad M (2010) Sparse and redundant representation. Springer, Berlin
  2. Elad M, Figueiredo M, Ma Y (2010) On the role of sparse and redundant representations in image processing. Proc IEEE 98(6):972–982
  3. Huang K, Aviyente S (2006) Sparse representation for signal classification. Proc Conf Neur Inf Process Syst 19:609–616
  4. Engan K, Aase S, Husoy J (1999) Method of optimal directions for frame design. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP) 5:2443–2446
  5. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
  6. Dai W, Xu T, Wang W (2012) Simultaneous codeword optimization (SimCO) for dictionary update and learning. IEEE Trans Signal Process 60(12):6340–6353
  7. Li Z, Ding S, Li Y (2015) A fast algorithm for learning overcomplete dictionary for sparse representation based on proximal operators. Neural Comput 27(9):1951–1982
  8. Bao C, Ji H, Quan Y, Shen Z (2014) \({{\ell }_{0}}\)-norm-based dictionary learning by proximal methods with global convergence. IEEE Conf Comput Vis Pattern Recognit (CVPR) 3858–3865
  9. Yaghoobi M, Blumensath T, Davies M (2009) Dictionary learning for sparse approximations with the majorization method. IEEE Trans Signal Process 57(6):2178–2191
  10. Tropp JA (2004) Greed is good: algorithmic results for sparse approximation. IEEE Trans Inf Theory 50(10):2231–2242
  11. Rakotomamonjy A (2013) Direct optimization of the dictionary learning problem. IEEE Trans Signal Process 61(22):5495–5506
  12. Li Z, Tang Z, Ding S (2013) Dictionary learning by nonnegative matrix factorization with \({{\ell }_{1/2}}\)-norm sparsity constraint. IEEE Int Conf Cybern (CYBCONF2) Lausanne Switz 63–67
  13. Mailhe B, Barchiesi D, Plumbley MD (2012) INK-SVD: Learning incoherent dictionaries for sparse representations. IEEE Int Conf Acoust Speech Signal Process (ICASSP) 3573–3576
  14. Barchiesi D, Plumbley MD (2013) Learning incoherent dictionaries for sparse approximation using iterative projections and rotations. IEEE Trans Signal Process 61(8):2055–2065
  15. Lin T, Liu S, Zha H (2012) Incoherent dictionary learning for sparse representation. IEEE 21st International Conference on Pattern Recognition (ICPR), pp 1237–1240
  16. Moreau JJ (1962) Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l'Académie des Sciences de Paris 255:2897–2899
  17. Combettes PL, Pesquet J (2010) Proximal splitting methods in signal processing. arXiv:0912.3522v4
  18. Mallat SG, Zhang Z (1993) Matching pursuits with time–frequency dictionaries. IEEE Trans Signal Process 41(12):3397–3415
  19. Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129–159
  20. Chartrand R, Yin W (2008) Iteratively reweighted algorithms for compressive sensing. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP), pp 3869–3872
  21. Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57(11):1413–1457
  22. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2(1):183–202
  23. Xu ZB, Guo HL, Wang Y, Zhang H (2012) Representative of \({{\ell }_{1/2}}\) regularization among (0 < q ≤ 1) \({{\ell }_{q}}\) regularizations: an experimental study based on phase diagram. Acta Automatica Sinica 38:1225–1228
  24. Lin J, Lin S, Wang Y, Xu ZB (2014) \({{\ell }_{1/2}}\) Regularization: convergence of iterative half thresholding algorithm. IEEE Trans Signal Process 62(1):2317–2329
  25. Xu ZB, Chang X, Xu F, Zhang H (2012) \({{\ell }_{1/2}}\) Regularization: a thresholding representation theory and a fast solver. IEEE Trans Neur Networks Learning Syst 23(7):1013–1027
  26. Fazel M, Hindi H, Boyd SP (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Control Conf 3:2156–2162
  27. Hoyer PO (2004) Nonnegative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
  28. Attouch H, Bolte J, Svaiter BF (2013) Convergence of descent methods for semialgebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math Program Ser A 137(1–2):91–129

Author information

Corresponding author

Correspondence to Zhenni Li.

Appendix A

In this appendix, we give a proof of Theorem 1. Let \({{\mathbf{Z}}^{\left( n \right)}}=({{\mathbf{W}}^{\left( n \right)}},~{{\mathbf{H}}^{\left( n \right)}})\) denote the sequence generated by Algorithm 1. The cost function \(\phi \left( \mathbf{Z} \right)=\text{Q}\left( \mathbf{Z} \right)+\text{G}\left( \mathbf{W} \right)+\text{F}\left( \mathbf{H} \right)\) is a proper, lower semi-continuous function, where \(\text{Q}\left( \mathbf{Z} \right),\text{G}\left( \mathbf{W} \right),\) and \(\text{F}\left( \mathbf{H} \right)\) have been defined in Sect. 3.

Theorem 2

([28]) Assume that \(\phi (z)\) is a proper and lower semicontinuous function with \(\inf \phi > -\infty\). The sequence \({{\{{{z}^{(n)}}\}}_{n\in \mathbb{N}}}\) is a Cauchy sequence and converges to a critical point of \(\phi (z)\) if the following conditions hold:

  • (V1) Sufficient decrease condition. There exists a positive constant \({{\rho }_{1}}\) such that:

    $$\phi \left( {{z}^{\left( n \right)}} \right)-\phi \left( {{z}^{\left( n+1 \right)}} \right)\ge {{\rho }_{1}}\left\| {{z}^{\left( n+1 \right)}}-{{z}^{\left( n \right)}} \right\|_{F}^{2},\quad \forall \, n=1,2,\ldots$$
  • (V2) Relative error condition. There exists a positive constant \({{\rho }_{2}}\) such that:

    $$\left\| {{\epsilon }^{\left( n+1 \right)}} \right\|_{F}\le {{\rho }_{2}}\left\| {{z}^{\left( n+1 \right)}}-{{z}^{\left( n \right)}} \right\|_{F},\quad {{\epsilon }^{\left( n \right)}}\in \partial \phi \left( {{z}^{\left( n \right)}} \right),\quad \forall \, n=1,2,\ldots$$
  • (V3) Continuity condition. There exist a subsequence \({{\{{{z}^{({{n}_{k}})}}\}}_{k\in \mathbb{N}}}\) and a point \(\bar{z}\) such that:

    $$z^{{(n_{k} )}} \to \bar{z},~~\phi \left( {z^{{\left( {n_{k} } \right)}} } \right) \to \phi \left( {\bar{z}} \right),~~{\text{as}}~k \to + \infty$$
  • (V4) KL property. \(\phi (z)\) is a KL function, that is, it satisfies the Kurdyka–Łojasiewicz (KL) property on its effective domain.

Lemma 1

The sequence \({{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}\) satisfies:

$$\left\{ \begin{array}{l} \phi \left( \mathbf{T}_{k}^{\left( n+1 \right)},{{\mathbf{W}}^{\left( n \right)}} \right)\le \phi \left( \mathbf{T}_{k-1}^{\left( n+1 \right)},{{\mathbf{W}}^{\left( n \right)}} \right)-\frac{\mu _{k}^{\left( n \right)}}{2}\left\| \mathbf{h}_{k}^{\left( n+1 \right)}-\mathbf{h}_{k}^{\left( n \right)} \right\|_{F}^{2}, \\ \phi \left( {{\mathbf{H}}^{\left( n+1 \right)}},\mathbf{V}_{k}^{\left( n+1 \right)} \right)\le \phi \left( {{\mathbf{H}}^{\left( n+1 \right)}},\mathbf{V}_{k-1}^{\left( n+1 \right)} \right)-\frac{\gamma _{k}^{\left( n \right)}}{2}\left\| \mathbf{w}_{k}^{\left( n+1 \right)}-\mathbf{w}_{k}^{\left( n \right)} \right\|_{F}^{2}, \end{array} \right.$$

for \(1\le k\le r\), where

$$\left\{ \begin{array}{l} \mathbf{T}_{k}^{\left( n \right)}={{\left( \mathbf{h}_{1}^{\left( n \right)\text{T}},\ldots ,\mathbf{h}_{k}^{\left( n \right)\text{T}},\mathbf{h}_{k+1}^{\left( n-1 \right)\text{T}},\ldots ,\mathbf{h}_{r}^{\left( n-1 \right)\text{T}} \right)}^{\text{T}}},\quad \mathbf{T}_{0}^{\left( n \right)}={{\mathbf{H}}^{\left( n-1 \right)}}, \\ \mathbf{V}_{k}^{\left( n \right)}=\left( \mathbf{w}_{1}^{\left( n \right)},\ldots ,\mathbf{w}_{k}^{\left( n \right)},\mathbf{w}_{k+1}^{\left( n-1 \right)},\ldots ,\mathbf{w}_{r}^{\left( n-1 \right)} \right),\quad \mathbf{V}_{0}^{\left( n \right)}={{\mathbf{W}}^{\left( n-1 \right)}}, \end{array} \right.$$

and \(\mu _{k}^{\left( n \right)}=\mathbf{w}_{k}^{T}{{\mathbf{w}}_{k}}>0\), \(\gamma _{k}^{\left( n \right)}=\mathbf{h}_{k}^{T}{{\mathbf{h}}_{k}}>0\).

Summing both inequalities over \(k=1,\ldots ,r\), the intermediate terms telescope, and we obtain:

$$\begin{aligned} &\phi \left( {{\mathbf{H}}^{\left( n \right)}},{{\mathbf{W}}^{\left( n \right)}} \right)-\phi \left( {{\mathbf{H}}^{\left( n+1 \right)}},{{\mathbf{W}}^{\left( n+1 \right)}} \right) \\ &\quad \ge \sum\limits_{k=1}^{r}\left( \frac{\mu _{k}^{\left( n \right)}}{2}\left\| \mathbf{h}_{k}^{\left( n+1 \right)}-\mathbf{h}_{k}^{\left( n \right)} \right\|_{F}^{2}+\frac{\gamma _{k}^{\left( n \right)}}{2}\left\| \mathbf{w}_{k}^{\left( n+1 \right)}-\mathbf{w}_{k}^{\left( n \right)} \right\|_{F}^{2} \right). \end{aligned}$$
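The per-block decrease in Lemma 1 can be checked numerically in a simplified setting. The sketch below is a hedged illustration, not the paper's algorithm. It assumes the smooth term is \(Q(\mathbf{W},\mathbf{H})=\frac{1}{2}\|\mathbf{Y}-\mathbf{W}\mathbf{H}\|_{F}^{2}\) (Sect. 3 is not part of this excerpt) and drops the \({{\ell }_{1/2}}\) and coherence terms, so each row of \(\mathbf{H}\) is updated by exact least squares. The names `quad_cost` and `update_row` are hypothetical.

```python
import numpy as np

def quad_cost(Y, W, H):
    """Assumed smooth term: 0.5 * ||Y - W H||_F^2."""
    return 0.5 * np.linalg.norm(Y - W @ H, "fro") ** 2

def update_row(Y, W, H, k):
    """Exactly minimize quad_cost over row k of H, all other variables fixed."""
    w = W[:, k]
    mu = w @ w                            # mu_k = w_k^T w_k, as in Lemma 1
    R = Y - W @ H + np.outer(w, H[k])     # residual without atom k's contribution
    H_new = H.copy()
    H_new[k] = (w @ R) / mu               # least-squares solution for row k
    return H_new, mu

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 20))
W = rng.standard_normal((8, 5))
H = rng.standard_normal((5, 20))

decreases, bounds = [], []
for k in range(W.shape[1]):               # one sweep over the rows of H
    H_new, mu = update_row(Y, W, H, k)
    decreases.append(quad_cost(Y, W, H) - quad_cost(Y, W, H_new))
    bounds.append(0.5 * mu * np.sum((H_new[k] - H[k]) ** 2))
    H = H_new
```

In this purely quadratic case each entry of `decreases` equals the matching entry of `bounds` up to rounding, because the cost is a quadratic in \({{\mathbf{h}}_{k}}\) with curvature \(\mu _{k}\). With the nonsmooth terms included, Lemma 1 states the relation as an inequality.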

Lemma 2

Define \(\epsilon _{\mathbf{H}}^{\left( n \right)}={{(\epsilon _{\mathbf{H}}^{1\text{T}},\ldots ,\epsilon _{\mathbf{H}}^{r\text{T}})}^{\text{T}}}\) and \(\epsilon _{\mathbf{W}}^{\left( n \right)}=(\epsilon _{\mathbf{W}}^{1},\ldots ,\epsilon _{\mathbf{W}}^{r})\), where

$$\left\{ \begin{array}{l} \epsilon _{\mathbf{H}}^{k}={{\nabla }_{{{\mathbf{h}}_{k}}}}Q\left( {{\mathbf{Z}}^{\left( n \right)}} \right)-{{\nabla }_{{{\mathbf{h}}_{k}}}}Q\left( \mathbf{T}_{k}^{\left( n \right)},{{\mathbf{W}}^{\left( n-1 \right)}} \right)-\mu _{k}^{\left( n \right)}\left( \mathbf{h}_{k}^{\left( n \right)}-\mathbf{h}_{k}^{\left( n-1 \right)} \right), \\ \epsilon _{\mathbf{W}}^{k}={{\nabla }_{{{\mathbf{w}}_{k}}}}Q\left( {{\mathbf{Z}}^{\left( n \right)}} \right)-{{\nabla }_{{{\mathbf{w}}_{k}}}}Q\left( {{\mathbf{H}}^{\left( n \right)}},\mathbf{V}_{k}^{\left( n \right)} \right)-\gamma _{k}^{\left( n \right)}\left( \mathbf{w}_{k}^{\left( n \right)}-\mathbf{w}_{k}^{\left( n-1 \right)} \right). \end{array} \right.$$

Then \({{\epsilon }^{\left( n \right)}}:=(\epsilon _{\mathbf{H}}^{\left( n \right)},\epsilon _{\mathbf{W}}^{\left( n \right)})\in \partial \phi \left( {{\mathbf{Z}}^{\left( n \right)}} \right)\), and there exists a constant \(\rho > 0\) such that \({{\left\| {{\epsilon }^{\left( n \right)}} \right\|}_{F}}\le \rho \,{{\left\| {{\mathbf{Z}}^{\left( n \right)}}-{{\mathbf{Z}}^{\left( n-1 \right)}} \right\|}_{F}}\).

Lemma 3

The sequence \({{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}\) satisfies the continuity condition (V3).

Theorem 3

([28]) Let \(f\) be a proper and lower semicontinuous function. If \(f\) is semialgebraic, then it satisfies the KL property at every point of dom \(f\).

Lemma 4

\(\text{Q}\left( \mathbf{Z} \right)\), \(\text{G}\left( \mathbf{W} \right)\) and \(\text{F}\left( \mathbf{H} \right)\) are semialgebraic functions, so their sum, the cost function \(\phi \left( \mathbf{Z} \right)=\text{Q}\left( \mathbf{Z} \right)+\text{G}\left( \mathbf{W} \right)+\text{F}\left( \mathbf{H} \right)\), is also semialgebraic. Therefore, by Theorem 3, \(\phi (\mathbf{Z})\) satisfies the KL property on its effective domain.

Cite this article

Li, Z., Hayashi, T., Ding, S. et al. Dictionary learning with the \({{\ell }_{1/2}}\)-regularizer and the coherence penalty and its convergence analysis. Int. J. Mach. Learn. & Cyber. 9, 1351–1364 (2018). https://doi.org/10.1007/s13042-017-0649-9

  • DOI: https://doi.org/10.1007/s13042-017-0649-9
