Dictionary learning with the $\ell_{1/2}$-regularizer and the coherence penalty and its convergence analysis

Li, Zhenni; Hayashi, Takafumi; Ding, Shuxue; Li, Yujie

doi:10.1007/s13042-017-0649-9


Original Article
Published: 23 March 2017

Volume 9, pages 1351–1364 (2018)

International Journal of Machine Learning and Cybernetics


Abstract

The $\ell_{1/2}$-regularizer has been studied widely in compressed sensing, but it has rarely been applied to dictionary learning. Learning a dictionary with the $\ell_{1/2}$-regularizer requires solving a very challenging nonconvex and nonsmooth optimization problem. In addition, low mutual coherence is an important property of a dictionary, because it ensures the optimality of sparse representations in that dictionary. In this paper, we address a dictionary learning problem involving both the $\ell_{1/2}$-regularizer and a coherence penalty, which is difficult to solve quickly and efficiently. We employ a decomposition scheme and alternating optimization, which transform the overall problem into a set of minimizations over single vector variables. Although these subproblems are nonsmooth and even nonconvex, we solve them with proximal operators, which leads to a fast and efficient dictionary learning algorithm. In a theoretical analysis, we establish the algorithm's global convergence. Experiments on dictionary learning were performed using both synthetic and real-world data. On the synthetic data, our algorithm performed better than state-of-the-art algorithms. On the real-world data, the learned dictionaries were more efficient than those learned by algorithms that use the $\ell_1$-norm for sparsity.
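The single-vector subproblems described above involve the proximal operator of the $\ell_{1/2}$ penalty. The abstract does not spell out its form, so the sketch below assumes the scalar problem $\min_x (x-y)^2 + \lambda|x|^{1/2}$ and uses the closed-form half-thresholding operator of Xu et al. (2012, in the reference list). The function name `half_threshold` and this scaling of $\lambda$ are our assumptions; the paper's own subproblem may differ by a constant factor.

```python
import numpy as np


def half_threshold(y, lam):
    """Elementwise global minimizer of (x - y)**2 + lam * |x|**0.5.

    Half-thresholding closed form (Xu et al., 2012); lam >= 0 is assumed.
    """
    y = np.asarray(y, dtype=float)
    if lam <= 0:
        return y.copy()  # no penalty: the minimizer is y itself
    # Magnitudes at or below this threshold are mapped to exactly zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    # Clamp |y| from below so the arccos argument stays in [0, 1];
    # clamped entries are discarded by np.where anyway.
    mag = np.maximum(np.abs(y), thresh)
    phi = np.arccos(np.clip((lam / 8.0) * (mag / 3.0) ** -1.5, -1.0, 1.0))
    x = (2.0 / 3.0) * y * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return np.where(np.abs(y) > thresh, x, 0.0)


# Usage: entries below the threshold (about 0.945 * lam**(2/3)) become zero.
print(half_threshold([2.0, 0.5, -3.0], 1.0))
```

If the subproblem is instead scaled as $\frac{1}{2}(x-y)^2 + \lambda|x|^{1/2}$, the same minimizer is obtained with `half_threshold(y, 2 * lam)`.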


References

[1] Elad M (2010) Sparse and redundant representation. Springer, Berlin
[2] Elad M, Figueiredo M, Ma Y (2010) On the role of sparse and redundant representations in image processing. Proc IEEE 98(6):972–982
[3] Huang K, Aviyente S (2006) Sparse representation for signal classification. Proc Conf Neur Inf Process Syst 19:609–616
[4] Engan K, Aase S, Husoy J (1999) Method of optimal directions for frame design. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP) 5:2443–2446
[5] Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
[6] Dai W, Xu T, Wang W (2012) Simultaneous codeword optimization (SimCO) for dictionary update and learning. IEEE Trans Signal Process 60(12):6340–6353
[7] Li Z, Ding S, Li Y (2015) A fast algorithm for learning overcomplete dictionary for sparse representation based on proximal operators. Neural Comput 27(9):1951–1982
[8] Bao C, Ji H, Quan Y, Shen Z (2014) $\ell_0$-norm-based dictionary learning by proximal methods with global convergence. IEEE Conf Comput Vis Pattern Recognit (CVPR), pp 3858–3865
[9] Yaghoobi M, Blumensath T, Davies M (2013) Dictionary learning for sparse approximations with the majorization method. IEEE Trans Signal Process 57(6):2178–2191
[10] Tropp JA (2004) Greed is good: algorithmic results for sparse approximation. IEEE Trans Inf Theory 50(10):2231–2242
[11] Rakotomamonjy A (2013) Direct optimization of the dictionary learning problem. IEEE Trans Signal Process 61(22):5495–5506
[12] Li Z, Tang Z, Ding S (2013) Dictionary learning by nonnegative matrix factorization with $\ell_{1/2}$-norm sparsity constraint. IEEE Int Conf Cybern (CYBCONF2), Lausanne, Switzerland, pp 63–67
[13] Mailhe B, Barchiesi D, Plumbley MD (2012) INK-SVD: learning incoherent dictionaries for sparse representations. IEEE Int Conf Acoust Speech Signal Process (ICASSP), pp 3573–3576
[14] Barchiesi D, Plumbley MD (2013) Learning incoherent dictionaries for sparse approximation using iterative projections and rotations. IEEE Trans Signal Process 61(8):2055–2065
[15] Lin T, Liu S, Zha H (2012) Incoherent dictionary learning for sparse representation. IEEE 21st Int Conf Pattern Recognit (ICPR), pp 1237–1240
[16] Moreau JJ (1962) Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l'Académie des Sciences de Paris 255:2897–2899
[17] Combettes PL, Pesquet J (2010) Proximal splitting methods in signal processing. arXiv:0912.3522v4
[18] Mallat SG, Zhang Z (1993) Matching pursuits with time–frequency dictionaries. IEEE Trans Signal Process 41(12):3397–3415
[19] Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129–159
[20] Chartrand R, Yin W (2008) Iteratively reweighted algorithms for compressive sensing. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP), pp 3869–3872
[21] Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57(11):1413–1457
[22] Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202
[23] Xu ZB, Guo HL, Wang Y, Zhang H (2012) Representative of $\ell_{1/2}$ regularization among $\ell_q$ ($0 < q \le 1$) regularizations: an experimental study based on phase diagram. Acta Automatica Sinica 38:1225–1228
[24] Lin J, Lin S, Wang Y, Xu ZB (2014) $\ell_{1/2}$ regularization: convergence of iterative half thresholding algorithm. IEEE Trans Signal Process 62(1):2317–2329
[25] Xu ZB, Chang X, Xu F, Zhang H (2012) $\ell_{1/2}$ regularization: a thresholding representation theory and a fast solver. IEEE Trans Neural Netw Learn Syst 23(7):1013–1027
[26] Fazel M, Hindi H, Boyd SP (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Control Conf 3:2156–2162
[27] Hoyer PO (2004) Nonnegative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
[28] Attouch H, Bolte J, Svaiter BF (2013) Convergence of descent methods for semialgebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math Program Ser A 137(1–2):91–129


Author information

Authors and Affiliations

School of Computer Science and Engineering, The University of Aizu, Tsuruga, Ikki-Machi, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
Zhenni Li & Shuxue Ding
Graduate School of Science and Technology, Niigata University, Niigata, Niigata, 950-2181, Japan
Takafumi Hayashi
Artificial Intelligence Center, AIST Tsukuba, Ibaraki, 305-8560, Japan
Yujie Li


Corresponding author

Correspondence to Zhenni Li.

Appendix A

In this appendix, we give a proof of Theorem 1. Let $\{\mathbf{Z}^{(n)}\} = \{(\mathbf{W}^{(n)}, \mathbf{H}^{(n)})\}$ denote the sequence generated by Algorithm 1. The cost function $\phi(\mathbf{Z}) = Q(\mathbf{Z}) + G(\mathbf{W}) + F(\mathbf{H})$, where $Q(\mathbf{Z})$, $G(\mathbf{W})$, and $F(\mathbf{H})$ are defined in Sect. 3, is proper and lower semicontinuous.
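The proof treats $\phi$ abstractly, but evaluating it is useful when monitoring a run of Algorithm 1. Because $Q$, $G$, and $F$ are defined in Sect. 3 (not reproduced here), the sketch below fills them in with hypothetical choices: a least-squares fit $Q(\mathbf{Z}) = \frac{1}{2}\|\mathbf{Y} - \mathbf{W}\mathbf{H}\|_F^2$ for a data matrix $\mathbf{Y}$, a Gram-matrix coherence penalty $G(\mathbf{W}) = \frac{\beta}{2}\|\mathbf{W}^{\mathrm{T}}\mathbf{W} - \mathbf{I}\|_F^2$, and $F(\mathbf{H}) = \lambda\sum_{i,j}|h_{ij}|^{1/2}$. The names `Y`, `lam`, and `beta` are illustrative. The sketch also computes the mutual coherence $\max_{i \ne j} |\mathbf{w}_i^{\mathrm{T}}\mathbf{w}_j| / (\|\mathbf{w}_i\|\|\mathbf{w}_j\|)$, the quantity a coherence penalty is meant to keep low.

```python
import numpy as np


def mutual_coherence(W):
    """max_{i != j} |w_i^T w_j| / (||w_i|| ||w_j||) over the columns of W."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # normalize each atom
    gram = np.abs(Wn.T @ Wn)
    np.fill_diagonal(gram, 0.0)  # drop each atom's correlation with itself
    return float(gram.max())


def cost(Y, W, H, lam, beta):
    """phi(Z) = Q(Z) + G(W) + F(H) with HYPOTHETICAL forms of Q, G and F."""
    r = W.shape[1]
    Q = 0.5 * np.linalg.norm(Y - W @ H, "fro") ** 2                   # assumed data fit
    G = 0.5 * beta * np.linalg.norm(W.T @ W - np.eye(r), "fro") ** 2  # assumed coherence penalty
    F = lam * np.sum(np.sqrt(np.abs(H)))                              # assumed l_{1/2} term
    return Q + G + F


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))
H = rng.standard_normal((12, 20))
Y = W @ H + 0.01 * rng.standard_normal((8, 20))
print(mutual_coherence(W), cost(Y, W, H, lam=0.1, beta=1.0))
```

A Frobenius-norm Gram penalty is used here only because it is smooth and easy to evaluate; it bounds the squared off-diagonal correlations on average rather than the maximum that mutual coherence measures.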

Theorem 2

([28]) Assume that $\phi(z)$ is a proper, lower semicontinuous function with $\inf \phi > -\infty$. If the following conditions hold, then the sequence $\{z^{(n)}\}_{n\in\mathbb{N}}$ is a Cauchy sequence and converges to a critical point of $\phi(z)$:

(V1) Sufficient decrease condition. There exists a constant $\rho_1 > 0$ such that:

$$\phi(z^{(n)}) - \phi(z^{(n+1)}) \ge \rho_1 \|z^{(n+1)} - z^{(n)}\|_F^{2}, \quad \forall\, n = 1, 2, \ldots$$

(V2) Relative error condition. There exists a constant $\rho_2 > 0$ such that, for some $\epsilon^{(n+1)} \in \partial\phi(z^{(n+1)})$,

$$\|\epsilon^{(n+1)}\|_F \le \rho_2 \|z^{(n+1)} - z^{(n)}\|_F, \quad \forall\, n = 1, 2, \ldots$$

(V3) Continuity condition. There exist a subsequence $\{z^{(n_k)}\}_{k\in\mathbb{N}}$ and a point $\bar{z}$ such that:

$$z^{(n_k)} \to \bar{z}, \quad \phi(z^{(n_k)}) \to \phi(\bar{z}), \quad \text{as } k \to +\infty$$

(V4) KL property. $\phi(z)$ is a Kurdyka–Łojasiewicz (KL) function, i.e., it satisfies the KL property on its effective domain.
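Conditions (V1) and (V2) concern the whole infinite sequence, but on a finite run one can at least compute the tightest constants that the recorded iterates support. The helper below is a hypothetical diagnostic, not part of the paper. It takes objective values, iterates, and one subgradient per iterate, and it measures (V2) in the upper-bound form $\|\epsilon^{(n+1)}\| \le \rho_2\|z^{(n+1)} - z^{(n)}\|$ used by Attouch et al. [28].

```python
import numpy as np


def empirical_rhos(phis, iterates, subgrads):
    """Tightest (V1)/(V2) constants supported by a finite trace.

    phis[n]     -- phi(z^(n))
    iterates[n] -- z^(n) as a NumPy array
    subgrads[n] -- some element of the subdifferential of phi at z^(n)
    Returns (rho1, rho2): the largest rho1 with
    phi(z^n) - phi(z^{n+1}) >= rho1 * ||z^{n+1} - z^n||^2 on the trace, and the
    smallest rho2 with ||eps^{n+1}|| <= rho2 * ||z^{n+1} - z^n||.
    """
    rho1, rho2 = np.inf, 0.0
    for n in range(len(iterates) - 1):
        step = np.linalg.norm(iterates[n + 1] - iterates[n])
        if step == 0.0:
            continue  # ratios are undefined for a zero step
        rho1 = min(rho1, (phis[n] - phis[n + 1]) / step**2)
        rho2 = max(rho2, np.linalg.norm(subgrads[n + 1]) / step)
    return rho1, rho2


# Toy trace: phi(z) = 0.5 * ||z||^2 with iterates halving each step.
zs = [np.array([4.0, -2.0]) * 0.5**n for n in range(10)]
phis = [0.5 * float(z @ z) for z in zs]
grads = zs  # the gradient of 0.5 * ||z||^2 is z itself
print(empirical_rhos(phis, zs, grads))
```

A positive `rho1` and a moderate `rho2` over a long run are consistent with (V1) and (V2), but they do not prove them.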

Lemma 1

The sequence ${{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}$ satisfies:

$$\begin{cases} \phi(\mathbf{T}_{k}^{(n+1)}, \mathbf{W}^{(n)}) \le \phi(\mathbf{T}_{k-1}^{(n+1)}, \mathbf{W}^{(n)}) - \dfrac{\mu_{k}^{(n)}}{2}\|\mathbf{h}_{k}^{(n+1)} - \mathbf{h}_{k}^{(n)}\|_{F}^{2},\\ \phi(\mathbf{H}^{(n+1)}, \mathbf{V}_{k}^{(n+1)}) \le \phi(\mathbf{H}^{(n+1)}, \mathbf{V}_{k-1}^{(n+1)}) - \dfrac{\gamma_{k}^{(n)}}{2}\|\mathbf{w}_{k}^{(n+1)} - \mathbf{w}_{k}^{(n)}\|_{F}^{2}, \end{cases}$$

for $1 \le k \le r$, where

$$\begin{aligned} \mathbf{T}_{k}^{(n)} &= (\mathbf{h}_{1}^{(n)\mathrm{T}}, \ldots, \mathbf{h}_{k}^{(n)\mathrm{T}}, \mathbf{h}_{k+1}^{(n-1)\mathrm{T}}, \ldots, \mathbf{h}_{r}^{(n-1)\mathrm{T}})^{\mathrm{T}}, \quad \mathbf{T}_{0}^{(n)} = \mathbf{H}^{(n-1)},\\ \mathbf{V}_{k}^{(n)} &= (\mathbf{w}_{1}^{(n)}, \ldots, \mathbf{w}_{k}^{(n)}, \mathbf{w}_{k+1}^{(n-1)}, \ldots, \mathbf{w}_{r}^{(n-1)}), \quad \mathbf{V}_{0}^{(n)} = \mathbf{W}^{(n-1)}, \end{aligned}$$

and $\mu_{k}^{(n)} = \mathbf{w}_{k}^{\mathrm{T}}\mathbf{w}_{k} > 0$, $\gamma_{k}^{(n)} = \mathbf{h}_{k}^{\mathrm{T}}\mathbf{h}_{k} > 0$.

Summing the first inequality over $k = 1, \ldots, r$ (the sum telescopes, since $\mathbf{T}_{0}^{(n+1)} = \mathbf{H}^{(n)}$ and $\mathbf{T}_{r}^{(n+1)} = \mathbf{H}^{(n+1)}$), doing the same for the second inequality (with $\mathbf{V}_{0}^{(n+1)} = \mathbf{W}^{(n)}$ and $\mathbf{V}_{r}^{(n+1)} = \mathbf{W}^{(n+1)}$), and adding the two results, we obtain:

$$\phi(\mathbf{H}^{(n)}, \mathbf{W}^{(n)}) - \phi(\mathbf{H}^{(n+1)}, \mathbf{W}^{(n+1)}) \ge \sum_{k=1}^{r}\left(\frac{\mu_{k}^{(n)}}{2}\|\mathbf{h}_{k}^{(n+1)} - \mathbf{h}_{k}^{(n)}\|_{F}^{2} + \frac{\gamma_{k}^{(n)}}{2}\|\mathbf{w}_{k}^{(n+1)} - \mathbf{w}_{k}^{(n)}\|_{F}^{2}\right)$$

Consequently, (V1) holds with $\rho_1 = \frac{1}{2}\inf_{k,n}\min\{\mu_{k}^{(n)}, \gamma_{k}^{(n)}\}$ whenever this infimum is positive.
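The first inequality of Lemma 1 can be checked numerically for a proximal-point block update, which appears consistent with the form of $\epsilon_{\mathbf{H}}^{k}$ in Lemma 2 but is not guaranteed to be the authors' Algorithm 1. The sketch assumes $Q(\mathbf{Z}) = \frac{1}{2}\|\mathbf{Y} - \mathbf{W}\mathbf{H}\|_F^2$ and $F(\mathbf{H}) = \lambda\sum_{i,j}|h_{ij}|^{1/2}$ (hypothetical forms; Sect. 3 gives the actual definitions) and updates each row $\mathbf{h}_k$ by minimizing $Q + F + \frac{\mu_k}{2}\|\mathbf{h}_k - \mathbf{h}_k^{\text{old}}\|^2$ with the other rows fixed. Because $Q$ is quadratic in $\mathbf{h}_k$ with curvature $\mu_k = \mathbf{w}_k^{\mathrm{T}}\mathbf{w}_k$, this reduces to elementwise half thresholding with parameter $\lambda/\mu_k$, and exact minimization yields a decrease of at least $\frac{\mu_k}{2}\|\Delta\mathbf{h}_k\|^2$. The $\mathbf{W}$ sweep, which would also involve $G$, is omitted; `sweep_rows_of_H` and `objective_H` are illustrative names.

```python
import numpy as np


def half_threshold(y, lam):
    # Global minimizer of (x - y)**2 + lam * |x|**0.5, elementwise (Xu et al., 2012).
    y = np.asarray(y, dtype=float)
    if lam <= 0:
        return y.copy()
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    mag = np.maximum(np.abs(y), thresh)  # keeps the arccos argument in [0, 1]
    phi = np.arccos(np.clip((lam / 8.0) * (mag / 3.0) ** -1.5, -1.0, 1.0))
    x = (2.0 / 3.0) * y * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return np.where(np.abs(y) > thresh, x, 0.0)


def objective_H(Y, W, H, lam):
    # Part of phi that depends on H, under the assumed Q and F.
    return 0.5 * np.linalg.norm(Y - W @ H, "fro") ** 2 + lam * np.sum(np.sqrt(np.abs(H)))


def sweep_rows_of_H(Y, W, H, lam):
    """One pass over the rows h_k of H; returns new H and (decrease, bound) pairs."""
    H = H.copy()
    report = []
    for k in range(W.shape[1]):
        w = W[:, k]
        mu = float(w @ w)                    # mu_k = w_k^T w_k, assumed > 0
        h_old = H[k].copy()
        before = objective_H(Y, W, H, lam)
        E = Y - W @ H + np.outer(w, h_old)   # residual without atom k
        c = E.T @ w / mu                     # Q = (mu/2)||h - c||^2 + const
        m = 0.5 * (c + h_old)                # centre of Q + (mu/2)||h - h_old||^2
        H[k] = half_threshold(m, lam / mu)   # exact minimizer, coordinate by coordinate
        after = objective_H(Y, W, H, lam)
        report.append((before - after, 0.5 * mu * np.sum((H[k] - h_old) ** 2)))
    return H, report


rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4))
H = rng.standard_normal((4, 10))
Y = rng.standard_normal((6, 10))
H_new, report = sweep_rows_of_H(Y, W, H, lam=0.3)
print(report)  # each decrease should be at least its bound
```

The proximal term $\frac{\mu_k}{2}\|\mathbf{h}_k - \mathbf{h}_k^{\text{old}}\|^2$ is what supplies the $\frac{\mu_k}{2}$ margin: the new row minimizes the regularized block objective, whose value at $\mathbf{h}_k^{\text{old}}$ equals the old objective.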

Lemma 2

Define $\epsilon_{\mathbf{H}}^{(n)} = (\epsilon_{\mathbf{H}}^{1\mathrm{T}}, \ldots, \epsilon_{\mathbf{H}}^{r\mathrm{T}})^{\mathrm{T}}$ and $\epsilon_{\mathbf{W}}^{(n)} = (\epsilon_{\mathbf{W}}^{1}, \ldots, \epsilon_{\mathbf{W}}^{r})$, where

$$\begin{cases} \epsilon_{\mathbf{H}}^{k} = \nabla_{\mathbf{h}_{k}} Q(\mathbf{Z}^{(n)}) - \nabla_{\mathbf{h}_{k}} Q(\mathbf{T}_{k}^{(n)}, \mathbf{W}^{(n-1)}) - \mu_{k}^{(n)}(\mathbf{h}_{k}^{(n)} - \mathbf{h}_{k}^{(n-1)}),\\ \epsilon_{\mathbf{W}}^{k} = \nabla_{\mathbf{w}_{k}} Q(\mathbf{Z}^{(n)}) - \nabla_{\mathbf{w}_{k}} Q(\mathbf{H}^{(n)}, \mathbf{V}_{k}^{(n)}) - \gamma_{k}^{(n)}(\mathbf{w}_{k}^{(n)} - \mathbf{w}_{k}^{(n-1)}). \end{cases}$$

Then $\epsilon^{(n)} := (\epsilon_{\mathbf{H}}^{(n)}, \epsilon_{\mathbf{W}}^{(n)}) \in \partial\phi(\mathbf{Z}^{(n)})$, and there exists a constant $\rho > 0$ such that $\|\epsilon^{(n)}\|_F \le \rho\,\|\mathbf{Z}^{(n)} - \mathbf{Z}^{(n-1)}\|_F$.

Lemma 3

The sequence ${{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}$ satisfies the continuity condition (V3).

Theorem 3

([28]) Let $f$ be a proper, lower semicontinuous function. If $f$ is semialgebraic, then it satisfies the KL property at every point of $\operatorname{dom} f$.

Lemma 4

$Q(\mathbf{Z})$, $G(\mathbf{W})$, and $F(\mathbf{H})$ are semialgebraic functions. Since a finite sum of semialgebraic functions is semialgebraic, the cost function $\phi(\mathbf{Z}) = Q(\mathbf{Z}) + G(\mathbf{W}) + F(\mathbf{H})$ is semialgebraic. Therefore, by Theorem 3, $\phi(\mathbf{Z})$ satisfies the KL property on its effective domain.
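Lemma 4 relies on the $\ell_{1/2}$ term being semialgebraic. Taking, as an assumption here, $F(\mathbf{H}) = \lambda\sum_{i,j}|h_{ij}|^{1/2}$ (the paper defines $F$ in Sect. 3), one short way to see this is to describe the graph of the scalar map $x \mapsto |x|^{1/2}$ by a polynomial equation and inequality:

$$\operatorname{graph}\left(x \mapsto |x|^{1/2}\right) = \left\{(x, t) \in \mathbb{R}^{2} : t \ge 0,\ t^{4} - x^{2} = 0\right\}.$$

Indeed, $t \ge 0$ and $t^{4} = x^{2}$ force $t^{2} = |x|$, hence $t = |x|^{1/2}$. The graph is therefore a semialgebraic set, so each map $\mathbf{H} \mapsto |h_{ij}|^{1/2}$ is semialgebraic, and so is their finite weighted sum $F$. A polynomial data-fit term such as $\frac{1}{2}\|\mathbf{Y} - \mathbf{W}\mathbf{H}\|_F^{2}$ (again an assumed form of $Q$) is semialgebraic for the same reason.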


About this article

Cite this article

Li, Z., Hayashi, T., Ding, S. et al. Dictionary learning with the ${{\ell }_{1/2}}$-regularizer and the coherence penalty and its convergence analysis. Int. J. Mach. Learn. & Cyber. 9, 1351–1364 (2018). https://doi.org/10.1007/s13042-017-0649-9


Received: 17 February 2016
Accepted: 01 February 2017
Published: 23 March 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s13042-017-0649-9
