Abstract
The \({{\ell }_{1/2}}\)-regularizer has been studied widely in compressed sensing, but it has rarely been applied to dictionary learning problems. Dictionary learning with the \({{\ell }_{1/2}}\)-regularizer requires solving a very challenging nonconvex and nonsmooth optimization problem. In addition, low mutual coherence is an important property of a dictionary because it ensures the optimality of the sparse representation in that dictionary. In this paper, we address a dictionary learning problem involving both the \({{\ell }_{1/2}}\)-regularizer and a coherence penalty, which is difficult to solve quickly and efficiently. We employ a decomposition scheme and alternating optimization, which transform the overall problem into a set of single-vector-variable minimization subproblems. Although the subproblems are nonsmooth and even nonconvex, we solve them with proximal operators, which leads to a rapid and efficient dictionary learning algorithm. In a theoretical analysis, we establish the algorithm's global convergence. Experiments were performed for dictionary learning using both synthetic data and real-world data. On the synthetic data, our algorithm performed better than state-of-the-art algorithms. On the real-world data, the learned dictionaries were shown to be more efficient than those learned by algorithms using the \({{\ell }_{1}}\)-norm for sparsity.
References
Elad M (2010) Sparse and redundant representation. Springer, Berlin
Elad M, Figueiredo M, Ma Y (2010) On the role of sparse and redundant representations in image processing. Proc IEEE 98(6):972–982
Huang K, Aviyente S (2006) Sparse representation for signal classification. Proc Conf Neur Inf Process Syst 19:609–616
Engan K, Aase S, Husoy J (1999) Method of optimal directions for frame design. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP) 5:2443–2446
Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
Dai W, Xu T, Wang W (2012) Simultaneous codeword optimization (SimCO) for dictionary update and learning. IEEE Trans Signal Process 60(12):6340–6353
Li Z, Ding S, Li Y (2015) A fast algorithm for learning overcomplete dictionary for sparse representation based on proximal operators. Neural Comput 27(9):1951–1982
Bao C, Ji H, Quan Y, Shen Z (2014) \({{\ell }_{0}}\)-norm-based dictionary learning by proximal methods with global convergence. IEEE Conf Comput Vis Pattern Recognit (CVPR) 3858–3865
Yaghoobi M, Blumensath T, Davies M (2013) Dictionary learning for sparse approximations with the majorization method. IEEE Trans Signal Process 57(6):2178–2191
Tropp JA (2004) Greed is good: algorithmic results for sparse approximation. IEEE Trans Inf Theory 50(10):2231–2242
Rakotomamonjy A (2013) Direct optimization of the dictionary learning problem. IEEE Trans Signal Process 61(22):5495–5506
Li Z, Tang Z, Ding S (2013) Dictionary learning by nonnegative matrix factorization with \({{\ell }_{1/2}}\)-norm sparsity constraint. IEEE Int Conf Cybern (CYBCONF2) Lausanne Switz 63–67
Mailhe B, Barchiesi D, Plumbley MD (2012) INK-SVD: Learning incoherent dictionaries for sparse representations. IEEE Int Conf Acoust Speech Signal Process (ICASSP) 3573–3576
Barchiesi D, Plumbley MD (2013) Learning incoherent dictionaries for sparse approximation using iterative projections and rotations. IEEE Trans Signal Process 61(8):2055–2065
Lin T, Liu S, Zha H (2012) Incoherent dictionary learning for sparse representation. IEEE 21st International Conference on Pattern Recognition (ICPR), pp 1237–1240
Moreau JJ (1962) Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l'Académie des Sciences de Paris 255:2897–2899
Combettes PL, Pesquet J (2010) Proximal splitting methods in signal processing. arXiv:0912.3522v4
Mallat SG, Zhang Z (1993) Matching pursuits with time–frequency dictionaries. IEEE Trans Signal Process 41(12):3397–3415
Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev 43(1):129–159
Chartrand R, Yin W (2008) Iteratively reweighted algorithms for compressive sensing. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP) 3869–3872
Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57(11):1413–1457
Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2(1):183–202
Xu ZB, Guo HL, Wang Y, Zhang H (2012) Representative of \({{\ell }_{1/2}}\) regularization among (0 < q ≤ 1) \({{\ell }_{q}}\) regularizations: an experimental study based on phase diagram. Acta Automatica Sinica 38:1225–1228
Lin J, Lin S, Wang Y, Xu ZB (2014) \({{\ell }_{1/2}}\) Regularization: convergence of iterative half thresholding algorithm. IEEE Trans Signal Process 62(1):2317–2329
Xu ZB, Chang X, Xu F, Zhang H (2012) \({{\ell }_{1/2}}\) Regularization: a thresholding representation theory and a fast solver. IEEE Trans Neur Networks Learning Syst 23(7):1013–1027
Fazel M, Hindi H, Boyd SP (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. Proc Am Control Conf 3:2156–2162
Hoyer PO (2004) Nonnegative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Attouch H, Bolte J, Svaiter BF (2013) Convergence of descent methods for semialgebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math Program Ser A 137(1–2):91–129
Appendix A
In this appendix, we give a proof of Theorem 1. Let \({{\mathbf{Z}}^{\left( n \right)}}=({{\mathbf{W}}^{\left( n \right)}},~{{\mathbf{H}}^{\left( n \right)}})\) denote the sequence generated by Algorithm 1. The cost function \(\phi \left( \mathbf{Z} \right)=\text{Q}\left( \mathbf{Z} \right)+\text{G}\left( \mathbf{W} \right)+\text{F}\left( \mathbf{H} \right)\) is a proper, lower semi-continuous function, where \(\text{Q}\left( \mathbf{Z} \right),\text{G}\left( \mathbf{W} \right),\) and \(\text{F}\left( \mathbf{H} \right)\) have been defined in Sect. 3.
Theorem 2
([28]) Assume that \(\phi (z)\) is a proper and lower semicontinuous function with \(\inf \phi > -\infty\). The sequence \({{\{{{z}^{(n)}}\}}_{n\in \mathbb{N}}}\) is a Cauchy sequence and converges to a critical point of \(\phi (z)\) if the following conditions hold:
- (V1) Sufficient decrease condition. There exists a positive constant \({{\rho }_{1}}\) such that:
$$\phi \left( {{z}^{\left( n \right)}} \right)-\phi \left( {{z}^{\left( n+1 \right)}} \right)\ge {{\rho }_{1}}\left\| {{z}^{\left( n+1 \right)}}-{{z}^{\left( n \right)}} \right\|_{F}^{2},\quad \forall \, n=1,2,\ldots$$
- (V2) Relative error condition. There exists a positive constant \({{\rho }_{2}}\) such that:
$${{\left\| {{\epsilon }^{\left( n+1 \right)}} \right\|}_{F}}\le {{\rho }_{2}}{{\left\| {{z}^{\left( n+1 \right)}}-{{z}^{\left( n \right)}} \right\|}_{F}},\quad {{\epsilon }^{\left( n \right)}}\in \partial \phi \left( {{z}^{\left( n \right)}} \right),\quad \forall \, n=1,2,\ldots$$
- (V3) Continuity condition. There exist a subsequence \({{\{{{z}^{({{n}_{k}})}}\}}_{k\in \mathbb{N}}}\) and a point \(\bar{z}\) such that:
$$z^{(n_{k})} \to \bar{z},\quad \phi \left( z^{(n_{k})} \right) \to \phi \left( \bar{z} \right),\quad \text{as}~k \to +\infty$$
- (V4) \(\phi (z)\) is a KL function. \(\phi (z)\) satisfies the Kurdyka–Łojasiewicz (KL) property in its effective domain.
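The sufficient decrease condition (V1) can be checked numerically on a small example. The following is a minimal sketch, not the paper's Algorithm 1: it assumes a toy cost \(\phi (x)=\tfrac{1}{2}\|Ax-b\|^{2}+\lambda \sum _{j}|x_{j}|^{1/2}\), a plain proximal gradient iteration, and the closed-form half-thresholding operator of Xu et al. (2012) as the proximal map. The matrix `A`, the vector `b`, the step size, and all function names are illustrative assumptions.

```python
import math


def half_threshold(v, lam):
    """Global minimizer over y of (y - v)**2 + lam * sqrt(|y|), for lam > 0."""
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(v) <= threshold:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(v) / 3.0) ** (-1.5))
    return (2.0 / 3.0) * v * (1.0 + math.cos(2.0 * math.pi / 3.0 - (2.0 / 3.0) * phi))


def residual(A, b, x):
    """Return A x - b for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def cost(A, b, x, lam):
    """Toy cost: 0.5 * ||A x - b||^2 + lam * sum_j sqrt(|x_j|)."""
    r = residual(A, b, x)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(math.sqrt(abs(xj)) for xj in x)


def prox_gradient_step(A, b, x, lam, t):
    """One forward-backward step with step size t."""
    r = residual(A, b, x)
    grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]
    # Minimizing t*lam*sqrt(|y|) + 0.5*(y - v)^2 is the same as
    # minimizing (y - v)^2 + (2*t*lam)*sqrt(|y|).
    return [half_threshold(xj - t * gj, 2.0 * t * lam) for xj, gj in zip(x, grad)]


def sufficient_decrease_gaps(A, b, lam, iters=50):
    """Return the last iterate and, per step, cost(x_n) - cost(x_{n+1}) - rho1 * ||x_{n+1} - x_n||^2."""
    lip_bound = sum(a * a for row in A for a in row)  # ||A||_F^2, an upper bound on the gradient's Lipschitz constant
    t = 1.0 / (2.0 * lip_bound)   # step size chosen so that 1/(2t) - L/2 >= lip_bound/2
    rho1 = 0.5 * lip_bound        # decrease constant implied by that step size
    x = [0.0] * len(A[0])
    gaps = []
    for _ in range(iters):
        x_new = prox_gradient_step(A, b, x, lam, t)
        step_sq = sum((u - v) ** 2 for u, v in zip(x_new, x))
        gaps.append(cost(A, b, x, lam) - cost(A, b, x_new, lam) - rho1 * step_sq)
        x = x_new
    return x, gaps


# Illustrative data (assumed, not taken from the paper).
A = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
b = [1.0, -2.0, 0.5]
x_final, gaps = sufficient_decrease_gaps(A, b, lam=0.1)
print(min(gaps) >= -1e-12)  # (V1) holds at every step if this prints True
```

Every entry of `gaps` should be nonnegative up to rounding, which is exactly inequality (V1) with \({{\rho }_{1}}\) equal to `rho1`. The step size is deliberately conservative so that the bound holds even though the regularizer is nonconvex.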
Lemma 1
The sequence \({{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}\) satisfies:
$$\left\{ \begin{matrix} \phi \left( \mathbf{T}_{k}^{\left( n+1 \right)},{{\mathbf{W}}^{\left( n \right)}} \right)\le \phi \left( \mathbf{T}_{k-1}^{\left( n+1 \right)},{{\mathbf{W}}^{\left( n \right)}} \right)-\frac{\mu _{k}^{\left( n \right)}}{2}\left\| \mathbf{h}_{k}^{\left( n+1 \right)}-\mathbf{h}_{k}^{\left( n \right)} \right\|_{F}^{2}, \\ \phi \left( {{\mathbf{H}}^{\left( n+1 \right)}},\mathbf{V}_{k}^{\left( n+1 \right)} \right)\le \phi \left( {{\mathbf{H}}^{\left( n+1 \right)}},\mathbf{V}_{k-1}^{\left( n+1 \right)} \right)-\frac{\gamma _{k}^{\left( n \right)}}{2}\left\| \mathbf{w}_{k}^{\left( n+1 \right)}-\mathbf{w}_{k}^{\left( n \right)} \right\|_{F}^{2}, \\ \end{matrix} \right.$$
for \(1\le k\le r\), where
$$\left\{ \begin{matrix} \mathbf{T}_{k}^{\left( n \right)}={{\left( \mathbf{h}_{1}^{\left( n \right)\text{T}},\ldots ,\mathbf{h}_{k}^{\left( n \right)\text{T}},\mathbf{h}_{k+1}^{\left( n-1 \right)\text{T}},\ldots ,\mathbf{h}_{r}^{\left( n-1 \right)\text{T}} \right)}^{\text{T}}}, & \mathbf{T}_{0}^{\left( n \right)}={{\mathbf{H}}^{\left( n-1 \right)}}, \\ \mathbf{V}_{k}^{\left( n \right)}=\left( \mathbf{w}_{1}^{\left( n \right)},\ldots ,\mathbf{w}_{k}^{\left( n \right)},\mathbf{w}_{k+1}^{\left( n-1 \right)},\ldots ,\mathbf{w}_{r}^{\left( n-1 \right)} \right), & \mathbf{V}_{0}^{\left( n \right)}={{\mathbf{W}}^{\left( n-1 \right)}}, \\ \end{matrix} \right.$$
and \(\mu _{k}^{\left( n \right)}=\mathbf{w}_{k}^{\text{T}}{{\mathbf{w}}_{k}}>0\), \(\gamma _{k}^{\left( n \right)}=\mathbf{h}_{k}^{\text{T}}{{\mathbf{h}}_{k}}>0\).
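Therefore, summing these inequalities over \(1\le k\le r\), we can obtain the sufficient decrease condition (V1) for the sequence \({{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}\).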
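As a sketch of how the two inequalities of Lemma 1 combine, sum each of them over \(k=1,\ldots ,r\) and use \(\mathbf{T}_{0}^{(n+1)}={{\mathbf{H}}^{(n)}}\), \(\mathbf{T}_{r}^{(n+1)}={{\mathbf{H}}^{(n+1)}}\), \(\mathbf{V}_{0}^{(n+1)}={{\mathbf{W}}^{(n)}}\), and \(\mathbf{V}_{r}^{(n+1)}={{\mathbf{W}}^{(n+1)}}\), so that the intermediate terms telescope. The constant \({{\rho }_{1}}\) below is an assumption introduced for illustration: it is positive only if \(\mu _{k}^{(n)}\) and \(\gamma _{k}^{(n)}\) are bounded away from zero uniformly in \(k\) and \(n\), which this appendix does not state explicitly.

$$\begin{aligned} \phi \left( {{\mathbf{H}}^{(n)}},{{\mathbf{W}}^{(n)}} \right)-\phi \left( {{\mathbf{H}}^{(n+1)}},{{\mathbf{W}}^{(n)}} \right) &\ge \sum\limits_{k=1}^{r}{\frac{\mu _{k}^{(n)}}{2}\left\| \mathbf{h}_{k}^{(n+1)}-\mathbf{h}_{k}^{(n)} \right\|_{F}^{2}}, \\ \phi \left( {{\mathbf{H}}^{(n+1)}},{{\mathbf{W}}^{(n)}} \right)-\phi \left( {{\mathbf{H}}^{(n+1)}},{{\mathbf{W}}^{(n+1)}} \right) &\ge \sum\limits_{k=1}^{r}{\frac{\gamma _{k}^{(n)}}{2}\left\| \mathbf{w}_{k}^{(n+1)}-\mathbf{w}_{k}^{(n)} \right\|_{F}^{2}}, \\ \phi \left( {{\mathbf{Z}}^{(n)}} \right)-\phi \left( {{\mathbf{Z}}^{(n+1)}} \right) &\ge {{\rho }_{1}}\left\| {{\mathbf{Z}}^{(n+1)}}-{{\mathbf{Z}}^{(n)}} \right\|_{F}^{2},\quad {{\rho }_{1}}=\frac{1}{2}\inf_{k,n}\min \left\{ \mu _{k}^{(n)},\gamma _{k}^{(n)} \right\}. \end{aligned}$$

The last line follows by adding the first two and using \(\left\| {{\mathbf{Z}}^{(n+1)}}-{{\mathbf{Z}}^{(n)}} \right\|_{F}^{2}=\sum _{k}\left\| \mathbf{h}_{k}^{(n+1)}-\mathbf{h}_{k}^{(n)} \right\|_{F}^{2}+\sum _{k}\left\| \mathbf{w}_{k}^{(n+1)}-\mathbf{w}_{k}^{(n)} \right\|_{F}^{2}\).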
Lemma 2
Define \(\epsilon _{\mathbf{H}}^{\left( n \right)}={{(\epsilon _{\mathbf{H}}^{1\text{T}},\ldots ,\epsilon _{\mathbf{H}}^{r\text{T}})}^{\text{T}}}\) and \(\epsilon _{\mathbf{W}}^{\left( n \right)}=(\epsilon _{\mathbf{W}}^{1},\ldots ,\epsilon _{\mathbf{W}}^{r})\), where
$$\left\{ \begin{matrix} \epsilon _{\mathbf{H}}^{k}={{\nabla }_{{{\mathbf{h}}_{k}}}}Q\left( {{\mathbf{Z}}^{\left( n \right)}} \right)-{{\nabla }_{{{\mathbf{h}}_{k}}}}Q\left( \mathbf{T}_{k}^{\left( n \right)},{{\mathbf{W}}^{\left( n-1 \right)}} \right)-\mu _{k}^{\left( n \right)}\left( \mathbf{h}_{k}^{\left( n \right)}-\mathbf{h}_{k}^{\left( n-1 \right)} \right), \\ \epsilon _{\mathbf{W}}^{k}={{\nabla }_{{{\mathbf{w}}_{k}}}}Q\left( {{\mathbf{Z}}^{\left( n \right)}} \right)-{{\nabla }_{{{\mathbf{w}}_{k}}}}Q\left( {{\mathbf{H}}^{\left( n \right)}},\mathbf{V}_{k}^{\left( n \right)} \right)-\gamma _{k}^{\left( n \right)}\left( \mathbf{w}_{k}^{\left( n \right)}-\mathbf{w}_{k}^{\left( n-1 \right)} \right). \\ \end{matrix} \right.$$
Then \({{\epsilon }^{\left( n \right)}}:=(\epsilon _{\mathbf{H}}^{\left( n \right)},\epsilon _{\mathbf{W}}^{\left( n \right)})\in \partial \phi \left( {{\mathbf{Z}}^{\left( n \right)}} \right)\), and there exists a constant \(\rho >0\) such that \({{\left\| {{\epsilon }^{\left( n \right)}} \right\|}_{F}}\le \rho {{\left\| {{\mathbf{Z}}^{\left( n \right)}}-{{\mathbf{Z}}^{\left( n-1 \right)}} \right\|}_{F}}\).
Lemma 3
The sequence \({{\{{{\mathbf{Z}}^{\left( n \right)}}\}}_{n\in \mathbb{N}}}\) satisfies the continuity condition (V3).
Theorem 3
([28]) Let \(f\) be a proper and lower semicontinuous function. If \(f\) is semialgebraic, then it satisfies the KL property at any point in \(\text{dom}\,f\).
Lemma 4
\(\text{Q}\left( \mathbf{Z} \right)\), \(\text{G}\left( \mathbf{W} \right)\) and \(\text{F}\left( \mathbf{H} \right)\) are semialgebraic functions, so the cost function \(\phi \left( \mathbf{Z} \right)=\text{Q}\left( \mathbf{Z} \right)+\text{G}\left( \mathbf{W} \right)+\text{F}\left( \mathbf{H} \right)\), as their sum, is also a semialgebraic function. Therefore, by Theorem 3, \(\phi (\mathbf{Z})\) satisfies the KL property in its effective domain.
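To illustrate why a square-root sparsity term is semialgebraic, consider the scalar function \(t=|h{{|}^{1/2}}\). This is only a sketch under the assumption that \(\text{F}\left( \mathbf{H} \right)\) contains a sum of such terms; the exact definition of \(\text{F}\) is given in Sect. 3, not here. The graph of the scalar function can be described by polynomial relations:

$$\left\{ (h,t)\in {{\mathbb{R}}^{2}}:t=|h{{|}^{1/2}} \right\}=\left\{ (h,t)\in {{\mathbb{R}}^{2}}:t\ge 0,\ {{t}^{4}}-{{h}^{2}}=0 \right\}.$$

Because the right-hand side is defined by one polynomial equation and one polynomial inequality, the function is semialgebraic. Finite sums of semialgebraic functions are again semialgebraic, which is the closure property that Lemma 4 relies on.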
Cite this article
Li, Z., Hayashi, T., Ding, S. et al. Dictionary learning with the \({{\ell }_{1/2}}\)-regularizer and the coherence penalty and its convergence analysis. Int. J. Mach. Learn. & Cyber. 9, 1351–1364 (2018). https://doi.org/10.1007/s13042-017-0649-9
DOI: https://doi.org/10.1007/s13042-017-0649-9