Structured Overcomplete Sparsifying Transform Learning with Convergence Guarantees and Applications


Abstract

In recent years, sparse signal modeling, especially using the synthesis model, has been popular. Sparse coding in the synthesis model is, however, NP-hard. Recently, interest has turned to the sparsifying transform model, for which sparse coding is cheap. However, natural images typically contain diverse textures that cannot be sparsified well by a single transform. Hence, in this work, we propose a union of sparsifying transforms model. Sparse coding in this model reduces to a form of clustering. The proposed model is also equivalent to a structured overcomplete sparsifying transform model with block cosparsity, dubbed OCTOBOS. The alternating algorithm introduced for learning such transforms involves simple closed-form solutions. A theoretical analysis provides a convergence guarantee for this algorithm: it is globally convergent to the set of partial minimizers of the non-convex learning problem. We also show that, under certain conditions, the algorithm converges to the set of stationary points of the overall objective. When applied to images, the algorithm learns a collection of well-conditioned square transforms and a good clustering of patches or textures. The resulting sparse representations for the images are much better than those obtained with a single learned transform or with analytical transforms. We show the promising performance of the proposed approach in image denoising, where it compares quite favorably with approaches involving a single learned square transform, an overcomplete synthesis dictionary, or Gaussian mixture models. The proposed denoising method is also faster than the synthesis dictionary based approach.
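
To make the clustering interpretation of sparse coding concrete, the following is a minimal NumPy sketch (this is not the authors' MATLAB implementation; the names are illustrative, and the full OCTOBOS clustering measure additionally includes per-cluster regularizer terms that are omitted here): each signal is assigned to the transform in the union that sparsifies it best under hard thresholding.

```python
import numpy as np

def union_transform_sparse_code(Y, Ws, s):
    """Assign each signal (column of Y) to the transform in the union {W_k}
    that sparsifies it best, and return cluster labels and sparse codes.
    Illustrative sketch only; the paper's clustering measure also contains
    regularizer-dependent terms that are omitted here."""
    labels, codes = [], []
    for y in Y.T:
        best_k, best_err, best_x = 0, np.inf, None
        for k, W in enumerate(Ws):
            Wy = W @ y
            x = np.zeros_like(Wy)
            keep = np.argsort(-np.abs(Wy))[:s]   # keep the s largest-magnitude entries
            x[keep] = Wy[keep]
            err = float(np.sum((Wy - x) ** 2))   # sparsification error in cluster k
            if err < best_err:                   # ties resolved by the lowest index k
                best_k, best_err, best_x = k, err, x
        labels.append(best_k)
        codes.append(best_x)
    return np.asarray(labels), np.stack(codes, axis=1)
```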


Notes

  1. Some algorithms (e.g., K-SVD) also update the non-zero coefficients of the sparse code \(X\) in the dictionary update step.

  2. In fact, the K-SVD method, although popular, does not have any convergence guarantees.

  3. For each \(k\), this is identical to the single transform sparse coding problem.

  4. In the remainder of the paper, when certain indexed variables are enclosed within braces, it means that we are considering the set of variables over the range of all the indices.

  5. For example, when vector \(Wy\) has no zeros, then the optimal \(\hat{x}\) in (P2) has exactly \(n-s\ll Kn\) (for large \(K\)) zeros—all the zeros are concentrated in a single block of \(\hat{x}\).

  6. More precisely, the index of the sparse block is also part of the sparse code. This adds just \(\log _2 K\) bits per index to the sparse code.

  7. We need a length-\(n\) code corresponding to a square and invertible sub-transform of \(W\) in order to perform signal recovery uniquely (see the recovery sketch following these notes).

  8. The weights on the log-determinant and Frobenius norm terms are set to the same value in this paper.

  9. On the other hand, if \(\lambda \) is a fixed constant, there is no guarantee that the optimal transforms for scaled and un-scaled \(Y\) in (P3) are related.

  10. If the transforms have a different spectral norm, they can be trivially scaled to have spectral norm \(1/\sqrt{2}\).

  11. This clustering measure will encourage the shrinking of clusters corresponding to any badly conditioned, or badly scaled transforms.

  12. When two or more clusters are equally optimal, we pick the one corresponding to the lowest cluster index \(k\).

  13. Setting \(m=n\) for the case \(K=1\), this agrees with previous cost analysis for square transform learning using (P3), which has per-iteration cost of \(O(n^{2}N)\) (Ravishankar and Bresler 2013c).

  14. The notion that the sparsity \(s\) scales with the signal dimension \(n\) is rather standard. For example, while \(s=1\) may work for representing the \(4\times 4\) patches of an image in a DCT dictionary with \(n=16\), the same sparsity level of \(s=1\) for an \(n =256^{2}\) DCT dictionary for a \(256\times 256\) (vectorized) image would lead to very poor image representation. Therefore, the sparsity \(s\) must increase with the size \(n\). A typical assumption is that the sparsity \(s\) scales as a fraction (e.g., 5 or 10 %) of the image or patch size \(n\). Otherwise, if \(s\) were to increase only sub-linearly with \(n\), it would imply that larger (more complex) images are somehow easier to sparsify, which is not true in general.

  15. Most of these (synthesis dictionary learning) algorithms have not been demonstrated to be practically useful in applications such as denoising. Bao et al. (2014) show that their method denoises worse than the K-SVD method (Elad and Aharon 2006).

  16. The exact value of \(g^{*}\) may vary with initialization. We will empirically illustrate in Sect. 6.2 that our algorithm is also insensitive to initialization.

  17. The regularizer \(Q(W_{k}^{t})\) is non-negative by the arguments in the proof of Lemma 1 in Sect. 2.5.

  18. The lower level sets of a function \(\hat{f}: A \subset \mathbb {R}^{n} \mapsto \mathbb {R}\) (where \(A\) is unbounded) are bounded if \(\lim _{t \rightarrow \infty } \hat{f}(x^{t}) = + \infty \) whenever \(\left\{ x^{t} \right\} \subset A \) and \(\lim _{t \rightarrow \infty } \left\| x^{t} \right\| = \infty \).

  19. This rule is trivially satisfied due to the way Algorithm A1 is written, except perhaps for the case when the superscript \(t=0\). In the latter case, if the rule is applicable, it means that the algorithm has already reached a fixed point (the initial \(\left( W^{0}, X^{0}, \Gamma ^{0} \right) \) is a fixed point), and therefore, no more iterations are performed. All aforementioned convergence results hold true for this degenerate case.

  20. The uniqueness of the cluster index for each signal \(Y_i\) in the iterations of Algorithm A1 for various data sets was empirically observed.

  21. The \(\gamma _{i}\)’s need to be set accurately for the modified formulation to work well in practice.

  22. The DCT is a popular analytical transform that has been extensively used in compression standards such as JPEG.

  23. The K-SVD method is a highly popular scheme that has been applied to a wide variety of image processing applications (Elad and Aharon 2006; Mairal et al. 2008a). Mairal et al. (2009) have proposed a non-local method for denoising that also exploits learned dictionaries. A similar extension of OCTOBOS learning-based denoising using the non-local means methodology may potentially provide enhanced performance. However, such an extension would distract from the focus on the OCTOBOS model in this work, so we leave its investigation for future work. For the sake of simplicity, we compare our overcomplete transform learning scheme to the corresponding overcomplete synthesis dictionary learning scheme, K-SVD, in this work.

  24. Note that for the case of the DCT, identity, and random initializations, the same matrix is used to initialize all the \(W_{k}\)’s.

  25. For two matrices \(A\) and \(B\) of same size, the cross-gram matrix is computed as \(AB^{T}\).

  26. The noise level estimates decrease over the iterations (passes through (P8)). We also found empirically that underestimating the noise standard deviation (during each pass through (P8)) led to better performance.

  27. Our MATLAB implementation of OCTOBOS denoising is not currently optimized for efficiency. Therefore, the speedup here is computed by comparing our unoptimized MATLAB implementation to the corresponding MATLAB implementation (Elad 2009) of K-SVD denoising.

  28. Compare this behavior to the monotone increase with \(K\) of the recovery PSNR for image representation (see Sect. 6.4).
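
As noted in footnotes 6 and 7 above, the stored OCTOBOS code for a signal consists of the length-\(n\) coefficient block under one square, invertible sub-transform \(W_k\) plus roughly \(\log_2 K\) bits for the block index. A minimal, hypothetical recovery sketch (NumPy; names are illustrative and not from the authors' implementation) is:

```python
import numpy as np

def recover_from_block_code(Ws, k, xk):
    """Recover a signal from its code: the active block index k (about
    log2(K) extra bits, cf. note 6) and the length-n coefficient block xk
    under the square, invertible sub-transform W_k (cf. note 7).
    Illustrative sketch only: y = W_k^{-1} x_k."""
    return np.linalg.solve(Ws[k], xk)
```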

References

  • Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., & Tandon, R. (2013). Learning sparsely used overcomplete dictionaries via alternating minimization, arXiv:1310.7991, Preprint.

  • Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., & Tandon, R. (2014). Learning sparsely used overcomplete dictionaries. Journal of Machine Learning Research, 35, 1–15.

  • Aharon, M., & Elad, M. (2008). Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM Journal on Imaging Sciences, 1(3), 228–247.

  • Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.

  • Arora, S., Ge, R., & Moitra, A. (2013). New algorithms for learning incoherent and overcomplete dictionaries. arXiv:1308.6273v5, Preprint.

  • Bao, C., Ji, H., Quan, Y., & Shen, Z. (2014). \(\ell _{0}\) Norm based dictionary learning by proximal methods with global convergence. In IEEE Conference on Computer Vision and Pattern Recognition. Online: http://www.math.nus.edu.sg/~matzuows/BJQS.pdf, to appear

  • Brodatz, P. (1966). Textures: A photographic album for artists and designers. New York: Dover.

  • Bruckstein, A. M., Donoho, D. L., & Elad, M. (2009). From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1), 34–81.

  • Candès, E. J., Donoho, D. L. (1999). Curvelets: A surprisingly effective nonadaptive representation for objects with edges. In Curves and surfaces (pp. 105–120). Nashville: Vanderbilt University Press.

  • Candès, E. J., & Donoho, D. L. (1999). Ridgelets: A key to higher-dimensional intermittency? Philosophical Transactions of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences, 357(1760), 2495–2509.

  • Candès, E. J., Eldar, Y. C., Needell, D., & Randall, P. (2011). Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis, 31(1), 59–73.

  • Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1–2), 89–97.

  • Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61.

  • Chen, Y., Pock, T., & Bischof, H. (2012a). Learning \(\ell _{1}\)-based analysis and synthesis sparsity priors using bi-level optimization. In Proceedings of the Workshop on Analysis Operator Learning vs. Dictionary Learning, NIPS. arXiv:1401.4105

  • Chen, Y. C., Sastry, C. S., Patel, V. M., Phillips, P. J., & Chellappa, R. (2012b). Rotation invariant simultaneous clustering and dictionary learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1053–1056).

  • Chi, Y. T., Ali, M., Rajwade, A., & Ho, J. (2013). Block and group regularized sparse modeling for dictionary learning. In CVPR (pp. 377–382).

  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095.

  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2011). BM3D web page. Retrieved 2014, from http://www.cs.tut.fi/~foi/GCF-BM3D/

  • Dai, W., & Milenkovic, O. (2009). Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5), 2230–2249.

  • Davis, G., Mallat, S., & Avellaneda, M. (1997). Adaptive greedy approximations. Journal of Constructive Approximation, 13(1), 57–98.

  • Do, M. N., & Vetterli, M. (2005). The contourlet transform: An efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12), 2091–2106.

  • Donoho, D. L., & Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via \(\ell ^1\) minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202.

  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.

  • Elad, M. (2009). Michael Elad personal page. http://www.cs.technion.ac.il/~elad/Various/KSVD_Matlab_ToolBox.zip. Accessed 2014.

  • Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.

  • Elad, M., Milanfar, P., & Rubinstein, R. (2007). Analysis versus synthesis in signal priors. Inverse Problems, 23(3), 947–968.

  • Engan, K., Aase, S., & Hakon-Husoy, J. (1999). Method of optimal directions for frame design. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2443–2446).

  • Giryes, R., Nam, S., Elad, M., Gribonval, R., & Davies, M. (2014). Greedy-like algorithms for the cosparse analysis model. Linear Algebra and its Applications, 441, 22–60 (Special Issue on Sparse Approximate Solution of Linear Systems).

  • Gorodnitsky, I. F., George, J., & Rao, B. D. (1995). Neuromagnetic source imaging with FOCUSS: A recursive weighted minimum norm algorithm. Electroencephalography and Clinical Neurophysiology, 95, 231–251.

  • Harikumar, G., & Bresler, Y. (1996). A new algorithm for computing sparse solutions to linear inverse problems. In ICASSP (pp. 1331–1334).

  • Hawe, S., Kleinsteuber, M., & Diepold, K. (2013). Analysis operator learning and its application to image reconstruction. IEEE Transactions on Image Processing, 22(6), 2138–2150.

  • He, D.-C., & Safia, A. (2013). Multiband Texture Database. http://multibandtexture.recherche.usherbrooke.ca/original_brodatz.html. Accessed 2014.

  • Kong, S., & Wang, D. (2012). A dictionary learning approach for classification: Separating the particularity and the commonality. In Proceedings of the 12th European Conference on Computer Vision (pp 186–199).

  • Liao, H. Y., & Sapiro, G. (2008). Sparse representations for limited data tomography. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI) (pp 1375–1378).

  • Liu, Y., Tiebin, M., & Li, S. (2012). Compressed sensing with general frames via optimal-dual-based \(\ell _{1}\)-analysis. IEEE Transactions on Information Theory, 58(7), 4201–4214.

  • Mairal, J., Elad, M., & Sapiro, G. (2008a). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.

  • Mairal, J., Sapiro, G., & Elad, M. (2008b). Learning multiscale sparse representations for image and video restoration. SIAM, Multiscale Modeling and Simulation, 7(1), 214–241.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G, & Zisserman, A. (2009). Non-local sparse models for image restoration. In IEEE International Conference on Computer Vision (pp. 2272–2279).

  • Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11, 19–60.

  • Mallat, S. (1999). A wavelet tour of signal processing. Boston: Academic Press.

  • Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415.

  • Marcellin, M. W., Gormish, M. J., Bilgin, A., & Boliek, M. P. (2000). An overview of JPEG-2000. In Proceedings of the Data Compression Conference (pp. 523–541).

  • Nam, S., Davies, M. E., Elad, M., & Gribonval, R. (2011). Cosparse analysis modeling: Uniqueness and algorithms. In ICASSP (pp. 5804–5807).

  • Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 227–234.

  • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

  • Ophir, B., Elad, M., Bertin, N., & Plumbley, M. (2011). Sequential minimal eigenvalues: An approach to analysis dictionary learning. In Proceedings of the European Signal Processing Conference (EUSIPCO) (pp. 1465–1469).

  • Pati, Y., Rezaiifar, R., & Krishnaprasad, P. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems and Computers (Vol. 1, pp. 40–44).

  • Peleg, T., & Elad, M. (2014). A statistical prediction model based on sparse representations for single image super-resolution. IEEE Transactions on Image Processing, 23(6), 2569–2582.

  • Peyré, G., & Fadili, J. (2011). Learning analysis sparsity priors. In Proceedings of the International Conference on Sampling Theory and Applications (SampTA), Singapore. http://hal.archives-ouvertes.fr/hal-00542016/. Accessed 2014.

  • Pfister, L. (2013). Tomographic reconstruction with adaptive sparsifying transforms. Master’s Thesis, University of Illinois at Urbana-Champaign.

  • Pfister, L., & Bresler, Y. (2014). Model-based iterative tomographic reconstruction with adaptive sparsifying transforms. In SPIE International Symposium on Electronic Imaging: Computational Imaging XII, to appear.

  • Pratt, W. K., Kane, J., & Andrews, H. C. (1969). Hadamard transform image coding. Proceedings of the IEEE, 57(1), 58–68.

  • Ramirez, I., Sprechmann, P., & Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3501–3508).

  • Ravishankar, S., & Bresler, Y. (2011a). MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging, 30(5), 1028–1041.

  • Ravishankar, S., & Bresler, Y. (2011b). Multiscale dictionary learning for MRI. In Proceedings of ISMRM (p. 2830).

  • Ravishankar, S., & Bresler, Y. (2012a). Learning doubly sparse transforms for image representation. In IEEE International Conference on Image Processing (pp 685–688).

  • Ravishankar, S., & Bresler, Y. (2012b). Learning sparsifying transforms for signal and image processing. In SIAM Conference on Imaging Science (p. 51).

  • Ravishankar, S., & Bresler, Y. (2013a). Closed-form solutions within sparsifying transform learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (pp. 5378–5382).

  • Ravishankar, S., & Bresler, Y. (2013b). Learning doubly sparse transforms for images. IEEE Transactions on Image Processing, 22(12), 4598–4612.

  • Ravishankar, S., & Bresler, Y. (2013c). Learning sparsifying transforms. IEEE Transactions on Signal Processing, 61(5), 1072–1086.

  • Ravishankar, S., & Bresler, Y. (2013d). Sparsifying transform learning for compressed sensing MRI. In Proceedings of the IEEE International Symposium on Biomedical Imaging (pp. 17–20).

  • Ravishankar, S., & Bresler, Y. (2014). \(\ell _0\) Sparsifying transform learning with efficient optimal updates and convergence guarantees. IEEE Transactions on Signal Processing (submitted). https://uofi.box.com/s/vrw0i13jbkj6n8xh9u9h. Accessed 2014.

  • Ravishankar, S., & Bresler, Y. (2014). Online sparsifying transform learning: Part II: Convergence analysis. IEEE Journal of Selected Topics in Signal Processing (accepted). https://uofi.box.com/s/cmqme2avnz5pygobxj3u. Accessed 2014.

  • Rubinstein, R., & Elad, M. (2011). K-SVD dictionary-learning for analysis sparse models. In Proceedings of SPARS11 (p. 73).

  • Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010). Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6), 1045–1057.

  • Rubinstein, R., Faktor, T., & Elad, M. (2012). K-SVD dictionary-learning for the analysis sparse model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5405–5408).

  • Sadeghi, M., Babaie-Zadeh, M., & Jutten, C. (2013). Dictionary learning for sparse representation: A novel approach. IEEE Signal Processing Letters, 20(12), 1195–1198.

  • Sahoo, S. K., & Makur, A. (2013). Dictionary training for sparse representation as generalization of k-means clustering. IEEE Signal Processing Letters, 20(6), 587–590.

  • Skretting, K., & Engan, K. (2010). Recursive least squares dictionary learning algorithm. IEEE Transactions on Signal Processing, 58(4), 2121–2130.

  • Smith, L. N., & Elad, M. (2013). Improving dictionary learning: Multiple dictionary updates and coefficient reuse. IEEE Signal Processing Letters, 20(1), 79–82.

  • Spielman, D. A., Wang, H., & Wright, J. (2012). Exact recovery of sparsely-used dictionaries. In Proceedings of the 25th Annual Conference on Learning Theory (pp. 37.1–37.18).

  • Sprechmann, P., Bronstein, A., & Sapiro, G. (2012a). Learning efficient structured sparse models. In Proceedings of the 29th International Conference on Machine Learning (Vol. 1, pp. 615–622).

  • Sprechmann, P., Bronstein, A. M., Sapiro, G. (2012b). Learning efficient sparse and low rank models. arXiv:1212.3631, Preprint.

  • Wang, S., Zhang, L., Liang, Y., Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2216–2223).

  • Weiss, Y. (2011). Yair Weiss home page. Retrieved 2014, from http://www.cs.huji.ac.il/~daniez/epllcode.zip.

  • Xu, Y., & Yin, W. (2013). A fast patch-dictionary method for whole-image recovery. UCLA CAM report 13-38. ftp://ftp.math.ucla.edu/pub/camreport/cam13-38.pdf

  • Yaghoobi, M., Blumensath, T., & Davies, M. (2009). Dictionary learning for sparse approximations with the majorization method. IEEE Transactions on Signal Processing, 57(6), 2178–2191.

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. (2011). Analysis operator learning for overcomplete cosparse representations. In European Signal Processing Conference (EUSIPCO) (pp. 1470–1474).

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E. (2012). Noise aware analysis operator learning for approximately cosparse signals. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5409–5412).

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E. (2013). Constrained overcomplete analysis operator learning for cosparse signal modelling. IEEE Transactions on Signal Processing, 61(9), 2341–2355.

  • Yu, G., Sapiro, G., & Mallat, S. (2012). Solving inverse problems with piecewise linear estimators: From gaussian mixture models to structured sparsity. IEEE Transactions on Image Processing, 21(5), 2481–2499.

  • Zelnik-Manor, L., Rosenblum, K., & Eldar, Y. C. (2012). Dictionary optimization for block-sparse representations. IEEE Transactions on Signal Processing, 60(5), 2386–2395.

  • Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (pp. 479–486).

Acknowledgments

Part of this work was supported by the National Science Foundation (NSF) under Grants CCF-1018660 and CCF-1320953.

Author information

Corresponding author

Correspondence to Bihan Wen.

Additional information

Communicated by Julien Mairal, Francis Bach, and Michael Elad.

Saiprasad Ravishankar and Bihan Wen have contributed equally to this work.

Appendix: Useful Lemmas

Here, we list three results from Ravishankar and Bresler (2014) that are used in our convergence proof. The following result is from the Appendix of Ravishankar and Bresler (2014).

Lemma 10

Consider a bounded vector sequence \(\left\{ \alpha ^{k} \right\} \) with \(\alpha ^{k} \in \mathbb {R}^{n}\) that converges to \(\alpha ^{*}\). Then, every accumulation point of \(\left\{ H_{s}(\alpha ^{k}) \right\} \) belongs to the set \(\tilde{H_{s}}(\alpha ^{*})\).

The following result is based on the proof of Lemma 6 of (Ravishankar and Bresler 2014).

Lemma 11

Let \( \left\{ W^{q_t}, X^{q_t} \right\} \) with \(W^{q_t} \in \mathbb {R}^{n \times n}\), \(X^{q_t} \in \mathbb {R}^{n \times N}\), be a subsequence of \( \left\{ W^{t}, X^{t} \right\} \) converging to the accumulation point \( (W^{*}, X^{*})\). Let \(Z \in \mathbb {R}^{n \times N}\) and \(L^{-1} = \left( ZZ^{T} + \lambda I \right) ^{-1/2}\), with \(\lambda >0\). Further, let \( Q^{q_t} \Sigma ^{q_t} \left( R^{q_t} \right) ^{T}\) denote the full singular value decomposition of \( L^{-1} Z (X^{q_t})^{T}\). Let

$$\begin{aligned} W^{q_{t} + 1} = \frac{R^{q_{t}}}{2} \left( \Sigma ^{q_{t}} + \left( \left( \Sigma ^{q_{t}} \right) ^{2}+2\lambda I \right) ^{\frac{1}{2}}\right) \left( Q^{q_{t}} \right) ^{T}L^{-1} \end{aligned}$$

and suppose that \( \left\{ W^{q_{t} + 1} \right\} \) converges to \(W^{**}\). Then,

$$\begin{aligned} W^{**} \in \arg \min _{W} \left\| WZ-X^{*} \right\| _{F}^{2}+\lambda \left\| W \right\| _{F}^{2}- \lambda \log \,\left| \mathrm {det \,} W \right| \end{aligned}$$
(53)
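
For concreteness, the closed-form update above can be sketched numerically as follows (a minimal NumPy sketch, assuming \(L^{-1}\) is formed via a symmetric eigendecomposition; the function name is illustrative and this is not the authors' implementation):

```python
import numpy as np

def closed_form_transform_update(Z, X, lam):
    """Compute W = (R/2) (Sigma + (Sigma^2 + 2*lam*I)^(1/2)) Q^T L^{-1},
    where L^{-1} = (Z Z^T + lam*I)^{-1/2} and Q Sigma R^T is the full SVD
    of L^{-1} Z X^T, as in the update preceding (53). Illustrative sketch."""
    n = Z.shape[0]
    # Inverse square root of the symmetric positive definite matrix Z Z^T + lam*I.
    evals, evecs = np.linalg.eigh(Z @ Z.T + lam * np.eye(n))
    L_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Full singular value decomposition Q Sigma R^T of L^{-1} Z X^T.
    Q, sigma, RT = np.linalg.svd(L_inv @ Z @ X.T)
    Sigma_upd = np.diag(0.5 * (sigma + np.sqrt(sigma ** 2 + 2.0 * lam)))
    return RT.T @ Sigma_upd @ Q.T @ L_inv
```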

The following result is based on the proof of Lemma 9 of (Ravishankar and Bresler 2014). Note that \(\phi (X)\) is the barrier function defined in Section 4.

Lemma 12

Given \(Z \in \mathbb {R}^{n \times N_{1}}\), \(\lambda >0\), and \(s \ge 0\), consider the function \(g: \mathbb {R}^{n \times n} \times \mathbb {R}^{n \times N_{1}} \mapsto \mathbb {R}\) defined as \(g(W, X) = \left\| WZ-X \right\| _{F}^{2} + \lambda \left\| W \right\| _{F}^{2}\) \( - \lambda \log \,\left| \mathrm {det \,} W \right| + \phi (X)\) for \(W \in \mathbb {R}^{n \times n}\) and \(X \in \mathbb {R}^{n \times N_{1}}\). Further, let \((\hat{W}, \hat{X})\) be a pair in \( \mathbb {R}^{n \times n} \times \mathbb {R}^{n \times N_{1}} \) satisfying

$$\begin{aligned}&2 \hat{W} Z Z^{T} - 2 \hat{X}Z^{T} + 2 \lambda \hat{W} - \lambda \hat{W}^{-T} = 0 \end{aligned}$$
(54)
$$\begin{aligned}&\hat{X}_{i} \in \tilde{H_{s}}(\hat{W}Z_{i}), \quad \forall \, 1 \le i \le N_{1} \end{aligned}$$
(55)

Then, the following condition holds at \((\hat{W}, \hat{X})\).

$$\begin{aligned} g(\hat{W}+dW, \hat{X}+\Delta X) \ge g(\hat{W}, \hat{X}) \end{aligned}$$
(56)

The condition holds for all sufficiently small \(dW \in \mathbb {R}^{n \times n}\) satisfying \(\left\| dW \right\| _{F} \le \epsilon '\) for some \(\epsilon ' >0\) that depends on \(\hat{W}\), and all \(\Delta X \in \mathbb {R}^{n \times N_{1}}\) in the union of the following regions.

R1. The half-space \(tr\left\{ (\hat{W}Z - \hat{X})\Delta X^{T} \right\} \le 0\).

R2. The local region defined by \(\left\| \Delta X \right\| _{\infty } < \min _{i}\left\{ \beta _{s}(\hat{W} Z_{i}) : \left\| \hat{W} Z_{i} \right\| _{0}>s \right\} \).

Furthermore, if we have \( \left\| \hat{W} Z_{i} \right\| _{0} \le s \, \forall \, i\), then \(\Delta X\) can be arbitrary.

The following lemma is a slightly modified version of the one in (Ravishankar et al. 2014); we only state the minor modifications to the previous proof that are needed for the lemma to hold here.

The lemma implies Lipschitz continuity (and therefore continuity) of the function \(u(B) \triangleq \left\| B y - H_{s}(By) \right\| _{2}^{2} \) on a bounded set.

Lemma 13

Given \(c_{0}>0\), and \(y \in \mathbb {R}^{n}\) satisfying \(\left\| y \right\| _{2} \le c_{0} \), and a constant \(c'>0\), the function \(u(B) = \left\| B y \,{-}\, H_{s}(By) \right\| _{2}^{2} \) is uniformly Lipschitz with respect to \(B\) on the bounded set \(S \triangleq \left\{ B \in \mathbb {R}^{n \times n} : \left\| B \right\| _{2} \le c' \right\} \).

Proof

The proof is identical to that for Lemma 4 in (Ravishankar et al. 2014), except that the conditions \( \left\| y \right\| _{2}=1\) and \(\left\| B \right\| _{2} \le 1\) in (Ravishankar et al. 2014) are replaced by the conditions \(\left\| y \right\| _{2} \le c_{0} \) and \(\left\| B \right\| _{2} \le c'\) for the proof here. \(\square \)
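
A small numerical sketch of the function analyzed in Lemma 13 may be helpful (assuming the convention that \(H_{s}\) keeps the \(s\) largest-magnitude entries and breaks ties by index order; this is an illustration, not the authors' code):

```python
import numpy as np

def hard_threshold(v, s):
    """H_s(v): keep the s largest-magnitude entries of v and zero the rest
    (ties broken by index order, giving one element of the set H_s-tilde)."""
    x = np.zeros_like(v)
    if s > 0:
        idx = np.argsort(-np.abs(v), kind="stable")[:s]
        x[idx] = v[idx]
    return x

def u(B, y, s):
    """u(B) = ||B y - H_s(B y)||_2^2: the squared sparsification error whose
    uniform Lipschitz continuity on bounded sets is asserted in Lemma 13."""
    By = B @ y
    return float(np.sum((By - hard_threshold(By, s)) ** 2))
```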

Cite this article

Wen, B., Ravishankar, S. & Bresler, Y. Structured Overcomplete Sparsifying Transform Learning with Convergence Guarantees and Applications. Int J Comput Vis 114, 137–167 (2015). https://doi.org/10.1007/s11263-014-0761-1
