Structured Overcomplete Sparsifying Transform Learning with Convergence Guarantees and Applications


Abstract

In recent years, sparse signal modeling, especially using the synthesis model, has been popular. Sparse coding in the synthesis model is, however, NP-hard. Recently, interest has turned to the sparsifying transform model, for which sparse coding is cheap. However, natural images typically contain diverse textures that cannot be sparsified well by a single transform. Hence, in this work, we propose a union of sparsifying transforms model. Sparse coding in this model reduces to a form of clustering. The proposed model is also equivalent to a structured overcomplete sparsifying transform model with block cosparsity, dubbed OCTOBOS. The alternating algorithm introduced for learning such transforms involves simple closed-form solutions. A theoretical analysis provides a convergence guarantee for this algorithm: it is globally convergent to the set of partial minimizers of the non-convex learning problem. We also show that, under certain conditions, the algorithm converges to the set of stationary points of the overall objective. When applied to images, the algorithm learns a collection of well-conditioned square transforms and a good clustering of patches or textures. The resulting sparse representations for the images are much better than those obtained with a single learned transform or with analytical transforms. We show the promising performance of the proposed approach in image denoising, where it compares quite favorably with approaches involving a single learned square transform, an overcomplete synthesis dictionary, or Gaussian mixture models. The proposed denoising method is also faster than the synthesis dictionary based approach.
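
To make the clustering interpretation of sparse coding concrete, the following is a minimal NumPy sketch (this is not the authors' MATLAB implementation; the names are illustrative, and the full OCTOBOS clustering measure additionally includes per-cluster regularizer terms that are omitted here): each signal is assigned to the transform in the union that sparsifies it best under hard thresholding.

```python
import numpy as np

def union_transform_sparse_code(Y, Ws, s):
    """Assign each signal (column of Y) to the transform in the union {W_k}
    that sparsifies it best, and return cluster labels and sparse codes.
    Illustrative sketch only; the paper's clustering measure also contains
    regularizer-dependent terms that are omitted here."""
    labels, codes = [], []
    for y in Y.T:
        best_k, best_err, best_x = 0, np.inf, None
        for k, W in enumerate(Ws):
            Wy = W @ y
            x = np.zeros_like(Wy)
            keep = np.argsort(-np.abs(Wy))[:s]   # keep the s largest-magnitude entries
            x[keep] = Wy[keep]
            err = float(np.sum((Wy - x) ** 2))   # sparsification error in cluster k
            if err < best_err:                   # ties resolved by the lowest index k
                best_k, best_err, best_x = k, err, x
        labels.append(best_k)
        codes.append(best_x)
    return np.asarray(labels), np.stack(codes, axis=1)
```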


Notes

  1. Some algorithms (e.g., K-SVD) also update the non-zero coefficients of the sparse code \(X\) in the dictionary update step.

  2. In fact, the K-SVD method, although popular, does not have any convergence guarantees.

  3. For each \(k\), this is identical to the single transform sparse coding problem.

  4. In the remainder of the paper, when certain indexed variables are enclosed within braces, it means that we are considering the set of variables over the range of all the indices.

  5. For example, when vector \(Wy\) has no zeros, then the optimal \(\hat{x}\) in (P2) has exactly \(n-s\ll Kn\) (for large \(K\)) zeros—all the zeros are concentrated in a single block of \(\hat{x}\).

  6. More precisely, the index of the sparse block is also part of the sparse code. This adds just \(\log _2 K\) bits per index to the sparse code.

  7. We need a length-\(n\) code corresponding to a square and invertible sub-transform of \(W\) in order to perform signal recovery uniquely (see the recovery sketch following these notes).

  8. The weights on the log-determinant and Frobenius norm terms are set to the same value in this paper.

  9. On the other hand, if \(\lambda \) is a fixed constant, there is no guarantee that the optimal transforms for scaled and un-scaled \(Y\) in (P3) are related.

  10. If the transforms have a different spectral norm, they can be trivially scaled to have spectral norm \(1/\sqrt{2}\).

  11. This clustering measure will encourage the shrinking of clusters corresponding to any badly conditioned, or badly scaled transforms.

  12. When two or more clusters are equally optimal, we pick the one corresponding to the lowest cluster index \(k\).

  13. Setting \(m=n\) for the case \(K=1\), this agrees with previous cost analysis for square transform learning using (P3), which has per-iteration cost of \(O(n^{2}N)\) (Ravishankar and Bresler 2013c).

  14. The notion that the sparsity \(s\) scales with the signal dimension \(n\) is rather standard. For example, while \(s=1\) may work for representing the \(4\times 4\) patches of an image in a DCT dictionary with \(n=16\), the same sparsity level of \(s=1\) for an \(n =256^{2}\) DCT dictionary for a \(256\times 256\) (vectorized) image would lead to very poor image representation. Therefore, the sparsity \(s\) must increase with the size \(n\). A typical assumption is that the sparsity \(s\) scales as a fraction (e.g., 5 or 10 %) of the image or patch size \(n\). Otherwise, if \(s\) were to increase only sub-linearly with \(n\), it would imply that larger (more complex) images are somehow easier to sparsify, which is not true in general.

  15. Most of these (synthesis dictionary learning) algorithms have not been demonstrated to be practically useful in applications such as denoising. Bao et al. (2014) show that their method denoises worse than the K-SVD method (Elad and Aharon 2006).

  16. The exact value of \(g^{*}\) may vary with initialization. We will empirically illustrate in Sect. 6.2 that our algorithm is also insensitive to initialization.

  17. The regularizer \(Q(W_{k}^{t})\) is non-negative by the arguments in the proof of Lemma 1 in Sect. 2.5.

  18. The lower level sets of a function \(\hat{f}: A \subset \mathbb {R}^{n} \mapsto \mathbb {R}\) (where \(A\) is unbounded) are bounded if \(\lim _{t \rightarrow \infty } \hat{f}(x^{t}) = + \infty \) whenever \(\left\{ x^{t} \right\} \subset A \) and \(\lim _{t \rightarrow \infty } \left\| x^{t} \right\| = \infty \).

  19. This rule is trivially satisfied due to the way Algorithm A1 is written, except perhaps for the case when the superscript \(t=0\). In the latter case, if the rule is applicable, it means that the algorithm has already reached a fixed point (the initial \(\left( W^{0}, X^{0}, \Gamma ^{0} \right) \) is a fixed point), and therefore, no more iterations are performed. All aforementioned convergence results hold true for this degenerate case.

  20. The uniqueness of the cluster index for each signal \(Y_i\) in the iterations of Algorithm A1 for various data sets was empirically observed.

  21. The \(\gamma _{i}\)’s need to be set accurately for the modified formulation to work well in practice.

  22. The DCT is a popular analytical transform that has been extensively used in compression standards such as JPEG.

  23. The K-SVD method is a highly popular scheme that has been applied to a wide variety of image processing applications (Elad and Aharon 2006; Mairal et al. 2008a). Mairal et al. (2009) have proposed a non-local method for denoising that also exploits learned dictionaries. A similar extension of OCTOBOS learning-based denoising using the non-local means methodology may potentially provide enhanced performance. However, such an extension would distract from the focus on the OCTOBOS model in this work, so we leave its investigation for future work. For the sake of simplicity, we compare our overcomplete transform learning scheme to the corresponding overcomplete synthesis dictionary learning scheme, K-SVD, in this work.

  24. Note that for the case of the DCT, identity, and random initializations, the same matrix is used to initialize all the \(W_{k}\)’s.

  25. For two matrices \(A\) and \(B\) of same size, the cross-gram matrix is computed as \(AB^{T}\).

  26. The noise level estimates decrease over the iterations (passes through (P8)). We also found empirically that underestimating the noise standard deviation (during each pass through (P8)) led to better performance.

  27. Our MATLAB implementation of OCTOBOS denoising is not currently optimized for efficiency. Therefore, the speedup here is computed by comparing our unoptimized MATLAB implementation to the corresponding MATLAB implementation (Elad 2009) of K-SVD denoising.

  28. Compare this behavior to the monotone increase with \(K\) of the recovery PSNR for image representation (see Sect. 6.4).
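
As noted in footnotes 6 and 7 above, the stored OCTOBOS code for a signal consists of the length-\(n\) coefficient block under one square, invertible sub-transform \(W_k\) plus roughly \(\log_2 K\) bits for the block index. A minimal, hypothetical recovery sketch (NumPy; names are illustrative and not from the authors' implementation) is:

```python
import numpy as np

def recover_from_block_code(Ws, k, xk):
    """Recover a signal from its code: the active block index k (about
    log2(K) extra bits, cf. note 6) and the length-n coefficient block xk
    under the square, invertible sub-transform W_k (cf. note 7).
    Illustrative sketch only: y = W_k^{-1} x_k."""
    return np.linalg.solve(Ws[k], xk)
```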

References

  • Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., & Tandon, R. (2013). Learning sparsely used overcomplete dictionaries via alternating minimization, arXiv:1310.7991, Preprint.

  • Agarwal, A., Anandkumar, A., Jain, P., Netrapalli, P., & Tandon, R. (2014). Learning sparsely used overcomplete dictionaries. Journal of Machine Learning Research, 35, 1–15.

  • Aharon, M., & Elad, M. (2008). Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM Journal on Imaging Sciences, 1(3), 228–247.

  • Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.

  • Arora, S., Ge, R., & Moitra, A. (2013). New algorithms for learning incoherent and overcomplete dictionaries. arXiv:1308.6273v5, Preprint.

  • Bao, C., Ji, H., Quan, Y., & Shen, Z. (2014). \(\ell _{0}\) Norm based dictionary learning by proximal methods with global convergence. In IEEE Conference on Computer Vision and Pattern Recognition. Online: http://www.math.nus.edu.sg/~matzuows/BJQS.pdf, to appear

  • Brodatz, P. (1966). Textures: A photographic album for artists and designers. New York: Dover.

  • Bruckstein, A. M., Donoho, D. L., & Elad, M. (2009). From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1), 34–81.

  • Candès, E. J., Donoho, D. L. (1999). Curvelets: A surprisingly effective nonadaptive representation for objects with edges. In Curves and surfaces (pp. 105–120). Nashville: Vanderbilt University Press.

  • Candès, E. J., & Donoho, D. L. (1999). Ridgelets: A key to higher-dimensional intermittency? Philosophical Transactions of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences, 357(1760), 2495–2509.

  • Candès, E. J., Eldar, Y. C., Needell, D., & Randall, P. (2011). Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis, 31(1), 59–73.

  • Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1–2), 89–97.

  • Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61.

  • Chen, Y., Pock, T., & Bischof, H. (2012a). Learning \(\ell _{1}\)-based analysis and synthesis sparsity priors using bi-level optimization. In Proceedings of the Workshop on Analysis Operator Learning vs. Dictionary Learning, NIPS. arXiv:1401.4105

  • Chen, Y. C., Sastry, C. S., Patel, V. M., Phillips, P. J., & Chellappa, R. (2012b). Rotation invariant simultaneous clustering and dictionary learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1053–1056).

  • Chi, Y. T., Ali, M., Rajwade, A., & Ho, J. (2013). Block and group regularized sparse modeling for dictionary learning. In CVPR (pp. 377–382).

  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095.

  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2011). BM3D web page. Retrieved 2014, from http://www.cs.tut.fi/~foi/GCF-BM3D/

  • Dai, W., & Milenkovic, O. (2009). Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5), 2230–2249.

  • Davis, G., Mallat, S., & Avellaneda, M. (1997). Adaptive greedy approximations. Journal of Constructive Approximation, 13(1), 57–98.

  • Do, M. N., & Vetterli, M. (2005). The contourlet transform: An efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12), 2091–2106.

  • Donoho, D. L., & Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via \(\ell ^1\) minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202.

  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.

  • Elad, M. (2009). Michael Elad personal page. http://www.cs.technion.ac.il/~elad/Various/KSVD_Matlab_ToolBox.zip. Accessed 2014.

  • Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.

  • Elad, M., Milanfar, P., & Rubinstein, R. (2007). Analysis versus synthesis in signal priors. Inverse Problems, 23(3), 947–968.

  • Engan, K., Aase, S., & Hakon-Husoy, J. (1999). Method of optimal directions for frame design. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2443–2446).

  • Giryes, R., Nam, S., Elad, M., Gribonval, R., & Davies, M. (2014). Greedy-like algorithms for the cosparse analysis model. Linear Algebra and its Applications, 441, 22–60 (Special Issue on Sparse Approximate Solution of Linear Systems).

  • Gorodnitsky, I. F., George, J., & Rao, B. D. (1995). Neuromagnetic source imaging with FOCUSS: A recursive weighted minimum norm algorithm. Electroencephalography and Clinical Neurophysiology, 95, 231–251.

  • Harikumar, G., & Bresler, Y. (1996). A new algorithm for computing sparse solutions to linear inverse problems. In ICASSP (pp. 1331–1334).

  • Hawe, S., Kleinsteuber, M., & Diepold, K. (2013). Analysis operator learning and its application to image reconstruction. IEEE Transactions on Image Processing, 22(6), 2138–2150.

  • He, D.-C., & Safia, A. (2013). Multiband Texture Database. http://multibandtexture.recherche.usherbrooke.ca/original_brodatz.html. Accessed 2014.

  • Kong, S., & Wang, D. (2012). A dictionary learning approach for classification: Separating the particularity and the commonality. In Proceedings of the 12th European Conference on Computer Vision (pp 186–199).

  • Liao, H. Y., & Sapiro, G. (2008). Sparse representations for limited data tomography. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI) (pp 1375–1378).

  • Liu, Y., Tiebin, M., & Li, S. (2012). Compressed sensing with general frames via optimal-dual-based \(\ell _{1}\)-analysis. IEEE Transactions on Information Theory, 58(7), 4201–4214.

  • Mairal, J., Elad, M., & Sapiro, G. (2008a). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.

  • Mairal, J., Sapiro, G., & Elad, M. (2008b). Learning multiscale sparse representations for image and video restoration. SIAM, Multiscale Modeling and Simulation, 7(1), 214–241.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G, & Zisserman, A. (2009). Non-local sparse models for image restoration. In IEEE International Conference on Computer Vision (pp. 2272–2279).

  • Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11, 19–60.

  • Mallat, S. (1999). A wavelet tour of signal processing. Boston: Academic Press.

  • Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415.

  • Marcellin, M. W., Gormish, M. J., Bilgin, A., & Boliek, M. P. (2000). An overview of JPEG-2000. In Proceedings of the Data Compression Conference (pp. 523–541).

  • Nam, S., Davies, M. E., Elad, M., & Gribonval, R. (2011). Cosparse analysis modeling: Uniqueness and algorithms. In ICASSP (pp. 5804–5807).

  • Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 227–234.

  • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

  • Ophir, B., Elad, M., Bertin, N., & Plumbley, M. (2011). Sequential minimal eigenvalues: An approach to analysis dictionary learning. In Proceedings of the European Signal Processing Conference (EUSIPCO) (pp. 1465–1469).

  • Pati, Y., Rezaiifar, R., & Krishnaprasad, P. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems and Computers (Vol. 1, pp. 40–44).

  • Peleg, T., & Elad, M. (2014). A statistical prediction model based on sparse representations for single image super-resolution. IEEE Transactions on Image Processing, 23(6), 2569–2582.

  • Peyré, G., & Fadili, J. (2011). Learning analysis sparsity priors. In Proceedings of the International Conference on Sampling Theory and Applications (SampTA), Singapore. http://hal.archives-ouvertes.fr/hal-00542016/. Accessed 2014.

  • Pfister, L. (2013). Tomographic reconstruction with adaptive sparsifying transforms. Master’s Thesis, University of Illinois at Urbana-Champaign.

  • Pfister, L., & Bresler, Y. (2014). Model-based iterative tomographic reconstruction with adaptive sparsifying transforms. In SPIE International Symposium on Electronic Imaging: Computational Imaging XII, to appear.

  • Pratt, W. K., Kane, J., & Andrews, H. C. (1969). Hadamard transform image coding. Proceedings of the IEEE, 57(1), 58–68.

  • Ramirez, I., Sprechmann, P., & Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3501–3508).

  • Ravishankar, S., & Bresler, Y. (2011a). MR image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Transactions on Medical Imaging, 30(5), 1028–1041.

  • Ravishankar, S., & Bresler, Y. (2011b). Multiscale dictionary learning for MRI. In Proceedings of ISMRM (p. 2830).

  • Ravishankar, S., & Bresler, Y. (2012a). Learning doubly sparse transforms for image representation. In IEEE International Conference on Image Processing (pp 685–688).

  • Ravishankar, S., & Bresler, Y. (2012b). Learning sparsifying transforms for signal and image processing. In SIAM Conference on Imaging Science (p. 51).

  • Ravishankar, S., & Bresler, Y. (2013a). Closed-form solutions within sparsifying transform learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (pp. 5378–5382).

  • Ravishankar, S., & Bresler, Y. (2013b). Learning doubly sparse transforms for images. IEEE Transactions on Image Processing, 22(12), 4598–4612.

  • Ravishankar, S., & Bresler, Y. (2013c). Learning sparsifying transforms. IEEE Transactions on Signal Processing, 61(5), 1072–1086.

  • Ravishankar, S., & Bresler, Y. (2013d). Sparsifying transform learning for compressed sensing MRI. In Proceedings of the IEEE International Symposium on Biomedical Imaging (pp. 17–20).

  • Ravishankar, S., & Bresler, Y. (2014). \(\ell _0\) Sparsifying transform learning with efficient optimal updates and convergence guarantees. IEEE Transactions on Signal Processing (submitted). https://uofi.box.com/s/vrw0i13jbkj6n8xh9u9h. Accessed 2014.

  • Ravishankar, S., & Bresler, Y. (2014). Online sparsifying transform learning: Part II: Convergence analysis. IEEE Journal of Selected Topics in Signal Processing (accepted). https://uofi.box.com/s/cmqme2avnz5pygobxj3u. Accessed 2014.

  • Rubinstein, R., & Elad, M. (2011). K-SVD dictionary-learning for analysis sparse models. In Proceedings of SPARS11 (p. 73).

  • Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010). Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6), 1045–1057.

  • Rubinstein, R., Faktor, T., & Elad, M. (2012). K-SVD dictionary-learning for the analysis sparse model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5405–5408).

  • Sadeghi, M., Babaie-Zadeh, M., & Jutten, C. (2013). Dictionary learning for sparse representation: A novel approach. IEEE Signal Processing Letters, 20(12), 1195–1198.

  • Sahoo, S. K., & Makur, A. (2013). Dictionary training for sparse representation as generalization of k-means clustering. IEEE Signal Processing Letters, 20(6), 587–590.

  • Skretting, K., & Engan, K. (2010). Recursive least squares dictionary learning algorithm. IEEE Transactions on Signal Processing, 58(4), 2121–2130.

  • Smith, L. N., & Elad, M. (2013). Improving dictionary learning: Multiple dictionary updates and coefficient reuse. IEEE Signal Processing Letters, 20(1), 79–82.

  • Spielman, D. A., Wang, H., & Wright, J. (2012). Exact recovery of sparsely-used dictionaries. In Proceedings of the 25th Annual Conference on Learning Theory (pp. 37.1–37.18).

  • Sprechmann, P., Bronstein, A., & Sapiro, G. (2012a). Learning efficient structured sparse models. In Proceedings of the 29th International Conference on Machine Learning (Vol. 1, pp. 615–622).

  • Sprechmann, P., Bronstein, A. M., Sapiro, G. (2012b). Learning efficient sparse and low rank models. arXiv:1212.3631, Preprint.

  • Wang, S., Zhang, L., Liang, Y., Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2216–2223).

  • Weiss, Y. (2011). Yair Weiss home page. Retrieved 2014, from http://www.cs.huji.ac.il/~daniez/epllcode.zip.

  • Xu, Y., & Yin, W. (2013). A fast patch-dictionary method for whole-image recovery. UCLA CAM report 13-38. ftp://ftp.math.ucla.edu/pub/camreport/cam13-38.pdf

  • Yaghoobi, M., Blumensath, T., & Davies, M. (2009). Dictionary learning for sparse approximations with the majorization method. IEEE Transactions on Signal Processing, 57(6), 2178–2191.

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. (2011). Analysis operator learning for overcomplete cosparse representations. In European Signal Processing Conference (EUSIPCO) (pp. 1470–1474).

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E. (2012). Noise aware analysis operator learning for approximately cosparse signals. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5409–5412).

  • Yaghoobi, M., Nam, S., Gribonval, R., & Davies, M. E. (2013). Constrained overcomplete analysis operator learning for cosparse signal modelling. IEEE Transactions on Signal Processing, 61(9), 2341–2355.

  • Yu, G., Sapiro, G., & Mallat, S. (2012). Solving inverse problems with piecewise linear estimators: From gaussian mixture models to structured sparsity. IEEE Transactions on Image Processing, 21(5), 2481–2499.

  • Zelnik-Manor, L., Rosenblum, K., & Eldar, Y. C. (2012). Dictionary optimization for block-sparse representations. IEEE Transactions on Signal Processing, 60(5), 2386–2395.

  • Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. In IEEE International Conference on Computer Vision (pp. 479–486).

Acknowledgments

Part of this work was supported by the National Science Foundation (NSF) under Grants CCF-1018660 and CCF-1320953.

Author information

Corresponding author

Correspondence to Bihan Wen.

Additional information

Communicated by Julien Mairal, Francis Bach, and Michael Elad.

Saiprasad Ravishankar and Bihan Wen have contributed equally to this work.

Appendix: Useful Lemmas

Here, we list three results from Ravishankar and Bresler (2014) that are used in our convergence proof. The following result is from the Appendix of Ravishankar and Bresler (2014).

Lemma 10

Consider a bounded vector sequence \(\left\{ \alpha ^{k} \right\} \) with \(\alpha ^{k} \in \mathbb {R}^{n}\) that converges to \(\alpha ^{*}\). Then, every accumulation point of \(\left\{ H_{s}(\alpha ^{k}) \right\} \) belongs to the set \(\tilde{H_{s}}(\alpha ^{*})\).

The following result is based on the proof of Lemma 6 of (Ravishankar and Bresler 2014).

Lemma 11

Let \( \left\{ W^{q_t}, X^{q_t} \right\} \) with \(W^{q_t} \in \mathbb {R}^{n \times n}\), \(X^{q_t} \in \mathbb {R}^{n \times N}\), be a subsequence of \( \left\{ W^{t}, X^{t} \right\} \) converging to the accumulation point \( (W^{*}, X^{*})\). Let \(Z \in \mathbb {R}^{n \times N}\) and \(L^{-1} = \left( ZZ^{T} + \lambda I \right) ^{-1/2}\), with \(\lambda >0\). Further, let \( Q^{q_t} \Sigma ^{q_t} \left( R^{q_t} \right) ^{T}\) denote the full singular value decomposition of \( L^{-1} Z (X^{q_t})^{T}\). Let

$$\begin{aligned} W^{q_{t} + 1} = \frac{R^{q_{t}}}{2} \left( \Sigma ^{q_{t}} + \left( \left( \Sigma ^{q_{t}} \right) ^{2}+2\lambda I \right) ^{\frac{1}{2}}\right) \left( Q^{q_{t}} \right) ^{T}L^{-1} \end{aligned}$$

and suppose that \( \left\{ W^{q_{t} + 1} \right\} \) converges to \(W^{**}\). Then,

$$\begin{aligned} W^{**} \in \arg \min _{W} \left\| WZ-X^{*} \right\| _{F}^{2}+\lambda \left\| W \right\| _{F}^{2}- \lambda \log \,\left| \mathrm {det \,} W \right| \end{aligned}$$
(53)
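
For concreteness, the closed-form update above can be sketched numerically as follows (a minimal NumPy sketch, assuming \(L^{-1}\) is formed via a symmetric eigendecomposition; the function name is illustrative and this is not the authors' implementation):

```python
import numpy as np

def closed_form_transform_update(Z, X, lam):
    """Compute W = (R/2) (Sigma + (Sigma^2 + 2*lam*I)^(1/2)) Q^T L^{-1},
    where L^{-1} = (Z Z^T + lam*I)^{-1/2} and Q Sigma R^T is the full SVD
    of L^{-1} Z X^T, as in the update preceding (53). Illustrative sketch."""
    n = Z.shape[0]
    # Inverse square root of the symmetric positive definite matrix Z Z^T + lam*I.
    evals, evecs = np.linalg.eigh(Z @ Z.T + lam * np.eye(n))
    L_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Full singular value decomposition Q Sigma R^T of L^{-1} Z X^T.
    Q, sigma, RT = np.linalg.svd(L_inv @ Z @ X.T)
    Sigma_upd = np.diag(0.5 * (sigma + np.sqrt(sigma ** 2 + 2.0 * lam)))
    return RT.T @ Sigma_upd @ Q.T @ L_inv
```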

The following result is based on the proof of Lemma 9 of (Ravishankar and Bresler 2014). Note that \(\phi (X)\) is the barrier function defined in Section 4.

Lemma 12

Given \(Z \in \mathbb {R}^{n \times N_{1}}\), \(\lambda >0\), and \(s \ge 0\), consider the function \(g: \mathbb {R}^{n \times n} \times \mathbb {R}^{n \times N_{1}} \mapsto \mathbb {R}\) defined as \(g(W, X) = \left\| WZ-X \right\| _{F}^{2} + \lambda \left\| W \right\| _{F}^{2}\) \( - \lambda \log \,\left| \mathrm {det \,} W \right| + \phi (X)\) for \(W \in \mathbb {R}^{n \times n}\) and \(X \in \mathbb {R}^{n \times N_{1}}\). Further, let \((\hat{W}, \hat{X})\) be a pair in \( \mathbb {R}^{n \times n} \times \mathbb {R}^{n \times N_{1}} \) satisfying

$$\begin{aligned}&2 \hat{W} Z Z^{T} - 2 \hat{X}Z^{T} + 2 \lambda \hat{W} - \lambda \hat{W}^{-T} = 0 \end{aligned}$$
(54)
$$\begin{aligned}&\hat{X}_{i} \in \tilde{H_{s}}(\hat{W}Z_{i}), \quad \forall \, 1 \le i \le N_{1} \end{aligned}$$
(55)

Then, the following condition holds at \((\hat{W}, \hat{X})\).

$$\begin{aligned} g(\hat{W}+dW, \hat{X}+\Delta X) \ge g(\hat{W}, \hat{X}) \end{aligned}$$
(56)

The condition holds for all sufficiently small \(dW \in \mathbb {R}^{n \times n}\) satisfying \(\left\| dW \right\| _{F} \le \epsilon '\) for some \(\epsilon ' >0\) that depends on \(\hat{W}\), and all \(\Delta X \in \mathbb {R}^{n \times N_{1}}\) in the union of the following regions.

R1. The half-space \(tr\left\{ (\hat{W}Z - \hat{X})\Delta X^{T} \right\} \le 0\).

R2. The local region defined by \(\left\| \Delta X \right\| _{\infty } < \min _{i}\left\{ \beta _{s}(\hat{W} Z_{i}) : \left\| \hat{W} Z_{i} \right\| _{0}>s \right\} \).

Furthermore, if we have \( \left\| \hat{W} Z_{i} \right\| _{0} \le s \, \forall \, i\), then \(\Delta X\) can be arbitrary.

The following lemma is a slightly modified version of the one in (Ravishankar et al. 2014); we only state the minor modifications to the previous proof that are needed for the lemma to hold here.

The lemma implies Lipschitz continuity (and therefore continuity) of the function \(u(B) \triangleq \left\| B y - H_{s}(By) \right\| _{2}^{2} \) on a bounded set.

Lemma 13

Given \(c_{0}>0\), and \(y \in \mathbb {R}^{n}\) satisfying \(\left\| y \right\| _{2} \le c_{0} \), and a constant \(c'>0\), the function \(u(B) = \left\| B y \,{-}\, H_{s}(By) \right\| _{2}^{2} \) is uniformly Lipschitz with respect to \(B\) on the bounded set \(S \triangleq \left\{ B \in \mathbb {R}^{n \times n} : \left\| B \right\| _{2} \le c' \right\} \).

Proof

The proof is identical to that for Lemma 4 in (Ravishankar et al. 2014), except that the conditions \( \left\| y \right\| _{2}=1\) and \(\left\| B \right\| _{2} \le 1\) in (Ravishankar et al. 2014) are replaced by the conditions \(\left\| y \right\| _{2} \le c_{0} \) and \(\left\| B \right\| _{2} \le c'\) for the proof here. \(\square \)
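
A small numerical sketch of the function analyzed in Lemma 13 may be helpful (assuming the convention that \(H_{s}\) keeps the \(s\) largest-magnitude entries and breaks ties by index order; this is an illustration, not the authors' code):

```python
import numpy as np

def hard_threshold(v, s):
    """H_s(v): keep the s largest-magnitude entries of v and zero the rest
    (ties broken by index order, giving one element of the set H_s-tilde)."""
    x = np.zeros_like(v)
    if s > 0:
        idx = np.argsort(-np.abs(v), kind="stable")[:s]
        x[idx] = v[idx]
    return x

def u(B, y, s):
    """u(B) = ||B y - H_s(B y)||_2^2: the squared sparsification error whose
    uniform Lipschitz continuity on bounded sets is asserted in Lemma 13."""
    By = B @ y
    return float(np.sum((By - hard_threshold(By, s)) ** 2))
```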

Cite this article

Wen, B., Ravishankar, S. & Bresler, Y. Structured Overcomplete Sparsifying Transform Learning with Convergence Guarantees and Applications. Int J Comput Vis 114, 137–167 (2015). https://doi.org/10.1007/s11263-014-0761-1
