Elsevier

Neurocomputing

Volume 200, 5 August 2016, Pages 88-109
Image/video compressive sensing recovery using joint adaptive sparsity measure

https://doi.org/10.1016/j.neucom.2016.03.013

Highlights

  • A new sparsity measure called joint adaptive sparsity measure (JASM) is established.

  • The proposed JASM enforces both local and nonlocal 3D sparsity in transform domain.

  • Two novel techniques for high-fidelity CS image/video recovery via JASM are proposed.

  • Extensive experimental results validate the effectiveness of the proposed methods.

Abstract

Compressive sensing (CS) is an emerging and extensively studied technique in signal and image processing that unifies sampling and compression in a single framework. Recently, local smoothness and nonlocal self-similarity have both led to superior sparsity priors for CS image restoration. In this paper, a new sparsity measure called the joint adaptive sparsity measure (JASM) is first introduced. The proposed JASM concurrently enforces both local sparsity and nonlocal 3D sparsity in the transform domain, providing a powerful mechanism for characterizing the structured sparsities of natural images. More precisely, the local sparsity captures the local smoothness redundancies exploited by an adaptively learned sparsifying basis, and the nonlocal 3D sparsity corresponds to the nonlocal self-similarity constraint achieved by a newly proposed nonlocal statistical sparse model. Then, two novel techniques for high-fidelity CS image and video recovery via JASM are proposed. The proposed methods are formulated as the minimization of a functional under a regularization-based framework, which is solved via an efficient alternating minimization algorithm based on the split Bregman framework. Comprehensive experimental results demonstrate the effectiveness of the proposed methods compared with current state-of-the-art methods in CS image/video restoration.

Introduction

Owing to the pioneering work of Candès et al. [1], [2] and Donoho [3], compressive sensing (CS), also called compressed sensing or compressive sampling, provides a new framework for simultaneous sampling and compression of signals at a rate significantly below the Nyquist rate. Under certain conditions, it also allows the original signal to be reconstructed accurately from a small set of measurements using sparsity-promoting nonlinear recovery algorithms.

Suppose we wish to recover a real-valued finite-length signal $u \in \mathbb{R}^n$ from a finite-length observation $f \in \mathbb{R}^m$ (with $m \ll n$), and there is a linear projection between them:

$$f = \Phi u + e, \tag{1}$$

where $\Phi \in \mathbb{R}^{m \times n}$ is a sensing matrix and $e \in \mathbb{R}^{m \times 1}$ denotes the additive noise. Since the number of unknowns is much larger than the number of observations, we clearly cannot recover every $u$ from $f$, and the problem is generally considered ill-posed. However, if $u$ is sufficiently sparse, in the sense that it can be written as a superposition of a small number of vectors taken from a known (sparsifying) transform domain basis ($t = n$) or frame ($t > n$) $\Psi \in \mathbb{R}^{t \times n}$, or even an adaptively learned sparsifying (ALS) basis, such that $\Psi u$ contains only a small set of significant entries (e.g., $s < m \ll n$ nonzero coefficients), then the exact recovery of $u$ is possible. In order to solve the reconstruction problem with reasonable accuracy and robustness to noise, the estimation of $u$ is formulated as an unconstrained Lagrangian optimization problem:

$$\min_{u} \left\{ \tfrac{1}{2} \| f - \Phi u \|_2^2 + \lambda R(u) \right\}. \tag{2}$$

The term $R(u)$ admits various choices, e.g., $\|u\|_p$ or $\|\Psi u\|_p$ where $p \in \{0, 1\}$, the total variation $\mathrm{TV}(u)$ [4], or the Bregman distance [5]. The optimization problem given in Eq. (2) incorporates the prior information about the original signal. The first term in Eq. (2) is a penalty that represents the closeness of the solution to the observed scene and quantifies the “prediction error” with respect to the measurements. The second term in Eq. (2) is a regularization term that represents a priori sparse information about the original scene; it is designed to penalize an estimate that does not exhibit the expected properties. Finally, $\lambda$ is a regularization parameter that balances the contribution of both terms. This minimization problem can be solved by an iterative shrinkage/thresholding (IST) method (e.g., [6], [7]) or by Bregman iterative algorithms (e.g., [8], [9]).
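As an illustration of how Eq. (2) can be attacked with iterative shrinkage/thresholding, the following is a minimal sketch for the special case $R(u) = \|u\|_1$ (i.e., $p = 1$ with $\Psi$ taken as the identity). It is not the algorithm proposed in this paper. The function names, the Gaussian sensing matrix, the problem sizes, the value of $\lambda$ and the conservative step size (the reciprocal of the squared Frobenius norm of $\Phi$, which upper-bounds its squared spectral norm) are all assumptions made for the example, and plain Python lists are used only to keep the sketch dependency-free.

```python
import random


def soft_threshold(x, tau):
    """Soft-thresholding: the proximal operator of tau * |x|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def objective(Phi, f, u, lam):
    """Value of 0.5 * ||f - Phi u||_2^2 + lam * ||u||_1."""
    r = [fi - pi for fi, pi in zip(f, matvec(Phi, u))]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(ui) for ui in u)


def ista(Phi, f, lam, n_iter=200):
    """Iterative shrinkage/thresholding for the l1-regularized problem."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so this (conservative) step keeps the objective non-increasing.
    step = 1.0 / sum(a * a for row in Phi for a in row)
    u = [0.0] * len(Phi[0])
    for _ in range(n_iter):
        residual = [pi - fi for pi, fi in zip(matvec(Phi, u), f)]
        grad = rmatvec(Phi, residual)  # gradient of the data-fidelity term
        u = [soft_threshold(ui - step * gi, step * lam) for ui, gi in zip(u, grad)]
    return u


# Hypothetical toy problem: m = 20 measurements of a 3-sparse signal, n = 50.
random.seed(0)
m, n, lam = 20, 50, 0.05
Phi = [[random.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
u_true = [0.0] * n
u_true[3], u_true[17], u_true[41] = 1.5, -2.0, 1.0
f = matvec(Phi, u_true)  # noiseless measurements (e = 0)
u_hat = ista(Phi, f, lam)
print(objective(Phi, f, [0.0] * n, lam), objective(Phi, f, u_hat, lam))
```

The step size here trades speed for simplicity; accelerated variants such as the one in [7] and the Bregman iterations of [8], [9] converge considerably faster in practice.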

Much effort has been devoted to developing an effective regularization term R(u) that reflects prior knowledge of images. The classical smoothing regularization terms, such as the quadratic Tikhonov regularization [10] and the total variation (TV) [4] regularization, utilize local structural patterns and are built on the assumption that images are locally smooth except at edges. More specifically, these models favor piecewise constant image structures, and hence tend to over-smooth image details. As a result, they cannot deal well with image details and fine structures (resulting in staircase artifacts and contrast losses), since they only exploit the local statistics, neglecting the nonlocal statistics of images [11], [12].
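To make the piecewise-constant bias of TV concrete, the sketch below evaluates a discrete TV on three small test images. The anisotropic form (sum of absolute horizontal and vertical differences) is an assumption chosen for simplicity; the model in [4] is usually stated in its isotropic form. The function name and the test images are hypothetical.

```python
def anisotropic_tv(img):
    """Sum of absolute horizontal and vertical neighbor differences."""
    rows, cols = len(img), len(img[0])
    horizontal = sum(abs(img[r][c + 1] - img[r][c])
                     for r in range(rows) for c in range(cols - 1))
    vertical = sum(abs(img[r + 1][c] - img[r][c])
                   for r in range(rows - 1) for c in range(cols))
    return horizontal + vertical


flat = [[1.0] * 4 for _ in range(4)]                     # constant image
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]          # one vertical edge
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]  # fine texture

print(anisotropic_tv(flat))     # 0.0
print(anisotropic_tv(step))     # 4.0
print(anisotropic_tv(checker))  # 24.0
```

A single sharp edge is cheap under TV, whereas fine texture of the same contrast is penalized heavily, which is consistent with the loss of detail described above.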

Stemming from the sparsity and the local statistics, images are often composed of localized patterns (e.g., textures and structures) that repeat themselves at distant locations in the image domain. Hence, nonlocal regularizers can effectively model long-range dependencies and yield improvements in reconstruction results. Inspired by the success of nonlocal means (NLM) filtering for image denoising [13], many nonlocal regularization-based methods have also been proposed for various image processing applications [11], [12], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], and also CS image restoration [28], [29], [30], [31], [32], [33], [34].

In recent works, the sparsity and the nonlocal self-similarity properties are usually combined into the final cost functional of image restoration solution to achieve better performance. In [28], a nonlocal total variation (NLTV) regularization model for CS image recovery is proposed, which is solved efficiently with Bregman iteration method. A combinational regularization parameter, using a reweighted TV and a weighted-based nonlocal sparse constraint, for CS image recovery is proposed in [29]. The work in [30] proposed an adaptive sparsity regularization term for CS image recovery process, which incorporated the local piecewise autoregressive model and a weighted-based nonlocal self-similarity constraint. In [31], the sparsity regularization parameters (which are locally estimated), together with a weighted-based nonlocal self-similarity constraint, are incorporated into the overall cost functional of image restoration solution to improve the image quality. A model-assisted adaptive recovery of CS (MARX-PC) is proposed in [32], which exploits both the local structural sparsity and nonlocal self-similarity, leading to an efficient CS recovery scheme. In [33], a nonlocal low-rank regularization approach toward exploiting the structured sparsity for CS image recovery is proposed. The proposed model in [33] consists of two components: patch grouping for characterizing the self-similarity of the signal and low-rank approximation for sparsity enforcement. The work in [34] proposed a strategy for CS image recovery via collaborative sparsity (RCoS) modeling. The local 2D sparsity and the nonlocal 3D sparsity are simultaneously imposed in RCoS enabling a natural image to be highly sparse in an adaptive hybrid space-transform domain.
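The patch-grouping step that several of the nonlocal methods above rely on can be sketched as a simple block-matching search: for a reference patch, the most similar patches inside a search window are collected and stacked into a 3D group. This is a generic illustration only, not the grouping procedure of any particular cited method. The function names, the sum-of-squared-differences criterion, and the patch size, window size and group size are assumptions.

```python
def get_patch(img, r, c, size):
    """Extract the size x size patch whose top-left corner is (r, c)."""
    return [row[c:c + size] for row in img[r:r + size]]


def ssd(p, q):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2 for pr, qr in zip(p, q) for a, b in zip(pr, qr))


def group_similar_patches(img, ref_r, ref_c, size=2, search=3, k=3):
    """Return positions and the stacked 3D group of the k best matches."""
    rows, cols = len(img), len(img[0])
    ref = get_patch(img, ref_r, ref_c, size)
    candidates = []
    for r in range(max(0, ref_r - search), min(rows - size, ref_r + search) + 1):
        for c in range(max(0, ref_c - search), min(cols - size, ref_c + search) + 1):
            candidates.append((ssd(ref, get_patch(img, r, c, size)), r, c))
    candidates.sort()  # the reference patch itself is included (distance 0)
    positions = [(r, c) for _, r, c in candidates[:k]]
    group = [get_patch(img, r, c, size) for r, c in positions]
    return positions, group


# Hypothetical 6 x 6 image with a 2 x 2 pattern that repeats periodically.
img = [[float(2 * (r % 2) + (c % 2)) for c in range(6)] for r in range(6)]
positions, group = group_similar_patches(img, 0, 0)
print(positions)  # [(0, 0), (0, 2), (2, 0)]
```

Once such a group is formed, a 3D transform or a low-rank approximation can be applied across the stacked patches, which is where the nonlocal 3D sparsity discussed above comes from.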

Recently, the idea of CS for imaging (single pixel camera [35], [36]) has been extended to conventional predictive/distributed video coding, to develop highly desirable compressive video sensing (CVS)/distributed compressive video sensing (DCVS). CVS combines data acquisition (video sensing) and compression into a unified task, yielding a new procedure that directly acquires compressed video data via random projection (without temporarily storing the complete raw data) for each individual frame in a low-complexity encoder. In this case, the majority of the computational burden is shifted from the encoder side to the decoder side, which makes CVS more suitable for deployment in modern video applications, e.g., video surveillance systems and wireless multimedia sensor networks.

Several CVS recovery methods have already been proposed. Wakin et al. [37] proposed an intuitive (motion JPEG motivated) approach which extends compressive image sensing to video applications by considering each frame of the video sequence independently and recovering each frame individually using the 2D discrete wavelet transform (2D DWT). Since compressive image sensing techniques exploit only the spatial redundancy within an image, this simple extension fails to address the temporal redundancy in video. To enhance the signal sparsity in both spatial and temporal domains and achieve higher sampling efficiency, several frames can be jointly considered as a signal and recovered under a 3D transform (e.g., 3D DWT) [37]. Park and Wakin [38] proposed a multi-scale recovery approach, where several CS measurements are taken independently for each frame and motion estimation is applied at the decoding step. The recovered video at coarse scales (low spatial resolution) is used to estimate motion, which is then used to enhance the recovery at finer scales (high spatial resolution). A similar approach, based on a two-step scheme that iteratively updates the estimates of the video frames and the inter-frame motion, was proposed in [39]. Cossalter et al. [40] also considered motion estimation in their proposed joint compressive video coding and analysis scheme. Stanković et al. [41] and Prades-Nebot et al. [42] proposed a block-based selective video sampling scheme which first divides the frames of the video sequence into key and non-key frames; each frame is then divided into small non-overlapping blocks of equal size. In the decoding process, each block is approximated by a linear combination of blocks of previously reconstructed frames. Zheng and Jacobs [43] exploited the sparsity of small inter-frame differences to remove the temporal redundancy. A multi-hypothesis (MH) prediction approach for CVS was proposed in [44], where different MH predictions of the current frame are generated from one or more previously reconstructed reference frames, and then combined to yield a composite prediction superior to any of the constituent single-hypothesis predictions. Ma et al. [45] proposed a CVS recovery method by introducing a modification of the approximate message passing algorithm and incorporating the 3D dual-tree complex wavelet transform during recovery. In [46], each frame of a compressed-sensed video sequence is reconstructed iteratively using Karhunen–Loève transform (KLT) bases trained from adjacent previously reconstructed frame(s). There also exist other research works on CVS recovery based on dictionary learning (DL) [47], [48], [49], [50]. In [50], we proposed a block-based CVS recovery method where key frames are reconstructed using an ALS basis via the ℓ0 minimization method of [51]. To recover a non-key frame, a prediction is obtained from the previously reconstructed frame (to exploit the temporal redundancy) and incorporated into an optimization problem to refine the frame. Shu et al. [52] proposed a 3D CS approach, which decodes a video from incomplete compressive measurements by exploiting its 3D piecewise smoothness and temporal low-rank property. Yang et al. [53] proposed a Gaussian mixture model (GMM)-based inversion algorithm for CVS recovery from temporally compressed video measurements. The GMM is used to represent each 3D patch in a data set, under the assumption that the patches live on a union of subspaces and each patch is drawn from one subspace. Hosseini and Plataniotis [54] proposed an alternative model to TV regularization that regulates the spatial and temporal redundancy in CVS by means of a tensorial decomposition.

Inspired by the promising results of the above-mentioned techniques, in this paper the nonlocal self-similarity and the local sparsity priors are both incorporated into a combinational regularization term, which is adopted in a regularization-based framework for CS image/video restoration. The main contributions of this paper are listed as follows.

  • We introduce a new sparsity measure, called the joint adaptive sparsity measure (JASM). The proposed JASM simultaneously enforces both the local sparsity constraint and the nonlocal 3D sparsity in the transform domain, in a unified manner, providing a powerful mechanism for characterizing the structured sparsities of natural images. Specifically, the local sparsity captures the local smoothness redundancies exploited by the ALS basis, and the nonlocal 3D sparsity corresponds to the nonlocal self-similarity constraint achieved by a nonlocal statistical sparse model closely related to those proposed in [12], [34]. Unlike the models introduced in [12], [34], our nonlocal statistical sparse model includes some functional modifications that make it better suited to CS image recovery, as introduced in Section 3 and examined in Section 5.3, respectively.

  • We propose two novel techniques for high-fidelity CS image and video restoration using JASM. The proposed CS image recovery problem via JASM (we refer to it as CS-JASM) is formulated as the minimization of a functional under a regularization-based framework. Based on the split Bregman framework [8], [9], a powerful method for solving various variational models, an efficient alternating minimization algorithm is developed to solve this severely underdetermined inverse problem. The proposed CVS recovery method (we refer to it as CVS-JASM) splits the video sequence into key and non-key frames and then divides each frame into small non-overlapping blocks of equal size. The key frames are recovered using the proposed CS-JASM method, in order to exploit the spatial redundancy. To recover the non-key frames, a prediction of the current frame is initialized from the previously reconstructed frame to exploit the temporal redundancy. The prediction is then employed in a suitable optimization problem to recover the current non-key frame. Furthermore, we investigate the effectiveness of three well-known DL algorithms in order to adopt the best one in our proposed scheme.

Extensive numerical results on benchmark test images/video sequences clearly demonstrate that our proposed methods substantially outperform many of the conventional and state-of-the-art techniques for CS image/video restoration.

The rest of this paper is organized as follows. Section 2 provides a brief background on sparse representation and DL. Three well-known DL techniques, together with a recently proposed model for nonlocal self-similarity, are also briefly introduced. The proposed regularization term, JASM, and its relation to previous works are described and discussed in Section 3. Section 4 shows how JASM is incorporated into the framework of image/video CS recovery, and gives the implementation details of solving the ensuing optimization problems. Numerical results and comparisons for our proposed methods are given in Section 5 and finally, Section 6 concludes the paper.

Section snippets

Sparse representation and dictionary learning

One crucial problem in a sparse-representation problem is how to choose an efficient dictionary. There are many pre-specified (non-adaptive analytically designed) sparsifying dictionaries (basis or frame), e.g., Fourier transform, discrete cosine transform, wavelets, ridgelets, curvelets, contourlets and shearlets. In spite of being simple and having fast computation, the analytically designed dictionaries are not able to efficiently (sparsely) represent a given class of signals, and they lack

Joint adaptive sparsity measure (JASM)

As stated previously, using only the local sparsity constraint $\|\alpha\|_p$ in Eq. (2) may not lead to a sufficiently accurate CS image restoration. An alternative approach for better incorporating prior knowledge about images is to combine sparse representation with nonlocal self-similarity, which has led to highly competitive sparsity-based CS image restoration methods (see e.g., [31], [34]). As can be seen in Fig. 1, the distribution of transform coefficients Θu is characterized by a very sharp peak at zero

Encoding

As mentioned before, the proposed method first divides the video sequence into key and non-key frames, then divides each frame/image, of size $I_r \times I_c$, into small non-overlapping blocks of equal size (i.e., size $B \times B$), and then the same sensing matrix ΦB1
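The block-based acquisition described here can be sketched as follows. Because the sentence above is truncated in this excerpt, the sketch assumes the usual block-based CS convention that every vectorized $B \times B$ block is measured with the same sensing matrix; the Gaussian entries, the subrate of 0.25, the raster-scan vectorization and all function names are likewise assumptions made for illustration.

```python
import random


def split_into_blocks(img, B):
    """Split an image into non-overlapping B x B blocks, each vectorized."""
    rows, cols = len(img), len(img[0])
    if rows % B or cols % B:
        raise ValueError("image dimensions must be multiples of B")
    blocks = []
    for r in range(0, rows, B):
        for c in range(0, cols, B):
            blocks.append([img[r + i][c + j] for i in range(B) for j in range(B)])
    return blocks


def sense_blocks(blocks, Phi_B):
    """Apply the same sensing matrix Phi_B to every vectorized block."""
    return [[sum(p * x for p, x in zip(row, blk)) for row in Phi_B]
            for blk in blocks]


# Hypothetical setup: an 8 x 8 frame, 4 x 4 blocks, subrate 0.25.
random.seed(1)
B, subrate = 4, 0.25
m_B = int(round(subrate * B * B))  # measurements per block
Phi_B = [[random.gauss(0.0, 1.0) for _ in range(B * B)] for _ in range(m_B)]
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]

blocks = split_into_blocks(frame, B)
measurements = sense_blocks(blocks, Phi_B)
print(len(blocks), len(measurements[0]))  # 4 4
```

Sharing one small matrix across blocks keeps encoder memory at $m_B \times B^2$ entries instead of a full $m \times n$ matrix for the whole frame.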

Experimental results

In this section, we evaluate the performance of the proposed method, and compare it with benchmark methods. To evaluate our simulation results, we use two applicable quality assessors, the peak signal-to-noise ratio (PSNR) in dB and the structural similarity (SSIM) [72]. The performances of our experiments are evaluated on the luminance component of the test images shown in Fig. 2. Also, the performances of our experiments for CVS recovery are evaluated on the luminance component of eight
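For reference, PSNR can be computed as in the short sketch below, which assumes 8-bit luminance data (peak value 255) stored as nested lists; the function name and the toy images are hypothetical. SSIM [72] involves local windowed statistics and is not reproduced here.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for a, b in zip(ref_row, est_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [[100.0, 120.0], [140.0, 160.0]]
off_by_one = [[101.0, 119.0], [141.0, 159.0]]  # every pixel differs by 1
print(round(psnr(reference, off_by_one), 2))   # 48.13
```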

Conclusion

The motivation of this paper is to introduce a novel sparsity measure called joint adaptive sparsity measure (JASM), and a new strategy for high-fidelity CS image/video restoration via JASM. The proposed JASM efficiently characterizes the intrinsic sparsities of natural images by exploiting both local sparsity and nonlocal 3D sparsity, simultaneously. In order to obtain the ALS basis, we investigated the effectiveness of three well-known DL algorithms of MOD, K-SVD and MDU. We found out that

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers, whose comments helped improve this paper greatly. They also would like to express their gratitude to Prof. W. Dong (Xidian University) and Dr. J. Zhang (Peking University) for many fruitful discussions; and the authors of [12], [32], [34], [44], [51], [52], [54], [73], [74], [75] for sharing the source code of their papers used in Section 5.

References (76)

  • B. Xiao et al.

    Photo-sketch synthesis and recognition based on subspace learning

    Neurocomputing

    (2010)
  • J. Yu et al.

    Image clustering based on sparse patch alignment framework

    Pattern Recognit.

    (2014)
  • N.G. Kingsbury

    Complex wavelets for shift invariant analysis and filtering of signals

    J. Appl. Comput. Harmon. Anal.

    (2001)
  • E. Candès et al.

    Near-optimal signal recovery from random projections: universal encoding strategies?

    IEEE Trans. Inf. Theory

    (2006)
  • E. Candès et al.

    Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information

    IEEE Trans. Inf. Theory

    (2006)
  • D.L. Donoho

    Compressed sensing

    IEEE Trans. Inf. Theory

    (2006)
  • I. Daubechies et al.

    An iterative thresholding algorithm for linear inverse problems with a sparsity constraint

    Commun. Pure Appl. Math.

    (2004)
  • A. Beck et al.

    A fast iterative shrinkage-thresholding algorithm for linear inverse problems

    SIAM J. Imaging Sci.

    (2009)
  • T. Goldstein et al.

    The split Bregman method for ℓ1 regularized problems

    SIAM J. Imaging Sci.

    (2009)
  • W. Yin et al.

    Bregman iterative algorithms for ℓ1 minimization with applications to compressed sensing

    SIAM J. Imaging Sci.

    (2008)
  • A.N. Tikhonov et al.

    Solutions of Ill-Posed Problems

    (1977)
  • X. Li

    Image recovery via hybrid sparse representation: a deterministic annealing approach

    IEEE J. Sel. Top. Signal Process.

    (2011)
  • J. Zhang et al.

    Image restoration using joint statistical modeling in space-transform domain

    IEEE Trans. Circuits Syst. Video Technol.

    (2014)
  • A. Buades et al.

    A review of image denoising algorithms, with a new one

    SIAM Multiscale Model. Simul.

    (2005)
  • J. Mairal, F. Bach, J. Ponce, G. Sapiro, A. Zisserman, Non-local sparse models for image restoration, in: International...
  • G. Peyré et al.

    Non-local regularization of inverse problems

    Inverse Prob. Image

    (2011)
  • M. Jung et al.

    Nonlocal Mumford–Shah regularizers for color image restoration

    IEEE Trans. Image Process.

    (2011)
  • W. Dong et al.

    Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization

    IEEE Trans. Image Process.

    (2011)
  • W. Dong et al.

    Sparse representation based image interpolation with nonlocal autoregressive modeling

    IEEE Trans. Image Process.

    (2013)
  • W. Dong et al.

    Nonlocally centralized sparse representation for image restoration

    IEEE Trans. Image Process.

    (2013)
  • J. Jiang et al.

    Mixed noise removal by weighted encoding with sparse nonlocal regularization

    IEEE Trans. Image Process.

    (2014)
  • Y. Romano et al.

    Single image interpolation via adaptive nonlocal sparsity-based modeling

    IEEE Trans. Image Process.

    (2014)
  • J. Yu et al.

    Click prediction for web image reranking using multimodal sparse coding

    IEEE Trans. Image Process.

    (2014)
  • N. Eslahi, H. Mahdavinataj, A. Aghagolzadeh, Mixed Gaussian-impulse noise removal from highly corrupted images via...
  • H. Wang et al.

    Image super-resolution using non-local Gaussian process regression

    Neurocomputing

    (2016)
  • X. Zhang et al.

    Bregmanized nonlocal regularization for deconvolution and sparse reconstruction

    SIAM J. Imaging Sci.

    (2010)
  • W. Dong et al.

    Compressive sensing via reweighted TV and nonlocal sparsity regularisation

    IET Electron. Lett.

    (2013)
  • W. Dong, G. Shi, X. Li, L. Zhang, X. Wu, Image reconstruction with locally adaptive sparsity and nonlocal robust...

Nasser Eslahi received the B.S. and M.S. degrees in electrical engineering from Imam Khomeini International University and Babol University of Technology, Iran, in 2012 and 2015, respectively.

He is currently a research assistant in Machine Vision & Image Processing Laboratory, Babol University of Technology, Babol, Iran. His research interests include image and video processing, statistical signal processing, sparse representation/approximation, compressive sensing, image inverse problems and convex optimization.

Ali Aghagolzadeh received the B.S. degree in electrical and electronic engineering from Tabriz University, Tabriz, Iran, in 1985. He received the M.S. and the Ph.D. degrees in electrical engineering from the Illinois Institute of Technology, Chicago, IL, USA, and Purdue University, West Lafayette, IN, USA, in 1988 and 1991, respectively.

He is currently a Professor with the Department of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran. His research interests include image processing, video coding and compression, information theory, and computer vision.

Seyed Mehdi Hosseini Andargoli received the B.S. degree in electronics engineering from Shahed University, Tehran, Iran, in 2004 and the M.S. and Ph.D. degrees in telecommunication systems engineering from K. N. Toosi University of Technology, Tehran, in 2009 and 2011, respectively.

He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran. His research interests include signal processing, compressive sensing, convex optimization, resource allocation of cellular networks, cognitive radio networks, relay networks, sensor networks, and MIMO-OFDM systems.