Image/video compressive sensing recovery using joint adaptive sparsity measure
Introduction
Owing to the efforts of Candès et al. [1], [2] and Donoho [3], compressive sensing (CS), also called compressed sensing or compressive sampling, provides a framework for simultaneous sampling and compression of signals at a rate significantly below the Nyquist rate. Under certain conditions, it also allows the original signal to be reconstructed accurately from a small set of measurements using sparsity-promoting nonlinear recovery algorithms.
Suppose we wish to recover a real-valued, finite-length signal $u \in \mathbb{R}^{N}$ from a finite-length observation $f \in \mathbb{R}^{M}$ (with $M \ll N$), and that there is a linear projection between them:

$$f = \Phi u + n, \qquad (1)$$

where $\Phi \in \mathbb{R}^{M \times N}$ is a sensing matrix and $n$ denotes the additive noise. Since the number of unknowns is much larger than the number of observations, we clearly cannot recover every u from f, and the problem is generally considered ill-posed. However, if u is sufficiently sparse, in the sense that it can be written as a superposition of a small number of vectors taken from a known (sparsifying) transform-domain basis or frame, or even an adaptively learned sparsifying (ALS) basis, such that $\Theta u$ contains only a small set of significant entries (e.g., nonzero coefficients), then exact recovery of u is possible. In order to solve the reconstruction problem with reasonable accuracy and robustness to noise, the estimation of u is formulated as an unconstrained Lagrangian optimization problem:

$$\hat{u} = \arg\min_{u} \; \tfrac{1}{2}\,\|f - \Phi u\|_{2}^{2} + \lambda\,\mathcal{R}(u). \qquad (2)$$

The regularization term $\mathcal{R}(u)$ admits various choices, e.g., a sparsity-promoting norm of the transform coefficients $\Theta u$, the total variation TV(u) [4] or the Bregman distance [5]. The optimization problem given in Eq. (2) incorporates the prior information about the original signal. The first term in Eq. (2) is a penalty that represents the closeness of the solution to the observed scene and quantifies the "prediction error" with respect to the measurements. The second term in Eq. (2) is a regularization term that represents a priori sparse information about the original scene; it is designed to penalize an estimate that does not exhibit the expected properties. Finally, λ is a regularization parameter that balances the contribution of both terms. This minimization problem can be solved by an iterative shrinkage/thresholding (IST) method (e.g., [6], [7]) or by Bregman iterative algorithms (e.g., [8], [9]).
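As an illustration of how an IST method handles a problem of the form of Eq. (2), the following is a minimal sketch rather than the algorithm used in this paper. It assumes the simplest possible setting: the regularizer is the ℓ1 norm, the sparsifying transform is the identity, and the sensing matrix is a random Gaussian matrix. The names `ista`, `soft_threshold` and all numerical values are hypothetical choices made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage: the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(Phi, f, lam, n_iter=3000):
    """Minimize 0.5 * ||f - Phi u||_2^2 + lam * ||u||_1 by iterative shrinkage/thresholding."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant of the data-term gradient
    u = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ u - f)              # gradient of the data-fidelity term
        u = soft_threshold(u - step * grad, step * lam)
    return u

# Synthetic test problem (assumed sizes): K-sparse signal, M << N noisy measurements.
rng = np.random.default_rng(0)
N, M, K = 256, 96, 8
u_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
u_true[support] = rng.choice([-1.0, 1.0], size=K) * (1.0 + rng.random(K))
Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # random Gaussian sensing matrix
f = Phi @ u_true + 0.005 * rng.standard_normal(M)

u_hat = ista(Phi, f, lam=0.02)
rel_err = np.linalg.norm(u_hat - u_true) / np.linalg.norm(u_true)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each iteration takes a gradient step on the data-fidelity term and then applies the shrinkage operator, which is where the sparsity prior enters. A larger λ gives sparser but more biased estimates.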
Much effort has been devoted to developing an effective regularization term that reflects image prior knowledge. The classical smoothing regularization terms, such as the quadratic Tikhonov regularization [10] and the total variation (TV) [4] regularization, utilize local structural patterns and are built on the assumption that images are locally smooth except at edges. More specifically, these models favor piecewise constant image structures, and hence tend to over-smooth image details. As a result, they cannot deal well with image details and fine structures (resulting in staircase artifacts and contrast losses), since they only exploit the local statistics and neglect the nonlocal statistics of images [11], [12].
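The preference of TV for piecewise constant structures can be checked numerically. The sketch below assumes an anisotropic discrete TV (sum of absolute horizontal and vertical differences), which is one common discretization and not necessarily the one used in [4]. The two test images are synthetic and chosen only for illustration.

```python
import numpy as np

def total_variation(img):
    """Anisotropic discrete TV: sum of absolute horizontal and vertical differences."""
    dx = np.abs(np.diff(img, axis=1)).sum()
    dy = np.abs(np.diff(img, axis=0)).sum()
    return dx + dy

# Piecewise constant image: left half 0, right half 1 (a single vertical edge).
step_img = np.zeros((32, 32))
step_img[:, 16:] = 1.0

# Fine texture with the same value range: a one-pixel checkerboard.
yy, xx = np.indices((32, 32))
texture_img = ((xx + yy) % 2).astype(float)

tv_step = total_variation(step_img)
tv_texture = total_variation(texture_img)
print(tv_step, tv_texture)
```

The edge image pays a penalty only along its single edge, whereas the texture is penalized at every pixel transition. A TV-regularized solver therefore tends to flatten fine texture while keeping large edges.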
Beyond sparsity and local statistics, images are often composed of localized patterns (e.g., textures and structures) that repeat themselves at distant locations in the image domain. Hence, nonlocal regularizers can effectively model long-range dependencies and yield improvements in reconstruction results. Inspired by the success of nonlocal means (NLM) filtering for image denoising [13], many nonlocal regularization-based methods have also been proposed for various image processing applications [11], [12], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], and also CS image restoration [28], [29], [30], [31], [32], [33], [34].
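Nonlocal methods typically start by collecting, for each reference patch, the patches elsewhere in the image that resemble it. The following sketch shows one simple way to do this, assuming exhaustive block matching with squared Euclidean distance inside a square search window. The function name `group_similar_patches` and its parameters are hypothetical and are not taken from any of the cited works.

```python
import numpy as np

def group_similar_patches(img, ref_yx, patch=8, search=20, n_best=9):
    """Stack the n_best patches closest to the reference patch within a search window.

    Returns a (n_best, patch, patch) array and the top-left coordinates of each patch.
    """
    H, W = img.shape
    ry, rx = ref_yx
    ref = img[ry:ry + patch, rx:rx + patch]
    candidates = []
    for y in range(max(0, ry - search), min(H - patch, ry + search) + 1):
        for x in range(max(0, rx - search), min(W - patch, rx + search) + 1):
            block = img[y:y + patch, x:x + patch]
            candidates.append((float(np.sum((block - ref) ** 2)), y, x))
    candidates.sort(key=lambda c: c[0])           # most similar first
    best = candidates[:n_best]
    stack = np.stack([img[y:y + patch, x:x + patch] for _, y, x in best])
    coords = [(y, x) for _, y, x in best]
    return stack, coords

# Synthetic image with period 16 in both directions, so patterns repeat at distant locations.
yy, xx = np.indices((64, 64))
img = np.sin(2 * np.pi * xx / 16) + np.cos(2 * np.pi * yy / 16)

stack, coords = group_similar_patches(img, ref_yx=(24, 24))
print(stack.shape, sorted(coords))
```

On this periodic image the nine best matches are the reference patch and its copies shifted by one period. Stacking them gives a 3D array whose third dimension is highly redundant, which is the property that nonlocal 3D sparsity models exploit.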
In recent works, the sparsity and the nonlocal self-similarity properties are usually combined into the final cost functional of image restoration solution to achieve better performance. In [28], a nonlocal total variation (NLTV) regularization model for CS image recovery is proposed, which is solved efficiently with Bregman iteration method. A combinational regularization parameter, using a reweighted TV and a weighted-based nonlocal sparse constraint, for CS image recovery is proposed in [29]. The work in [30] proposed an adaptive sparsity regularization term for CS image recovery process, which incorporated the local piecewise autoregressive model and a weighted-based nonlocal self-similarity constraint. In [31], the sparsity regularization parameters (which are locally estimated), together with a weighted-based nonlocal self-similarity constraint, are incorporated into the overall cost functional of image restoration solution to improve the image quality. A model-assisted adaptive recovery of CS (MARX-PC) is proposed in [32], which exploits both the local structural sparsity and nonlocal self-similarity, leading to an efficient CS recovery scheme. In [33], a nonlocal low-rank regularization approach toward exploiting the structured sparsity for CS image recovery is proposed. The proposed model in [33] consists of two components: patch grouping for characterizing the self-similarity of the signal and low-rank approximation for sparsity enforcement. The work in [34] proposed a strategy for CS image recovery via collaborative sparsity (RCoS) modeling. The local 2D sparsity and the nonlocal 3D sparsity are simultaneously imposed in RCoS enabling a natural image to be highly sparse in an adaptive hybrid space-transform domain.
Recently, the idea of CS for imaging (single pixel camera [35], [36]) has been extended to conventional predictive/distributed video coding, to develop compressive video sensing (CVS)/distributed compressive video sensing (DCVS). CVS unifies data acquisition (video sensing) and compression into a single task, giving rise to a new procedure that directly acquires compressed video data via random projection (without temporarily storing the complete raw data) for each individual frame in a low-complexity encoder. In this case, the majority of the computational burden is shifted from the encoder side to the decoder side, which makes the scheme more suitable for deployment in modern video applications, e.g., video surveillance systems and wireless multimedia sensor networks.
Several CVS recovery methods have already been proposed. Wakin et al. [37] proposed an intuitive (motion JPEG motivated) approach, which extends compressive image sensing to video applications by considering each frame of the video sequence independently and recovering each frame individually using the 2D discrete wavelet transform (2D DWT). Since compressive image sensing techniques exploit only the spatial redundancy within an image, this simple extension fails to address the temporal redundancy in video. To enhance the signal sparsity in both spatial and temporal domains and achieve higher sampling efficiency, several frames can be jointly considered as a signal and recovered under a 3D transform (e.g., 3D DWT) [37]. Park and Wakin [38] proposed a multi-scale recovery approach, where several CS measurements are taken independently for each frame and motion estimation is applied at the decoding step. The recovered video at coarse scales (low spatial resolution) is used to estimate motion, which is then used to enhance the recovery at finer scales (high spatial resolution). A similar two-step approach, which iteratively updates the estimates of the images in the video and of the inter-frame motion, was proposed in [39]. Also, Cossalter et al. [40] considered motion estimation in their proposed joint compressive video coding and analysis scheme. Stanković et al. [41] and Prades-Nebot et al. [42] proposed block-based selective video sampling schemes, which first divide the frames of the video sequence into key and non-key frames; each frame is then divided into small non-overlapping blocks of equal size. In the decoding process, each block is approximated by a linear combination of blocks of previously reconstructed frames. Zheng and Jacobs [43] exploited the sparsity of small inter-frame differences to remove the temporal redundancy.
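One reason inter-frame differences are attractive in CVS is that the measurement process is linear. The sketch below is a toy illustration under stated assumptions, not the method of [43]: both frames are sensed with the same random Gaussian matrix, the frames are synthetic vectors, and only a handful of pixels change between them.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 400, 120                                   # assumed signal and measurement sizes

prev_frame = rng.random(N)                        # vectorized previous frame (synthetic)
curr_frame = prev_frame.copy()
moving = rng.choice(N, size=12, replace=False)
curr_frame[moving] += 0.5                         # only a few pixels change between frames

Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # same sensing matrix for both frames
y_prev = Phi @ prev_frame
y_curr = Phi @ curr_frame

diff = curr_frame - prev_frame
nonzeros_in_diff = int(np.count_nonzero(diff))

# Linearity: the difference of the measurements is the measurement of the difference.
linearity_gap = float(np.max(np.abs((y_curr - y_prev) - Phi @ diff)))
print(nonzeros_in_diff, linearity_gap)
```

Although neither frame is sparse in the pixel domain here, their difference has only 12 nonzero entries, and its measurements are available at the decoder as `y_curr - y_prev`. A sparse recovery step can therefore be applied to the difference rather than to the full frame.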
A multi-hypothesis (MH) prediction approach for CVS was proposed in [44], where different MH predictions of the current frame are generated from one or more previously reconstructed reference frames, and then combined to yield a composite prediction superior to any of the constituent single-hypothesis predictions. Ma et al. [45] proposed a CVS recovery method that introduces a modification of the approximate message passing algorithm and incorporates the 3D dual-tree complex wavelet transform during recovery. In [46], each frame of a compressed-sensed video sequence is reconstructed iteratively using Karhunen–Loève transform (KLT) bases trained from adjacent previously reconstructed frame(s). There also exist other research works on CVS recovery based on dictionary learning (DL) [47], [48], [49], [50]. In [50], we proposed a block-based CVS recovery method where key frames are reconstructed using an ALS basis via the ℓ0 minimization method of [51]. To recover a non-key frame, a prediction is obtained from the previously reconstructed frame (to exploit the temporal redundancy) and incorporated into an optimization problem that refines the frame. Shu et al. [52] proposed a 3D CS approach, which decodes a video from incomplete compressive measurements by exploiting its 3D piecewise smoothness and temporal low-rank property. Yang et al. [53] proposed a Gaussian mixture model (GMM)-based inversion algorithm for CVS recovery from temporally compressed video measurements. The GMM is used to represent each 3D patch in a data set, under the assumption that the patches live on a union of subspaces and each patch is drawn from one subspace. Hosseini and Plataniotis [54] proposed an alternative model to TV regularization, which regulates the spatial and temporal redundancy in CVS by means of a tensorial decomposition.
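The claim that a composite of several hypotheses can beat each single hypothesis can be illustrated with a small experiment. This is an idealized sketch, not the procedure of [44]: the combination weights are fitted by ordinary least squares against the true block, which a real decoder does not have (it would have to fit the weights using the CS measurements instead). The hypotheses are synthetic noisy copies of the target.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_hyp = 64, 4                              # assumed block size (vectorized) and hypothesis count

target = rng.random(n_pix)                        # current block (synthetic)
# Hypotheses: noisy versions of the target, standing in for blocks from reference frames.
H = np.stack(
    [target + 0.2 * rng.standard_normal(n_pix) for _ in range(n_hyp)], axis=1
)

# Oracle least-squares weights for combining the hypotheses (illustration only).
weights, *_ = np.linalg.lstsq(H, target, rcond=None)
composite = H @ weights

single_errors = np.linalg.norm(H - target[:, None], axis=0)
composite_error = float(np.linalg.norm(composite - target))
print(single_errors.round(3), round(composite_error, 3))
```

Because each single hypothesis corresponds to one particular weight vector, the least-squares combination can never do worse than the best of them on the fitting criterion. In practice the gain depends on how well the weights can be estimated from the measurements.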
Inspired by the promising results of the above-mentioned techniques, in this paper the nonlocal self-similarity and the local sparsity prior are both incorporated in a combinational regularization term for use in a regularization-based framework for CS image/video restoration. The main contributions of this paper are as follows.
- We introduce a new sparsity measure, called joint adaptive sparsity measure (JASM). The proposed JASM simultaneously enforces both the local sparsity constraint and the nonlocal 3D sparsity in the transform domain, in a unified manner, suggesting a powerful mechanism for characterizing the structured sparsities of natural images. In fact, the local sparsity depicts the local smoothness redundancies exploited by the ALS basis, and the nonlocal 3D sparsity corresponds to the nonlocal self-similarity constraint achieved by a nonlocal statistical sparse modeling closely related to the ones proposed in [12], [34]. Unlike those introduced in [12], [34], the nonlocal statistical sparse modeling used here includes some functional modifications that make it better suited to CS image recovery, as introduced and examined in Sections 3 and 5.3, respectively.
- We propose two novel techniques for high-fidelity CS image and video restoration using JASM. The proposed CS image recovery problem via JASM (we refer to it as CS-JASM) is formulated as the minimization of a functional within a regularization-based framework. Based on the split Bregman framework [8], [9], a powerful method for solving various variational models, an alternating minimization algorithm is developed to solve this severely underdetermined inverse problem efficiently. The proposed CVS recovery method (we refer to it as CVS-JASM) splits the video sequence into key and non-key frames, and then divides each frame into small non-overlapping blocks of equal size. The key frames are recovered using the proposed CS-JASM method, in order to exploit the spatial redundancy. To recover a non-key frame, a prediction of the current frame is initialized from the previously reconstructed frame, in order to exploit the temporal redundancy. The prediction is then employed in a suitable optimization problem to recover the current non-key frame. Furthermore, we investigate the effectiveness of three well-known DL algorithms in order to adopt the best one in our proposed scheme.
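The frame and block partitioning described for CVS-JASM can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text above: key frames are taken as the first frame of each fixed-length group of pictures (hypothetical parameter `gop_size`), the block sensing matrix is random Gaussian, and the block size, subrate and video are arbitrary synthetic choices. All function names are hypothetical.

```python
import numpy as np

def split_key_nonkey(n_frames, gop_size):
    """Key frames are the first frame of each group of pictures; the rest are non-key frames."""
    key = [t for t in range(n_frames) if t % gop_size == 0]
    non_key = [t for t in range(n_frames) if t % gop_size != 0]
    return key, non_key

def to_blocks(frame, B):
    """Split an (H, W) frame into non-overlapping B x B blocks, one vectorized block per column."""
    H, W = frame.shape
    assert H % B == 0 and W % B == 0, "frame size must be a multiple of the block size"
    blocks = frame.reshape(H // B, B, W // B, B).swapaxes(1, 2)   # (H/B, W/B, B, B)
    return blocks.reshape(-1, B * B).T                            # (B*B, number of blocks)

def sense_blocks(frame, Phi_B, B):
    """Apply the same block sensing matrix Phi_B to every block of the frame."""
    return Phi_B @ to_blocks(frame, B)

rng = np.random.default_rng(3)
B, subrate = 8, 0.25
M_B = int(round(subrate * B * B))                 # measurements per block
Phi_B = rng.standard_normal((M_B, B * B)) / np.sqrt(M_B)

video = rng.random((6, 32, 32))                   # synthetic stand-in: 6 frames of 32 x 32 pixels
key, non_key = split_key_nonkey(len(video), gop_size=3)
measurements = [sense_blocks(frame, Phi_B, B) for frame in video]
print(key, non_key, measurements[0].shape)
```

At the decoder, the measurements of frames in `key` would be passed to the image recovery method, while each frame in `non_key` would start from a prediction built from the previously reconstructed frame. Those recovery steps are outside the scope of this sketch.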
Extensive numerical results on benchmark test images/video sequences clearly demonstrate that our proposed methods substantially outperform many of the conventional and state-of-the-art techniques for CS image/video restoration.
The rest of this paper is organized as follows. Section 2 provides a brief background on sparse representation and DL, and briefly introduces three well-known DL techniques together with a recently proposed model for nonlocal self-similarity. The proposed regularization term, JASM, and its relation to previous works are described and discussed in Section 3. Section 4 shows how JASM is incorporated into the framework of image/video CS recovery, and gives the implementation details of solving the ensuing optimization problems. Numerical results and comparisons for our proposed methods are given in Section 5, and finally, Section 6 concludes the paper.
Section snippets
Sparse representation and dictionary learning
One crucial issue in sparse representation is how to choose an efficient dictionary. There are many pre-specified (non-adaptive, analytically designed) sparsifying dictionaries (bases or frames), e.g., Fourier transform, discrete cosine transform, wavelets, ridgelets, curvelets, contourlets and shearlets. In spite of being simple and fast to compute, the analytically designed dictionaries are not able to efficiently (sparsely) represent a given class of signals, and they lack
Joint adaptive sparsity measure (JASM)
As stated previously, using only the local sparsity constraint in Eq. (2) may not lead to a sufficiently accurate CS image restoration. An alternative approach for better incorporating the prior knowledge about images is via sparse representation and nonlocal self-similarity, which has led to highly competitive works on sparsity-based CS image restoration (see e.g., [31], [34]). As can be seen in Fig. 1, the distribution of transform coefficients Θu is characterized by a very sharp peak at zero
Encoding
As mentioned before, the proposed method first divides the video sequence into key and non-key frames, followed by dividing each frame/image into small non-overlapping blocks of equal size, and then the same sensing matrix $\Phi_B$
Experimental results
In this section, we evaluate the performance of the proposed method, and compare it with benchmark methods. To evaluate our simulation results, we use two applicable quality assessors, the peak signal-to-noise ratio (PSNR) in dB and the structural similarity (SSIM) [72]. The performances of our experiments are evaluated on the luminance component of the test images shown in Fig. 2. Also, the performances of our experiments for CVS recovery are evaluated on the luminance component of eight
Conclusion
The motivation of this paper is to introduce a novel sparsity measure called joint adaptive sparsity measure (JASM), and a new strategy for high-fidelity CS image/video restoration via JASM. The proposed JASM efficiently characterizes the intrinsic sparsities of natural images by exploiting both local sparsity and nonlocal 3D sparsity, simultaneously. In order to obtain the ALS basis, we investigated the effectiveness of three well-known DL algorithms of MOD, K-SVD and MDU. We found out that
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers, whose comments helped improve this paper greatly. They also would like to express their gratitude to Prof. W. Dong (Xidian University) and Dr. J. Zhang (Peking University) for many fruitful discussions; and the authors of [12], [32], [34], [44], [51], [52], [54], [73], [74], [75] for sharing the source code of their papers used in Section 5.
References (76)
- Nonlinear total variation based noise removal algorithms. Physica D (1992)
- The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. Comput. Math. Math. Phys. (1967)
- Single image super-resolution using combined total variation regularization by split Bregman iteration. Neurocomputing (2014)
- A novel Bayesian-based nonlocal reconstruction for freehand 3D ultrasound imaging. Neurocomputing (2015)
- A self-learning image super-resolution method via sparse representation and non-local similarity. Neurocomputing (2016)
- A learning-based method for compressive image recovery. J. Vis. Commun. Image Represent. (2013)
- Motion-aware decoding of compressed-sensing video. IEEE Trans. Circuits Syst. Video Technol. (2013)
- Dictionary learning based reconstruction for distributed compressed video sensing. J. Vis. Commun. Image Represent. (2013)
- Image compressive sensing recovery using adaptively learned sparsifying basis via ℓ0 minimization. Signal Process. (2014)
- Image reconstruction algorithm from compressed sensing measurement by dictionary learning. Neurocomputing (2015)
- Photo-sketch synthesis and recognition based on subspace learning. Neurocomputing
- Image clustering based on sparse patch alignment framework. Pattern Recognit.
- Complex wavelets for shift invariant analysis and filtering of signals. J. Appl. Comput. Harmon. Anal.
- Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory
- Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory
- Compressed sensing. IEEE Trans. Inf. Theory
- An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math.
- A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.
- The split Bregman method for ℓ1 regularized problems. SIAM J. Imaging Sci.
- Bregman iterative algorithms for ℓ1 minimization with applications to compressed sensing. SIAM J. Imaging Sci.
- Solutions of Ill-Posed Problems
- Image recovery via hybrid sparse representation: a deterministic annealing approach. IEEE J. Sel. Top. Signal Process.
- Image restoration using joint statistical modeling in space-transform domain. IEEE Trans. Circuits Syst. Video Technol.
- A review of image denoising algorithms, with a new one. SIAM Multiscale Model. Simul.
- Non-local regularization of inverse problems. Inverse Prob. Image
- Nonlocal Mumford–Shah regularizers for color image restoration. IEEE Trans. Image Process.
- Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process.
- Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans. Image Process.
- Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process.
- Mixed noise removal by weighted encoding with sparse nonlocal regularization. IEEE Trans. Image Process.
- Single image interpolation via adaptive nonlocal sparsity-based modeling. IEEE Trans. Image Process.
- Click prediction for web image reranking using multimodal sparse coding. IEEE Trans. Image Process.
- Image super-resolution using non-local Gaussian process regression. Neurocomputing
- Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci.
- Compressive sensing via reweighted TV and nonlocal sparsity regularisation. IET Electron. Lett.
Nasser Eslahi received the B.S. and M.S. degrees in electrical engineering from Imam Khomeini International University and Babol University of Technology, Iran, in 2012 and 2015, respectively.
He is currently a research assistant in Machine Vision & Image Processing Laboratory, Babol University of Technology, Babol, Iran. His research interests include image and video processing, statistical signal processing, sparse representation/approximation, compressive sensing, image inverse problems and convex optimization.
Ali Aghagolzadeh received the B.S. degree in electrical and electronic engineering from Tabriz University, Tabriz, Iran, in 1985. He received the M.S. and the Ph.D. degrees in electrical engineering from the Illinois Institute of Technology, Chicago, IL, USA, and Purdue University, West Lafayette, IN, USA, in 1988 and 1991, respectively.
He is currently a Professor with the Department of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran. His research interests include image processing, video coding and compression, information theory, and computer vision.
Seyed Mehdi Hosseini Andargoli received the B.S. degree in electronics engineering from Shahed University, Tehran, Iran, in 2004 and the M.S. and Ph.D. degrees in telecommunication systems engineering from K. N. Toosi University of Technology, Tehran, in 2009 and 2011, respectively.
He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Babol University of Technology, Babol, Iran. His research interests include signal processing, compressive sensing, convex optimization, resource allocation of cellular networks, cognitive radio networks, relay networks, sensor networks, and MIMO-OFDM systems.