Dictionary learning based reconstruction for distributed compressed video sensing

doi:10.1016/j.jvcir.2013.08.007

Journal of Visual Communication and Image Representation

Volume 24, Issue 8, November 2013, Pages 1232-1242

https://doi.org/10.1016/j.jvcir.2013.08.007

Highlights

• Leveraging more realistic video signal models that go beyond simple sparsity.
• A novel undersampling correlation noise model for subsampled video signals.
• To learn a dictionary that efficiently describes the video contents and structures.
• A maximum-likelihood (ML) dictionary learning based reconstruction for DCVS.
• Signal recovery is performed within ML learning, not as an independent task.

Abstract

Distributed compressed video sensing (DCVS) is a framework that integrates the characteristics of compressed sensing and distributed video coding to achieve low-complexity video coding. However, designing an efficient reconstruction that leverages more realistic signal models, going beyond simple sparsity, remains an open challenge. In this paper, we propose a novel “undersampled” correlation noise model to describe compressively sampled video signals, and present a maximum-likelihood dictionary learning based reconstruction algorithm for DCVS, in which both the correlation and sparsity constraints are included in a new probabilistic model. Moreover, signal recovery in our algorithm is performed during dictionary learning rather than as an independent task. Experimental results show that our proposal compares favorably with existing methods, with 0.1–3.5 dB improvements in the average PSNR, and a 2–9 dB gain for non-key frames when key frames are subsampled at an increased rate.

Introduction

Distributed video coding (DVC) [1] refers to a video coding paradigm that encodes the frames of a video sequence independently and decodes them jointly. Because the temporal redundancies are exploited exclusively by the decoder, the computational burden is shifted from the encoder to the decoder, which makes DVC potentially applicable to many fields, e.g., wireless multimedia sensor networks (WMSN), video conferencing with mobile devices, and surveillance systems. However, DVC still requires collecting an enormous amount of data and then compressing it, and thus wastes valuable resources. Compressed sensing (CS) [2], [3], [4] is an innovative concept that has attracted considerable research interest in the signal processing community. It provides a new way to collect data that combines acquisition and compression, and consequently helps reduce the required number of measurements and overcome hardware limitations. Hence, CS is a natural fit for DVC, owing to the large reduction in sampling rate, power consumption and computational complexity.

Benefiting from both CS and DVC, distributed compressed video sensing (DCVS) [5], [6], [7], [8], [9], [10], [11] has recently emerged as a new way to directly capture video data via random projections at a low-complexity encoder, while performing joint reconstruction at a more complex decoder. The main challenge of DCVS is how to utilize the spatial/temporal redundancy in video at the decoder to achieve sparse representation and efficient reconstruction. One of the earlier works addressing DCVS was presented by Prades-Nebot et al. [5], in which a video sequence is divided into key frames and non-key (NK) frames. Key frames are intra encoded and decoded using traditional video compression standards, while NK frames are projected and recovered using CS techniques, with an adaptive redundant dictionary built by picking blocks from previously reconstructed frames. A similar method, introduced as an inter-frame sparsity model, was proposed in [6]. However, these schemes still need to capture huge amounts of raw video data for the key frames, which are encoded using conventional compression algorithms.

Another DCVS framework was proposed in [7], [8], wherein the dictionary learning algorithm K-SVD [12] is directly employed by extracting samples from previously recovered frames together with the side information. Once the trained dictionary is obtained, NK frames are reconstructed using conventional sparse recovery algorithms. In this method, sparse representation and reconstruction are designed as independent tasks. However, this wastes resources, because the sparse coefficients are already computed during dictionary learning. Besides, a scalable framework of DCVS was presented in [9] to achieve optimal quality of service. In [10], an initialization and several stopping criteria were proposed for NK frames to speed up the convex optimization, and in [11] a measurement compression scheme using channel coding was proposed. Note that there also exists other work on CS-based video coding [13], [14], [15], [16], [17]; e.g., our previous work proposed a new dictionary generation scheme that iterates between reconstruction and filtering [15] and an adaptive-ADMM algorithm for CS with partially known support and signal value information [17]. Nevertheless, most of these techniques aim to exploit temporal/spatial redundancy at the encoder to achieve higher sampling efficiency, and are therefore not suited to DVC, where encoder resources are limited.

In this paper, we propose a dictionary learning based reconstruction algorithm for DCVS. Our goal is to improve the reconstruction performance by leveraging more realistic signal models that go beyond simple sparsity and compressibility (by including the video signal structure), while retaining very low computational complexity at the encoder. One of our contributions is a novel correlation noise model (CNM) between the original video frame and its side information (SI) when video sequences are compressively sampled at a rate far below the Nyquist rate. To distinguish it from the conventional notion in standard DVC, we denote our model as the “undersampled” CNM. Specifically, a new statistical model is presented to characterize the error pattern of the correlation noise, which offers an efficient way to describe the temporal correlation in undersampled videos. Another main contribution of this paper is a dictionary learning based reconstruction scheme, wherein we learn a dictionary that efficiently describes the content of video frames and simultaneously captures the correlation in sequences by including the CNM constraint. In this respect, we concentrate on the problem of two views and develop a maximum likelihood (ML) method. In our algorithm, the ML optimization is cast as an energy minimization problem, which can then be solved by iterating between reconstruction and dictionary update. Consequently, our recovery method achieves an efficient sparse representation for DCVS and at the same time obtains the corresponding coefficients to recover the video signals. In other words, both the dictionary learning and the reconstruction are performed under the correlation constraint in order to achieve good visual quality. To the best of our knowledge, no existing work analyzes the CNM when the video sequence is compressively sampled, or formulates dictionary learning for DCVS with a prior on the CNM.

Lastly, it is worth noting that this paper mainly focuses on developing a dictionary learning based reconstruction algorithm for DCVS. This provides a novel, fully low-complexity video compression paradigm and an alternative scheme suited to environments where raw video data is not available. We do not aim to compete on compression performance with current compression standards or DVC schemes, which need raw data to be available for encoding.

The rest of this paper is organized as follows. An overview of the background is given in Section 2. The proposed ML dictionary learning method is described in Section 3. Section 4 presents the DCVS reconstruction with dictionary learning. Simulation results are described in Section 5, followed by conclusions in Section 6.

Section snippets

Compressed sensing

Suppose that f is a discrete signal of length n, and let x be its coefficients in some orthonormal basis $Ψ \in R^{n \times n}$, i.e., f = Ψx. Signal f is said to be k-sparse with respect to Ψ if only k of its coefficients are non-zero. According to CS theory, a k-sparse signal can be acquired through the linear random projections y = Φf, where $y \in R^{m}$ is the sampled vector with m < n and Φ is an m × n measurement matrix that is incoherent with Ψ. Here we define the measurement rate (MR) for the signal as $MR = m / n$.
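
As an illustration of the sampling model above, the following minimal sketch draws a k-sparse coefficient vector x, synthesizes f = Ψx, and acquires y = Φf. The choice of an orthonormal DCT for Ψ, an i.i.d. Gaussian Φ, and the names `dct_basis` and `sample_signal` are assumptions made here for illustration; the text only requires Ψ to be orthonormal and Φ to be incoherent with it.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II synthesis basis Psi (n x n), so that f = Psi @ x.

    Assumption: the DCT is only one possible orthonormal basis.
    """
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)            # scale the DC row for orthonormality
    return C.T                         # columns are the basis vectors

def sample_signal(n=256, m=64, k=8, seed=0):
    """Build a k-sparse signal and take m random linear measurements."""
    rng = np.random.default_rng(seed)
    Psi = dct_basis(n)

    # k-sparse coefficient vector: k non-zero entries at random positions
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)

    f = Psi @ x                        # signal in the sample domain

    # Assumption: i.i.d. Gaussian measurement matrix, scaled by 1/sqrt(m)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    y = Phi @ f                        # compressive measurements, length m

    mr = m / n                         # measurement rate MR = m / n
    return f, x, Phi, y, mr
```

Calling `sample_signal(n=256, m=64, k=8)` returns a measurement vector of length 64 and a measurement rate of 0.25. Recovering x from y is a separate step and is not shown here.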

More specifically,

Problem formulation

The conventional DCVS structure is employed in our paper (see Fig. 1), wherein the key frame f_K is projected and reconstructed using the orthonormal basis Ψ and the traditional CS recovery algorithm. The NK frame f_NK is first split into several non-overlapping b × b blocks. Each block is vectorized as $f_{NK, b} \in R^{n}$ ($n = b^{2}$) and projected using the random measurement matrix $Φ \in R^{m \times n}$, i.e., $y_{NK, b} = Φ f_{NK, b}$. Then the measurement $y_{NK, b} \in R^{m}$ is transmitted to the decoder.
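
The block-based projection of an NK frame described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `frame_to_blocks` and `project_blocks` are hypothetical, Φ is drawn as an i.i.d. Gaussian matrix, the same Φ is applied to every block, and blocks are vectorized in row-major order. None of these details are fixed by the text.

```python
import numpy as np

def frame_to_blocks(frame, b):
    """Split an (H, W) frame into non-overlapping b x b blocks.

    Returns an (n, num_blocks) array with n = b * b; each column is one
    block vectorized in row-major order (an assumption of this sketch).
    """
    H, W = frame.shape
    if H % b or W % b:
        raise ValueError("frame dimensions must be multiples of b")
    # (H/b, b, W/b, b) -> (H/b, W/b, b, b): one b x b tile per (i, j)
    tiles = frame.reshape(H // b, b, W // b, b).swapaxes(1, 2)
    return tiles.reshape(-1, b * b).T

def project_blocks(frame, b, mr, seed=0):
    """Compute y_{NK,b} = Phi f_{NK,b} for every block of the frame."""
    n = b * b
    m = max(1, int(round(mr * n)))     # number of measurements per block
    rng = np.random.default_rng(seed)
    # Assumption: Gaussian Phi, shared by all blocks
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    F = frame_to_blocks(frame, b)      # (n, num_blocks)
    Y = Phi @ F                        # (m, num_blocks), one column per block
    return Phi, Y
```

For a QCIF luminance frame (144 rows by 176 columns) with b = 16 and a measurement rate of 0.25, this yields 99 blocks of length n = 256 and m = 64 measurements per block, so only the 64 × 99 array `Y` would need to be sent to the decoder.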

Now, we begin to

DCVS reconstruction with ML dictionary learning

We are now ready to present the DCVS reconstruction architecture based on ML dictionary learning. As shown in Fig. 1, the general structure of DCVS is employed. The measurements of key frames and NK frames are transmitted independently; the quantization and entropy coding of measurements are not considered, since they are beyond the scope of this paper. It can be easily inferred that, by merging the CS and DVC technologies, a significant low-complexity video coding will be easily

Simulation results

In this paper, several video sequences (Y frames for each) with QCIF (176 × 144) and CIF (352 × 288) resolutions are employed to evaluate the proposed ML dictionary learning based reconstruction algorithm. Processing is carried out only on the luminance component. In our simulations, the DCVS structure described in Section 4 is used with the GOP size of 2. At the encoder, each NK frame f_NK is split into several non-overlapping 16 × 16 blocks, and all blocks are projected independently using the

Conclusion

In this paper, we propose a dictionary learning based reconstruction algorithm for DCVS. We try to improve the reconstruction performance by leveraging more realistic signal models that go beyond simple sparsity and compressibility by including the video signal structure. In this work, we present a novel undersampled CNM to efficiently describe the correlation structure that exists in video, and a maximum likelihood dictionary learning method is proposed, wherein a novel probabilistic model is

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Nos. 61271173 and 60802032), the Fundamental Research Funds for the Central Universities (No. K5051201045), the 111 Project (No. B08038), and also supported by the ISN State Key Laboratory.

References (26)

B.A. Olshausen et al., Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Research (1997)
B. Girod et al., Distributed video coding, Proceedings of the IEEE (2005)
E.J. Candes et al., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory (2006)
D.L. Donoho, Compressed sensing, IEEE Transactions on Information Theory (2006)
E.J. Candes et al., Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics (2006)
J. Prades-Nebot, Y. Ma, T. Huang, Distributed video coding using compressive sampling, in: Proceedings of the Picture...
T.T. Do, Yi Chen, D.T. Nguyen, N. Nguyen, Lu Gan, T.D. Tran, Distributed compressed video sensing, in: IEEE...
Hung-Wei Chen, Li-Wei Kang, Chun-Shien Lu, Dynamic measurement rate allocation for distributed compressive video...
Hung-Wei Chen, Li-Wei Kang, Chun-Shien Lu, Dictionary learning-based distributed compressive video sensing, in: Picture...
W. Xu, Z. He, K. Niu, J. Lin, Sub-sampling framework of distributed video coding, in: IEEE International Symposium on...
Li-Wei Kang, Chun-Shien Lu, Distributed compressive video sensing, in: IEEE International Conference on Acoustics,...
X. Hao, B. Zhuang, A. Cai, Measurement compression in distributed compressive video sensing, in: IEEE International...
M. Aharon et al., The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing (2006)

Cited by (15)

Low-cost and high-efficiency privacy-protection scheme for distributed compressive video sensing in wireless multimedia sensor networks
2020, Journal of Network and Computer Applications
Citation Excerpt :
To improve the reconstruction performance, both frame-based and block-based measuring methods were applied in KF and NKF separately (Do et al., 2009). For maintaining sampling consistency, both KF and NKF adopted BCS (Liu et al., 2013a, 2014, 2015a; Tian et al., 2016; Van Chien et al., 2017; Chen et al., 2018; Xu et al., 2018; Yang et al., 2018; Zheng et al., 2019). Essentially, applying BCS is to divide each frame of videos into small blocks, and then sample individual vector-reshaped blocks successively.
As a new video coding technology, distributed compressive video sensing (DCVS) uses compressed sensing (CS) independent encoding and joint decoding. Since DCVS breaks through the constraint of traditional video coding, it is suitable for resource-constrained wireless multimedia sensor networks (WMSNs). However, two major issues related to DCVS in WMSNs need to be solved urgently: one is to balance the storage burden of encoder and recovery quality of decoder; the other is to provide privacy-protection for video coding and transmission. We intend to break out of the existing limitations and design a new scheme which can simultaneously ensure privacy protection and high-efficiency coding for DCVS in WMSNs. Firstly, the two-pattern adaptive group of pictures selection is adopted to distinguish key frames and non-key frames. Secondly, the deterministic binary block diagonal measurement matrix is optimized to reduce sampling complexity. Thirdly, the scrambling-substitution-diffusion encryption method is proposed to resist various typical or potential attacks. Numerous experiments demonstrate that our scheme can not only perform valid and high-efficiency video coding, but also meet the demands of real-time and secure data transmission in WMSNs.
A novel framework for compressed sensing based scalable video coding
2017, Signal Processing: Image Communication
Citation Excerpt :
Alternatively, motivated by the theory of Compressed Sensing (CS) [8–10], several new video codecs [11–35] have been proposed in the last few years.
Considering high throughput values as specified by modern video processing standards, Scalable Video Coding (SVC) systems intended for such standards are generally implemented by means of dedicated hardware. However, the high computational complexity associated with the current Compressed Sensing (CS) based video coding schemes makes their hardware realization considerably challenging. In this paper, we present a novel CS based SVC framework that is amenable to real-time VLSI implementation. At the encoder, after applying the Three-Dimensional Discrete Wavelet Transform (3-D DWT) on the input video frames, a novel Adaptive Measurement Scheme (AMS) in CS is introduced, which is applied on the high frequency sub-bands of the 3-D DWT frames. The proposed AMS along with 3-D DWT not only achieves scalability and better compression ratio, but also reduces the overall computational complexity of the system. We have also proposed an Enhanced Approximate Message Passing (EAMP) algorithm to reconstruct the high frequency sub-bands from the CS measurements at the decoder. The proposed EAMP procedure combines the benefits of Approximate Message Passing (AMP) and Iterative Hard Thresholding (IHT) algorithms thereby simultaneously achieving sparsity measurement trade-off and good reconstruction quality. We have carried out the detailed complexity analysis and simulations to demonstrate the superiority of the proposed framework over the existing schemes.
Feature discovering for image classification via wavelet-like pattern decomposition
2016, Journal of Visual Communication and Image Representation
In this paper, we propose a feature discovering method incorporated with a wavelet-like pattern decomposition strategy to address the image classification problem. In each level, we design a discriminative feature discovering dictionary learning (DFDDL) model to exploit the representative visual samples from each class and further decompose the commonality and individuality visual patterns simultaneously. The representative samples reflect the discriminative visual cues per class, which are beneficial for the classification task. Furthermore, the commonality visual elements capture the communal visual patterns across all classes. Meanwhile, the class-specific discriminative information can be collected by the learned individuality visual elements. To further discover the more discriminative feature information from each class, we then integrate the DFDDL into a wavelet-like hierarchical architecture. Due to the designed hierarchical strategy, the discriminative power of feature representation can be promoted. In the experiment, the effectiveness of proposed method is verified on the challenging public datasets.
Image/video compressive sensing recovery using joint adaptive sparsity measure
2016, Neurocomputing
Citation Excerpt :
In [46], each frame of a compressed-sensed video sequence is reconstructed iteratively using Karhunen–Loève transform (KLT) bases trained from adjacent previously reconstructed frame(s). There also exist other research works about CVS recovery based on dictionary learning (DL) [47–50]. In [50], we proposed a block-based CVS recovery method where key frames are reconstructed using ALS basis via ℓ0 minimization method of [51].
Compressive sensing (CS) is a recently emerging technique and an extensively studied problem in signal and image processing, which enables joint sampling and compression into a unified approach. Recently, local smoothness and nonlocal self-similarity have both led to superior sparsity priors for CS image restoration. In this paper, first, a new sparsity measure called joint adaptive sparsity measure (JASM) is introduced. The proposed JASM enforces both local sparsity and nonlocal 3D sparsity in transform domain, concurrently, providing a powerful mechanism for characterizing the structured sparsities of natural image. More precisely, the local sparsity depicts the local smoothness redundancies exploited by an adaptively learned sparsifying basis, and the nonlocal 3D sparsity corresponds to the nonlocal self-similarity constraint achieved by a new proposed nonlocal statistical sparse modeling. Then, two novel techniques for high-fidelity CS image and video recovery via JASM are proposed. The proposed methods are formulated in the form of minimization functional under regularization-based framework which is solved via an efficient alternating minimization algorithm based on split Bregman framework. Comprehensive experimental results are reported to manifest the effectiveness of the proposed methods compared with the current state-of-the-art methods in CS image/video restoration.
Optimal-correlation-based reconstruction for distributed compressed video sensing
2015, Journal of Visual Communication and Image Representation
Citation Excerpt :
Pudlewski et al. briefly discussed challenges involved in the transmission of video over a WMSN [15] and presented a cross-layer system that jointly controls the video encoding rate, the transmission rate, and the channel coding rate to maximize the received video quality [16]. Besides, a dictionary generation scheme for CS-based video sampling and a dictionary learning based DCVS reconstruction method were proposed in our previous work [17,18] respectively, and more recently, an adaptive alternating direction method of multipliers with its application to compressed video sensing was presented in [19,20]. Another contribution of this paper is a two-phase Bregman [22–25] based iterative algorithm for solving the optimization problem.
Distributed compressed video sensing (DCVS) is a framework that integrates both compressed sensing and distributed video coding characteristics to achieve a low-complexity video coding. However, how to design an efficient joint reconstruction by leveraging more realistic signal models is still an open challenge. In this paper, we present a novel optimal-correlation-based reconstruction method for compressively sampled videos from multiple measurement vectors. In our method, the sparsity is mainly exploited through inter-signal correlations rather than the traditional frequency transform, wherein the optimization is not only over the signal space to satisfy data consistency but also over all possible linear correlation models to achieve minimum-l₁-norm correlation noise. Additionally, a two-phase Bregman iterative based algorithm is outlined for solving the optimization problem. Simulation results show that our proposal can achieve an improved reconstruction performance in comparison to the conventional approaches, and especially, offer a 0.7–9.9 dB gain in the average PSNR for DCVS.
Survey on compressed sensing reconstruction method for 3D data
2023, Concurrency and Computation: Practice and Experience
