Dictionary learning based reconstruction for distributed compressed video sensing

https://doi.org/10.1016/j.jvcir.2013.08.007

Highlights

  • Leveraging more realistic video signal models that go beyond simple sparsity.

  • A novel undersampling correlation noise model for subsampled video signals.

  • Learning a dictionary that efficiently describes video content and structure.

  • A maximum-likelihood (ML) dictionary learning based reconstruction for DCVS.

  • Signal recovery is performed within ML learning, not as an independent task.

Abstract

Distributed compressed video sensing (DCVS) is a framework that integrates the characteristics of compressed sensing and distributed video coding to achieve low-complexity video coding. However, designing an efficient reconstruction that leverages more realistic signal models that go beyond simple sparsity remains an open challenge. In this paper, we propose a novel “undersampled” correlation noise model to describe compressively sampled video signals, and present a maximum-likelihood dictionary learning based reconstruction algorithm for DCVS, in which both the correlation and sparsity constraints are included in a new probabilistic model. Moreover, signal recovery in our algorithm is performed during dictionary learning rather than as an independent task. Experimental results show that our proposal compares favorably with existing methods, with 0.1–3.5 dB improvements in average PSNR, and a 2–9 dB gain for non-key frames when key frames are subsampled at an increased rate.

Introduction

Distributed video coding (DVC) [1] is a video coding paradigm that encodes the frames of a video sequence independently and decodes them jointly. Because temporal redundancy is exploited exclusively at the decoder, the computational burden shifts from the encoder to the decoder, which makes DVC potentially applicable to many fields, e.g., wireless multimedia sensor networks (WMSN), video conferencing on mobile devices, and surveillance systems. However, DVC still requires acquiring large amounts of raw data followed by compression, which wastes valuable resources. Compressed sensing (CS) [2], [3], [4] is an innovative concept that has attracted considerable research interest in the signal processing community. It provides a new way to collect data that combines acquisition and compression, and consequently helps reduce the required number of measurements and overcome hardware limitations. Because CS greatly reduces the sampling rate, power consumption, and computational complexity, it is a natural fit for DVC.

Benefiting from both CS and DVC, distributed compressed video sensing (DCVS) [5], [6], [7], [8], [9], [10], [11] has recently emerged as a new way to capture video data directly via random projections at a low-complexity encoder, while performing joint reconstruction at a more complex decoder. The main challenge of DCVS is how to exploit the spatial/temporal redundancy of video at the decoder to achieve sparse representation and efficient reconstruction. One of the earliest works on DCVS was presented by Prades-Nebot et al. [5], in which a video sequence is divided into key frames and non-key (NK) frames. Key frames are intra-encoded and decoded using traditional video compression standards, while NK frames are projected and recovered using CS techniques, with an adaptive redundant dictionary built from blocks of previously reconstructed frames. A similar method, introduced as an inter-frame sparsity model, was proposed in [6]. However, these schemes still require capturing huge amounts of raw video data for the key frames, which are encoded using conventional compression algorithms.

Another DCVS framework was proposed in [7], [8], wherein the dictionary learning algorithm K-SVD [12] is applied directly to training samples extracted from previously recovered frames together with the side information. Once the trained dictionary is obtained, NK frames are reconstructed using conventional sparse recovery algorithms. In this method, sparse representation and reconstruction are designed as independent tasks. This wastes computational resources, because the sparse coefficients have already been calculated during dictionary learning. In addition, a scalable DCVS framework was presented in [9] to achieve optimal quality of service. In [10], an initialization and several stopping criteria were proposed for NK frames to speed up the convex optimization, and in [11] a measurement compression scheme using channel coding was proposed. Other work on CS-based video coding also exists [13], [14], [15], [16], [17]; e.g., our previous work proposed a dictionary generation scheme that iterates between reconstruction and filtering [15], and an adaptive-ADMM algorithm for CS with partially known support and signal value information [17]. Nevertheless, most of these techniques aim to exploit temporal/spatial redundancy at the encoder to achieve higher sampling efficiency, and are therefore not suited to DVC when encoder resources are limited.

In this paper, we propose a dictionary learning based reconstruction algorithm for DCVS. Our goal is to improve reconstruction performance by leveraging more realistic signal models that go beyond simple sparsity and compressibility (by including the video signal structure), while retaining very low computational complexity at the encoder. One of our contributions is a novel correlation noise model (CNM) between the original video frame and its side information (SI) when video sequences are compressively sampled at a rate far below the Nyquist rate. To distinguish it from the conventional CNM in standard DVC, we call our model the “undersampled” CNM. Specifically, we present a new statistical model that characterizes the error pattern of the correlation noise and thereby offers an efficient way to describe the temporal correlation in undersampled videos. Another main contribution is a dictionary learning based reconstruction scheme, wherein we learn a dictionary that efficiently describes the content of video frames while simultaneously capturing the correlation in sequences through the CNM constraint. To this end, we concentrate on the two-view problem and develop a maximum likelihood (ML) method. In our algorithm, the ML optimization is cast as an energy minimization problem, which is then solved by iterating between reconstruction and dictionary update. Consequently, our recovery method achieves an efficient sparse representation for DCVS and, at the same time, obtains the corresponding coefficients to recover the video signals. In other words, both dictionary learning and reconstruction are performed under the correlation constraint in order to achieve good visual quality.
To the best of our knowledge, no existing work analyzes the CNM for compressively sampled video sequences or formulates dictionary learning for DCVS with a CNM prior.
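For reference, conventional DVC usually models the correlation noise $r = f - \mathrm{SI}$ as a zero-mean Laplacian, $p(r) = \frac{\alpha}{2} e^{-\alpha |r|}$, whose variance is $2/\alpha^2$. The undersampled CNM proposed here is a different model whose exact form is not given in this excerpt, so the sketch below shows only the conventional Laplacian baseline it departs from. The function names `fit_laplacian_cnm` and `laplacian_pdf` are hypothetical, and the moment fit assumes a zero-mean residual.

```python
import numpy as np

def fit_laplacian_cnm(frame, side_info):
    """Fit the conventional zero-mean Laplacian CNM to r = frame - side_info.

    A Laplacian with parameter alpha has variance 2 / alpha**2,
    so the moment estimate is alpha = sqrt(2 / E[r**2]).
    """
    r = np.asarray(frame, dtype=float) - np.asarray(side_info, dtype=float)
    var = np.mean(r ** 2)          # zero-mean assumption
    if var == 0.0:
        return np.inf              # SI identical to the frame
    return np.sqrt(2.0 / var)

def laplacian_pdf(r, alpha):
    """Density (alpha / 2) * exp(-alpha * |r|)."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(r))

# Toy usage: synthetic side information plus Laplacian noise with scale 2,
# i.e. a true alpha of 1 / 2.
rng = np.random.default_rng(0)
side_info = rng.uniform(0, 255, size=(144, 176))
frame = side_info + rng.laplace(loc=0.0, scale=2.0, size=side_info.shape)
alpha_hat = fit_laplacian_cnm(frame, side_info)
```

A larger fitted `alpha` means the SI predicts the frame more closely; the undersampled setting is precisely where such a pixel-domain model stops being directly observable, since the decoder only sees measurements of the NK frame.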

Finally, note that this paper focuses on developing a dictionary learning based reconstruction algorithm for DCVS, which provides a novel, fully low-complexity video compression paradigm and an alternative suited to environments where raw video data is not available. We do not aim to compete in compression performance with current compression standards or DVC schemes, which require raw data for encoding.

The rest of this paper is organized as follows. Section 2 reviews the background. The proposed ML dictionary learning method is described in Section 3. Section 4 presents DCVS reconstruction with dictionary learning. Simulation results are given in Section 5, followed by conclusions in Section 6.

Section snippets

Compressed sensing

Suppose that $f$ is a discrete signal of length $n$, and let $x$ be its coefficient vector in some orthonormal basis $\Psi \in \mathbb{R}^{n \times n}$, i.e., $f = \Psi x$. Signal $f$ is said to be $k$-sparse with respect to $\Psi$ if only $k$ of its coefficients are non-zero. According to CS theory, a $k$-sparse signal can be acquired through the linear random projection $y = \Phi f$, where $y \in \mathbb{R}^m$ is the measurement vector with $m < n$ and $\Phi$ is an $m \times n$ measurement matrix that is incoherent with $\Psi$. Here we define the measurement rate (MR) of the signal as $\mathrm{MR} = m/n$.

More specifically,
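The acquisition model $y = \Phi f$ and the MR can be illustrated numerically. The sketch below is a toy under stated assumptions: it takes $\Psi$ to be an orthonormal DCT basis, draws $\Phi$ as an i.i.d. Gaussian matrix, and recovers $x$ with orthogonal matching pursuit (OMP), a standard greedy CS solver swapped in here purely for illustration. None of these choices is stated in the excerpt, and `dct_basis` and `omp` are hypothetical helper names.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis Psi (n x n); column j is the j-th basis vector."""
    i = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i[None, :] + 0.5) * i[:, None] / n)
    C[0, :] /= np.sqrt(2.0)          # DC row scaled to sqrt(1/n)
    return C.T                       # so that f = Psi @ x

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse solution of y = A x."""
    residual = y.copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef          # orthogonal to chosen columns
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 128, 64, 4
Psi = dct_basis(n)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
f = Psi @ x                                      # k-sparse w.r.t. Psi
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # assumed Gaussian measurement matrix
y = Phi @ f                                      # y = Phi f
MR = m / n                                       # measurement rate
x_hat = omp(Phi @ Psi, y, k)                     # recover coefficients
f_hat = Psi @ x_hat                              # back to the signal domain
```

Here MR is 0.5; with so few non-zeros relative to $m$, OMP typically identifies the exact support, after which the least-squares step reproduces $f$ up to rounding.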

Problem formulation

The conventional DCVS structure is employed in this paper (shown in Fig. 1), wherein the key frame $f_K$ is projected and reconstructed using the orthonormal basis $\Psi$ and a traditional CS recovery algorithm. Each NK frame $f_{NK}$ is first split into non-overlapping $b \times b$ blocks. Each block is vectorized as $f_{NK,b} \in \mathbb{R}^n$ ($n = b^2$) and projected using the random measurement matrix $\Phi \in \mathbb{R}^{m \times n}$, i.e., $y_{NK,b} = \Phi f_{NK,b}$. The measurement $y_{NK,b} \in \mathbb{R}^m$ is then transmitted to the decoder.

Now, we begin to
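The block-based projection of an NK frame can be sketched as follows. This is a minimal illustration under stated assumptions: blocks are vectorized in row-major order, the same $\Phi$ is reused for every block, and $\Phi$ is drawn as a Gaussian matrix. The helpers `split_blocks`, `merge_blocks`, and `measure_nk_frame` are hypothetical names.

```python
import numpy as np

def split_blocks(frame, b):
    """Split an H x W frame into non-overlapping b x b blocks.

    Returns an array of shape (num_blocks, b*b); blocks are ordered row by row
    and each block is vectorized in row-major order (n = b**2).
    """
    H, W = frame.shape
    if H % b or W % b:
        raise ValueError("frame size must be a multiple of the block size")
    return frame.reshape(H // b, b, W // b, b).swapaxes(1, 2).reshape(-1, b * b)

def merge_blocks(blocks, H, W, b):
    """Inverse of split_blocks."""
    return blocks.reshape(H // b, W // b, b, b).swapaxes(1, 2).reshape(H, W)

def measure_nk_frame(frame, Phi, b):
    """Project every block with the same Phi: row j of the result is Phi @ f_{NK,b_j}."""
    return split_blocks(frame, b) @ Phi.T

# QCIF luminance frame, 16 x 16 blocks (n = 256), MR = 0.25 (m = 64).
rng = np.random.default_rng(0)
H, W, b = 144, 176, 16
n = b * b
m = n // 4
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
frame = rng.uniform(0, 255, size=(H, W))
Y = measure_nk_frame(frame, Phi, b)   # shape (99, 64)
```

Because every block is projected independently with one small $m \times n$ matrix, the encoder cost is a matrix-vector product per block, which is what keeps the DCVS encoder lightweight.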

DCVS reconstruction with ML dictionary learning

We are now ready to present the DCVS reconstruction architecture based on ML dictionary learning. As shown in Fig. 1, the general DCVS structure is employed. The measurements of key frames and NK frames are transmitted independently; quantization and entropy coding of the measurements are not considered, since they are beyond the scope of this paper. It is easy to see that, by merging the CS and DVC technologies, a significantly low-complexity video coding will be easily
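The excerpt states that the ML optimization is cast as an energy minimization solved by alternating reconstruction and dictionary update, but does not give the energy itself. The sketch below is therefore a hypothetical stand-in, not the authors' model: it replaces the undersampled CNM with a simple quadratic penalty $\mu \lVert DA - S \rVert_F^2$ tying the reconstruction $DA$ to side-information blocks $S$, uses ISTA for the sparse codes, and uses a closed-form MOD-style least-squares dictionary update instead of K-SVD. What it shares with the described approach is that the reconstruction $F = DA$ is produced inside the learning loop rather than by a separate recovery stage. `learn_and_reconstruct` and its parameters (`lam`, `mu`, `outer`, `inner`, `eps`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def learn_and_reconstruct(Y, Phi, S, n_atoms, lam=0.05, mu=0.5,
                          outer=10, inner=50, eps=1e-6, seed=0):
    """Alternating minimisation of a HYPOTHETICAL energy (not the paper's CNM):

        E(D, A) = ||Y - Phi D A||_F^2 + mu ||D A - S||_F^2 + lam ||A||_1

    Y: (m, N) block measurements as columns; Phi: (m, n); S: (n, N) SI blocks.
    Requires mu > 0. Returns dictionary D (n, n_atoms), codes A and F = D A.
    """
    n = Phi.shape[1]
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(outer):
        # (1) Reconstruction step: ISTA on the codes A with D fixed.
        B = Phi @ D
        L = 2.0 * (np.linalg.norm(B, 2) ** 2 + mu * np.linalg.norm(D, 2) ** 2)
        for _ in range(inner):
            grad = -2.0 * B.T @ (Y - B @ A) + 2.0 * mu * D.T @ (D @ A - S)
            A = soft_threshold(A - grad / L, lam / L)
        # (2) Dictionary step: stationary point of the quadratic terms in D,
        #     (Phi^T Phi + mu I) D (A A^T + eps I) = (Phi^T Y + mu S) A^T,
        #     with eps regularising the right-hand factor.
        X = np.linalg.solve(Phi.T @ Phi + mu * np.eye(n), (Phi.T @ Y + mu * S) @ A.T)
        G = A @ A.T + eps * np.eye(n_atoms)           # symmetric
        D = np.linalg.solve(G, X.T).T                 # D = X G^{-1}
        # Renormalise atoms and rescale codes so that D A is unchanged;
        # atoms that no block uses are re-drawn at random.
        norms = np.linalg.norm(D, axis=0)
        dead = norms < 1e-10
        if dead.any():
            A[dead, :] = 0.0
            D[:, dead] = rng.standard_normal((n, int(dead.sum())))
            norms[dead] = np.linalg.norm(D[:, dead], axis=0)
        D /= norms
        A *= norms[:, None]
    return D, A, D @ A

# Toy usage: 8 x 8 blocks (n = 64), MR = 0.25, 200 blocks, SI = truth + noise.
rng = np.random.default_rng(1)
n, m, N = 64, 16, 200
F_true = rng.standard_normal((n, N))
S = F_true + 0.1 * rng.standard_normal((n, N))
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
Y = Phi @ F_true
D, A, F_hat = learn_and_reconstruct(Y, Phi, S, n_atoms=96)
```

Because the quadratic SI term couples every block to its side information, raising `mu` pulls the reconstruction toward $S$, while `lam` trades sparsity of the codes against measurement fidelity.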

Simulation results

In this paper, several video sequences (Y frames for each) with QCIF (176 × 144) and CIF (352 × 288) resolutions are used to evaluate the proposed ML dictionary learning based reconstruction algorithm. Processing is carried out only on the luminance component. In our simulations, the DCVS structure described in Section 4 is used with a GOP size of 2. At the encoder, each NK frame $f_{NK}$ is split into non-overlapping 16 × 16 blocks, and all blocks are projected independently using the
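The GOP structure and the PSNR metric used in the evaluation can be sketched as below. Two assumptions are made: the first frame of each GOP is the key frame (so with a GOP size of 2, key and NK frames alternate), and luminance samples are 8-bit, giving a peak value of 255. `frame_types` and `psnr` are hypothetical helper names.

```python
import numpy as np

def frame_types(num_frames, gop=2):
    """Label each frame: the first frame of every GOP is a key frame (K), the rest are NK."""
    return ["K" if t % gop == 0 else "NK" for t in range(num_frames)]

def psnr(ref, rec, peak=255.0):
    """PSNR in dB between a reference and a reconstructed luminance frame."""
    err = np.asarray(ref, dtype=float) - np.asarray(rec, dtype=float)
    mse = np.mean(err ** 2)
    return np.inf if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

# With a GOP size of 2, even-indexed frames are key frames.
labels = frame_types(6)
```

Averaging `psnr` over the frames of a sequence, or over the NK frames only, gives figures of the kind quoted in the abstract.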

Conclusion

In this paper, we propose a dictionary learning based reconstruction algorithm for DCVS. We aim to improve reconstruction performance by leveraging more realistic signal models that go beyond simple sparsity and compressibility by including the video signal structure. In this work, we present a novel undersampled CNM to efficiently describe the correlation structure that exists in video, and propose a maximum likelihood dictionary learning method, wherein a novel probabilistic model is

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Nos. 61271173 and 60802032), the Fundamental Research Funds for the Central Universities (No. K5051201045), the 111 Project (No. B08038), and also supported by the ISN State Key Laboratory.

References (26)

  • B.A. Olshausen et al., Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Research (1997).
  • B. Girod et al., Distributed video coding, Proceedings of the IEEE (2005).
  • E.J. Candes et al., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory (2006).
  • D.L. Donoho, Compressed sensing, IEEE Transactions on Information Theory (2006).
  • E.J. Candes et al., Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics (2006).
  • J. Prades-Nebot, Y. Ma, T. Huang, Distributed video coding using compressive sampling, in: Proceedings of the Picture...
  • T.T. Do, Yi Chen, D.T. Nguyen, N. Nguyen, Lu Gan, T.D. Tran, Distributed compressed video sensing, in: IEEE...
  • Hung-Wei Chen, Li-Wei Kang, Chun-Shien Lu, Dynamic measurement rate allocation for distributed compressive video...
  • Hung-Wei Chen, Li-Wei Kang, Chun-Shien Lu, Dictionary learning-based distributed compressive video sensing, in: Picture...
  • W. Xu, Z. He, K. Niu, J. Lin, Sub-sampling framework of distributed video coding, in: IEEE International Symposium on...
  • Li-Wei Kang, Chun-Shien Lu, Distributed compressive video sensing, in: IEEE International Conference on Acoustics,...
  • X. Hao, B. Zhuang, A. Cai, Measurement compression in distributed compressive video sensing, in: IEEE International...
  • M. Aharon et al., K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing (2006).