Elsevier

Pattern Recognition Letters

Volume 45, 1 August 2014, Pages 46-54
An adaptive rank-sparsity K-SVD algorithm for image sequence denoising ☆☆

https://doi.org/10.1016/j.patrec.2014.03.003

Highlights

  • We propose an algorithm for removing Gaussian noise from a given image sequence.

  • We formulate it as an optimization problem on a propagated dictionary.

  • The propagated dictionary is adaptively trained by a rank-sparsity representation.

  • Restoration of signals is adaptively determined in terms of the noise level.

Abstract

In this paper, we propose an algorithm for removing additive white Gaussian noise (AWGN) from a given image sequence. By extending a frame in the spatial and temporal dimensions, the sequence is transformed into volumetric data in which each frame includes both spatial and temporal correlation. Image sequence denoising is then formulated as an optimization problem that can be solved iteratively by constructing a rank-sparsity representation on a propagated dictionary. The proposed algorithm trains this dictionary effectively by adaptively determining the required number of iterations. Restoration of the volumetric data is adaptively determined in terms of the noise level. Results on several standard data sets show that the proposed algorithm outperforms the K-singular value decomposition (K-SVD) algorithm and the sparse K-SVD algorithm. If a sequence is characterized by global motion (the moving objects in a scene have similar trajectories, i.e., they move as a unit) or high motion activity, the performance of the proposed algorithm is comparable to that of block-matching and 4-D filtering (BM4D) and video block-matching and 4-D filtering (V-BM4D).

Introduction

Denoising is a fundamental problem in image processing. In recent decades, many approaches have been investigated from diverse points of view [16], [13], [10], [7], [18], [1], [23], [27], [28]. Image sequence (video) denoising is an extended version of this problem, because an image sequence always contains inherent temporal correlation between frames (images). In practice, however, some approaches ignore the temporal correlation in an image sequence and process each frame separately [2], [4]. Other image sequence denoising methods exploit the high temporal correlation in an image sequence to achieve better performance [6], [12], [29].

In general, approaches to image sequence denoising can be divided into two categories, depending on how they use temporal correlation. The first category consists of motion compensated filters, which treat motion compensation and filtering as two independent problems [5]. Motion compensation is either explicitly applied during preprocessing or implicitly incorporated into the filtering [5]. After preprocessing, the temporal nonstationarity in an image sequence is removed. The estimated trajectories can then be applied for denoising either in the signal domain [14] or in the transform domain [34], [17]. For these motion compensated filters, motion compensation is assumed to be helpful when dealing with the dynamic nature of the image sequence. Therefore, they are expected to outperform their non-motion compensated counterparts. However, this is not always true. Motion compensation may be unnecessary or even counterproductive for denoising, because it may produce inaccurate trajectories that lead to blur and information loss, especially in an image sequence that contains high levels of noise. Moreover, motion compensation is itself a difficult problem that adds computational cost.

The other category of image sequence denoising methods is the spatio-temporal approaches that attempt to use the temporal correlation without motion compensation. Most of the spatio-temporal approaches are extended from classic 2-D filters [5], such as the techniques proposed by Buades et al. [6], [12], [29], [30]. These spatio-temporal filters tend to be less sensitive to nonstationarity in both space and time, because they take advantage of the correlation in both directions. This fact implies that it is crucial to make full use of both the spatial and temporal correlation to maximize performance. Spatio-temporal filters can also adapt their parameters for denoising. Because there is no one set of parameters that can fit all sequences, even at a fixed noise level [29], many approaches use adaptive statistical estimation [2], [4], adaptive selection of neighborhood size [3], or adaptive smoothing [19] to achieve better results.

One of the most successful non-motion compensated spatio-temporal filters is the approach reported by Protter and Elad [29]. This technique extends the work in [16], [15] with several modifications. The results reported by Protter and Elad [29] demonstrated that a propagated dictionary can help speed up the algorithm and lead to improved denoising performance. This is because the similarity between two adjacent frames can reduce the number of iterations (denoted by K) required to train the dictionary. A further conclusion is that K should not be constant, but should rather depend on the noise level [29]. However, no quantitative result is given.

Recent interest in denoising is related to low-rank representation (LRR), which shows excellent performance on many benchmark data sets [21], [20], [31]. LRR is able to automatically correct corrupted data [21], so it is mainly applied to reveal the actual segmentation of data (noisy or noise free) that are drawn from a union of multiple subspaces [21], [20], [31], [22]. Compared with sparse representation (SR), LRR is more robust to noise and outliers, because LRR is better at capturing the global structure of data [21], [20]. At the same time, applications based on low-rank and sparse matrix decompositions have been reported for object detection [32], image classification [33], image inpainting [11], and dynamic magnetic resonance imaging restoration [26].

In this paper, we propose an algorithm similar to the K-singular value decomposition (K-SVD) algorithm, based on the foundational work in [16], [15], [29], to remove additive white Gaussian noise (AWGN) from image sequences. We propose three extensions to the original algorithm. The first is the rank-sparsity representation produced by solving an optimization problem, in which the representation combines LRR and SR. This is motivated by the conclusion that the exact solution to the problem of decomposing a matrix into the sum of a low-rank matrix and a sparse matrix can be found by minimizing the sum of the nuclear norm and the ℓ1 norm [8], [9]. Unlike some other methods [32], [8], [9], [29], the low-rank matrix and the sparse matrix are not used separately for different purposes. Instead, we propose that the sum of the low-rank matrix and the sparse matrix can be regarded as the hybrid representation matrix of signals on a specific dictionary, i.e., the identity matrix. Thus, signals are assumed to be linearly restored by the hybrid representation matrix on a redundant dictionary that is adaptively learned from the noisy signals. This is similar to the assumption made in [16], [15], [29]. However, we are interested in the rank-sparsity representation matrix of signals for training dictionaries.

The second extension relates to the adaptivity of K. We describe a method that experimentally determines K in terms of the similarity between two adjacent frames in the transformed volumetric data. In contrast, K is a constant in [16], [15], [29].

The last extension relates to the adaptivity of λ, the parameter that balances the method of signal restoration. According to the noise level, λ is adaptively determined, unlike [16], [15], [29] where it is constant.
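The nuclear-norm-plus-ℓ1 objective behind the first extension is commonly minimized with two proximal steps: singular value thresholding for the low-rank part and entrywise soft thresholding for the sparse part. The Python sketch below shows only these two generic building blocks. The function names and the threshold `tau` are illustrative assumptions, and the sketch is not the solver or the dictionary update used in this paper.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(m, tau):
    """Proximal operator of tau * ||m||_*: soft-threshold the singular values."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value.
    return (u * soft_threshold(s, tau)) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(6, 5))
    low_rank_part = singular_value_threshold(data, tau=1.0)
    sparse_part = soft_threshold(data - low_rank_part, tau=0.5)
    print("rank of low-rank part:", np.linalg.matrix_rank(low_rank_part))
    print("nonzeros in sparse part:", np.count_nonzero(sparse_part))
```

Larger values of `tau` push the first operator toward lower rank and the second toward more zeros, which is the trade-off a rank-sparsity representation balances.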

The rest of this paper is organized as follows: some related work is presented in Section 2, and the proposed method is discussed in Section 3. Section 4 contains the experimental results obtained by the proposed algorithm. Finally, our conclusions are given in Section 5.

Section snippets

Related work

In this section, we will present some fundamental preliminaries. For clarity, denote a matrix $A=[a_1 \cdots a_m]$, where $a_i$ ($1 \le i \le m$) is the $i$th column vector of $A$. Denote a column vector $v=[v_1 \cdots v_n]^T$. For a given index $I=\{i_1,\ldots,i_p\}$, denote a sub-matrix of $A$ and a sub-vector of $v$ by $A_I=[a_{i_1} \cdots a_{i_p}]$ and $v_I=[v_{i_1} \cdots v_{i_p}]^T$, respectively, where $p \le \min(m,n)$. Define $A_{I,J}$ as a sub-matrix of $A$, including the rows and columns indexed by $I$ and $J$, respectively, where $J=\{j_1,\ldots,j_q\}$, and $q \le m$.
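As a concrete reading of this notation, the following NumPy sketch selects columns of a matrix, entries of a vector, and a row-and-column sub-matrix. The array values and the index sets are made up for illustration, and NumPy indices start at 0 whereas the notation above starts at 1.

```python
import numpy as np

# Hypothetical example data: A has 3 rows and m = 4 columns, v has n = 3 entries.
A = np.arange(12.0).reshape(3, 4)
v = np.array([10.0, 20.0, 30.0])

I = [0, 2]  # index set I (zero-based here)
J = [1, 3]  # index set J (zero-based here)

A_I = A[:, I]            # sub-matrix made of the columns of A indexed by I
v_I = v[I]               # sub-vector made of the entries of v indexed by I
A_IJ = A[np.ix_(I, J)]   # sub-matrix with rows indexed by I and columns by J

print(A_I)
print(v_I)
print(A_IJ)
```

`np.ix_` builds the row-by-column grid explicitly; writing `A[I, J]` instead would pair the indices elementwise and return only two entries.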

The proposed method

In this section, we introduce the proposed method. We transform an image sequence into the volumetric data in which each frame includes both the spatial and temporal correlation, and then compute the similarity between two adjacent frames to adaptively determine K. We then formulate the problem of denoising as an optimization problem by the rank-sparsity representation on an adaptively learned dictionary. Finally, we propose a K-SVD-like algorithm to solve the optimization problem, and restore
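The exact construction of the volumetric data is not given in this snippet. As a rough illustration of how spatial and temporal correlation can be gathered for one frame, the sketch below collects vectorized patches from a temporal window around frame `t`. The function name, the stride, and the interpretation of the window half-width `dt` are assumptions made for this example only, not the paper's definition.

```python
import numpy as np


def collect_patches(frames, t, patch=8, dt=3, stride=4):
    """Return a (patch*patch, N) matrix whose columns are vectorized patches
    taken from frames t-dt .. t+dt, clipped to the sequence bounds.

    frames: array of shape (num_frames, height, width).
    """
    num_frames, height, width = frames.shape
    first, last = max(0, t - dt), min(num_frames - 1, t + dt)
    columns = []
    for f in range(first, last + 1):
        for r in range(0, height - patch + 1, stride):
            for c in range(0, width - patch + 1, stride):
                columns.append(frames[f, r:r + patch, c:c + patch].reshape(-1))
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.uniform(0.0, 255.0, size=(10, 16, 16))  # synthetic frames
    Y = collect_patches(sequence, t=5)
    print(Y.shape)
```

Each column of the returned matrix is one training signal; a dictionary learned on such columns sees neighboring frames as well as the current one.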

Experiments

In this section, we compare the performance of the proposed algorithm with that of several other methods on some standard test data sets. For simplicity, unless stated otherwise, the parameters of the proposed algorithm in this section were set as follows: n=8×8, M=2000, k=256, Δt=3, ρ=1.15. K and λ were determined by Eqs. (8) and (14), respectively. The performance of each algorithm was evaluated in terms of the PSNR results and the visual quality of the restored frames. For fairness, the source codes (or
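For readers who want to reproduce this kind of evaluation, the sketch below adds synthetic AWGN to a clean sequence and computes PSNR with the standard definition, 10·log10(peak² / MSE). The function names, the fixed random seed, and the 8-bit peak value of 255 are assumptions made for this example; the paper's own evaluation code is not shown here.

```python
import numpy as np


def add_awgn(frames, sigma, seed=0):
    """Return frames corrupted by zero-mean Gaussian noise with std sigma."""
    rng = np.random.default_rng(seed)
    return frames + rng.normal(0.0, sigma, size=frames.shape)


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)


if __name__ == "__main__":
    clean = np.full((4, 64, 64), 128.0)  # synthetic flat sequence
    noisy = add_awgn(clean, sigma=20.0)
    print("PSNR of noisy input (dB):", psnr(clean, noisy))
```

Computing PSNR over the whole sequence, as done here, and averaging per-frame PSNR values give slightly different numbers, so comparisons should use one convention consistently.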

Conclusion

In this paper, we have proposed an adaptive spatio-temporal filter for image sequence denoising. The proposed method makes full use of both the spatial and temporal correlation in an image sequence by constructing the volumetric data. The similarity between two adjacent frames of the volumetric data is rather high, even in the presence of high noise. This similarity motivates the technique of training a propagated dictionary for each frame by using the rank-sparsity representation. Restoration

References (34)

  • A. Buades et al., Image denoising methods: a new nonlocal principle, SIAM Rev. (2010)
  • E.J. Candès et al., Robust principal component analysis?, J. ACM (2011)
  • V. Chandrasekaran et al., Rank-sparsity incoherence for matrix decomposition, SIAM J. Optim. (2011)
  • P. Chatterjee et al., Clustering-based denoising with locally learned dictionaries, IEEE Trans. Image Process. (2009)
  • D.Q. Chen et al., Image inpainting based on low-rank and joint-sparse matrix recovery, Electron. Lett. (2013)
  • K. Dabov, A. Foi, K. Egiazarian, Video denoising by sparse 3D transform-domain collaborative filtering, in: Proc. 15th...
  • K. Dabov et al., Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Trans. Image Process. (2007)
    This paper has been recommended for acceptance by C. Luengo.

    ☆☆

    This work was supported by National Basic Research Program of China (973 Program) under Grant No. 2011CB302201, and Specialized Research Fund for the Doctoral Program of Higher Education of China under Grants Nos. 20100181120030 and 20120181130007.
