Elsevier

Signal Processing

Volume 139, October 2017, Pages 49-61

A gradient-based approach to optimization of compressed sensing systems

https://doi.org/10.1016/j.sigpro.2017.04.005

Highlights

  • A new framework for incoherent dictionary design is proposed, and a gradient descent-based algorithm is derived to obtain the optimal dictionary.

  • Based on a parametric technique, a gradient descent-based algorithm is derived to design the robust sensing matrix.

  • The derivative expression used in each of the two algorithms is explicitly derived.

  • The validity of the proposed approaches is confirmed with experiments carried out using synthetic data and real images.

Abstract

This paper deals with a gradient-based approach to optimizing compressed sensing systems. An alternative measure is proposed for incoherent sparsifying dictionary design, and an iterative procedure is developed to search for the optimal dictionary, in which the dictionary update is executed using a gradient descent-based algorithm. The optimal sensing matrix problem is investigated in terms of minimizing ||H − G||_F^2, where H is a target Gram matrix with the desired coherence properties. Unlike the traditional approaches, G is taken as the Gram of the normalized equivalent dictionary of the system, ensuring that ||H − G||_F^2 has the designated physical meaning. A gradient descent-based algorithm is derived for solving the optimal sensing matrix problem. The validity of the proposed approaches is confirmed with experiments carried out using synthetic data and real images.

Introduction

Compressive sensing or compressed sensing (CS) has attracted a lot of attention during the last ten years or so [1], [2], [3], [4]. CS-based techniques have found many applications in areas such as image compression, signal detection/classification, and more [4], [5]. By nature, CS is a mathematical framework that deals with accurate recovery of a signal vector y from a lower-dimensional measurement vector z.

Under the CS framework, the measurement z ∈ ℜ^{M×1} consists of linear projections of the signal vector y ∈ ℜ^{N×1} via

z = Φy,        (1)

where M < N is assumed and Φ ∈ ℜ^{M×N} is called a sensing matrix or projection matrix. The original signal is assumed to be of the form

y = Ψs,        (2)

where Ψ ∈ ℜ^{N×L} is called a dictionary and s ∈ ℜ^{L×1} is the coefficient vector of y in Ψ. When N < L, Ψ is said to be overcomplete, which is assumed throughout this paper.
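As a quick numerical illustration of the two linear models above, the following NumPy sketch builds a random CS system; all dimensions and coefficient values are arbitrary choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 8, 16, 32                  # M < N < L, as assumed in the paper

Phi = rng.standard_normal((M, N))    # sensing (projection) matrix
Psi = rng.standard_normal((N, L))    # overcomplete dictionary (N < L)

s = np.zeros(L)                      # kappa-sparse coefficient vector (kappa = 2)
s[[3, 17]] = [1.5, -2.0]

y = Psi @ s                          # signal model y = Psi s
z = Phi @ y                          # measurement z = Phi y = (Phi Psi) s
```

Note that z has only M = 8 entries while y has N = 16: the measurement is a genuine dimensionality reduction.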

A CS system refers to the linear Eqs. (1) and (2) plus an algorithm that yields an estimate of the original signal y from the measurement z. Substituting (2) into (1), z can be rewritten as

z = ΦΨs ≜ As,        (3)

where the matrix A is sometimes referred to as the equivalent dictionary of the CS system. The ultimate goal of a CS system is to recover s (hence y) from the measurement z.

As M < L, solving z = As for s is an underdetermined problem, since it has an infinite number of solutions. To obtain a unique solution, extra constraints have to be imposed on the linear system. One such constraint is related to the concepts of spark and signal sparsity. The spark of a matrix Q ∈ ℜ^{M×L}, denoted spark(Q), is defined as the smallest number of columns in Q that are linearly dependent. The l_p-norm of a vector v ∈ ℜ^{N×1} is defined as

||v||_p ≜ (Σ_{k=1}^{N} |v(k)|^p)^{1/p},  p ≥ 1.

For convenience, ||v||_0 is used to denote the number of non-zero elements in v (though it is not a norm in the strict sense). A vector y given by (2) is said to be κ-sparse in Ψ if ||s||_0 = κ.
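The l_p-norm and the ||·||_0 counting "norm" defined above are straightforward to compute; a minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def lp_norm(v, p):
    """||v||_p = (sum_k |v(k)|^p)^(1/p), defined for p >= 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def l0(v):
    """||v||_0: the number of non-zero entries (not a norm in the strict sense)."""
    return int(np.count_nonzero(v))
```

For example, v = (3, 0, −4, 0) has ||v||_2 = 5, ||v||_1 = 7 and ||v||_0 = 2.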

It was shown in [6] that any κ-sparse signal y0 = Ψs0 can be exactly recovered from its measurement z = Φy0 by solving

s0 = argmin_s ||s||_0  s.t.  z = As        (4)

as long as spark(A) > 2κ. Such a problem is usually addressed using orthogonal matching pursuit (OMP)-related techniques [7], [8]. Furthermore, it can be shown that the solution to the above problem is the same as that of the l1-based minimization

s0 = argmin_s ||s||_1  s.t.  z = As,        (5)

while the latter can be solved efficiently using algorithms such as basis pursuit (BP) [9] and l1/l2-based optimization techniques [10]. Recently, to further enhance the ability to deal with the sparsity issue, a two-level l1 minimization was proposed in [11] for compressed sensing using a non-convex and piecewise linear penalty.
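A minimal OMP sketch in the spirit of [7], [8] may help make the l0 recovery problem concrete. This is a textbook greedy implementation, not the specific variant used in the paper, and it assumes A is column-normalized:

```python
import numpy as np

def omp(A, z, kappa):
    """Greedy OMP: approximately solve min ||s||_0 s.t. z = A s."""
    L = A.shape[1]
    r = z.astype(float)
    support, coef = [], np.zeros(0)
    for _ in range(kappa):
        k = int(np.argmax(np.abs(A.T @ r)))   # atom most correlated with residual
        if k not in support:
            support.append(k)
        # least-squares fit of z on the atoms selected so far
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        r = z - A[:, support] @ coef          # update the residual
    s = np.zeros(L)
    s[support] = coef
    return s

# recover a 3-sparse vector from a random Gaussian equivalent dictionary
rng = np.random.default_rng(1)
M, L, kappa = 30, 60, 3
A = rng.standard_normal((M, L))
A /= np.linalg.norm(A, axis=0)                # column-normalize A
s0 = np.zeros(L)
s0[[4, 11, 30]] = [1.0, -2.0, 0.5]
s_hat = omp(A, A @ s0, kappa)
```

With M = 30 random Gaussian rows and κ = 3, exact recovery of the support is the typical outcome.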

In this paper, OMP-based algorithms are used, and hence designing a CS system means determining Φ and Ψ for a class of signals.

As discussed previously, signal sparsity is an essential prerequisite for CS. This concerns sparse representation of signals, a widely utilized technique for modelling natural signals that has found many applications, including image compression and denoising [12]. The key issue is, for a given class of signals {y_j}_{j=1}^{J}, to find a dictionary Ψ such that y_j is as close to Ψs_j as possible for a coefficient vector s_j constrained by ||s_j||_0 ≤ κ, where κ is a prescribed sparsity level. This is usually referred to as sparsifying dictionary learning, classically formulated as

{Ψ̃, S̃} = argmin_{Ψ,S} ||Y − ΨS||_F^2  s.t.  ||S(:,j)||_0 ≤ κ ∀ j,        (6)

where y_j = Y(:,j), s_j = S(:,j) ∀ j, and ||·||_F denotes the Frobenius norm.

Such a problem is difficult to solve, as it is non-convex in Ψ and S, and ||·||_0 is non-smooth and highly unstable. A popular approach is based on the alternating minimization strategy, leading to a two-stage iterative procedure in which the kth iteration is carried out with

  • Sparse coding - update S with S = S_k, where S_k is the solution of (6) with Ψ = Ψ_{k−1}. This problem can be solved using the OMP-based techniques [7], [8].

  • Dictionary update - update Ψ with Ψ = Ψ_k, where Ψ_k is the solution of (6) with S = S_k.

Algorithms of this class differ from one another mainly in the second stage, i.e., how Ψ is updated. Perhaps the very first such algorithm is the method of optimal directions (MOD) [13], in which the dictionary Ψ is simply taken as the solution of min_Ψ ||Y − ΨS||_F^2, namely

Ψ̃ = Y S^T (S S^T)^{−1},

where ^T denotes the transpose operator, and then multiplied by a diagonal matrix D_sc such that Ψ = Ψ̃ D_sc has l2-normalized columns, in order to avoid the scaling ambiguity in this procedure. The most popular method for solving (6) is the K-singular value decomposition (K-SVD) [14], in which the atoms of the dictionary are updated one by one while the sparse structure of the given S is kept unchanged. Such an algorithm usually yields better performance than the MOD, as the non-zero elements in S are simultaneously updated. The same idea was recently used to improve the MOD in [15]. In [16], an algorithm named sequential generalization of K-means (SGK) was proposed, which uses the same strategy of updating atoms one by one but without considering the structure of S.
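The two-stage procedure can be sketched as follows, with MOD as the dictionary-update stage. For brevity the stage-1 coder below is a crude correlation-thresholding stand-in for OMP, and all function names are ours; this is an illustrative skeleton under those assumptions, not the exact algorithm of any cited work:

```python
import numpy as np

def sparse_code(Psi, Y, kappa):
    # Stage 1 stand-in: keep the kappa atoms most correlated with each
    # training column, then least-squares fit on that support.
    # (The cited works use OMP here; this cruder coder keeps the sketch short.)
    L, J = Psi.shape[1], Y.shape[1]
    S = np.zeros((L, J))
    for j in range(J):
        idx = np.argsort(-np.abs(Psi.T @ Y[:, j]))[:kappa]
        coef, *_ = np.linalg.lstsq(Psi[:, idx], Y[:, j], rcond=None)
        S[idx, j] = coef
    return S

def mod_update(Y, S):
    # Stage 2 (MOD): Psi_tilde = Y S^T (S S^T)^{-1}, then multiply by the
    # diagonal D_sc that gives unit l2-norm columns (scaling ambiguity).
    Psi = Y @ S.T @ np.linalg.pinv(S @ S.T)
    return Psi / np.maximum(np.linalg.norm(Psi, axis=0), 1e-12)

def learn_dictionary(Y, L, kappa, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    Psi = rng.standard_normal((Y.shape[0], L))
    Psi /= np.linalg.norm(Psi, axis=0)
    for _ in range(iters):
        S = sparse_code(Psi, Y, kappa)   # stage 1: sparse coding
        Psi = mod_update(Y, S)           # stage 2: dictionary update
    return Psi, sparse_code(Psi, Y, kappa)
```

The column normalization inside `mod_update` plays the role of the diagonal matrix D_sc described above.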

As seen, the larger spark(A) is, the larger the class of signals for which the CS system can guarantee exact recovery. For a given Ψ, the spark of the equivalent dictionary is determined by the sensing matrix Φ. It would therefore be of great interest to design Φ such that spark(A) is maximized. Unfortunately, spark(A) is not tractable. As shown in [6], any κ-sparse signal s0 (in A) can be exactly recovered from z = As0 via (4) as long as

κ < (1/2) [1 + 1/μ(A)],        (7)

where μ(A) is the mutual coherence of the matrix A ∈ ℜ^{M×L}, defined as the maximum of the coherence factors between the column vectors of A:

μ(A) ≜ max_{1≤i≠j≤L} |A(:,i)^T A(:,j)| / (||A(:,i)||_2 ||A(:,j)||_2).        (8)
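Mutual coherence is easy to compute directly from the definition; a short sketch (function names ours), together with the largest sparsity level guaranteed by the bound above:

```python
import numpy as np

def mutual_coherence(A):
    """mu(A): max absolute normalized inner product between distinct columns."""
    An = A / np.linalg.norm(A, axis=0)   # column-normalize
    G = np.abs(An.T @ An)                # |Gram| of the normalized matrix
    np.fill_diagonal(G, 0.0)             # ignore the unit diagonal
    return float(G.max())

def max_recoverable_sparsity(A):
    """Largest integer kappa satisfying kappa < (1/2)(1 + 1/mu(A))."""
    mu = mutual_coherence(A)
    return int(np.ceil(0.5 * (1.0 + 1.0 / mu)) - 1)
```

For Q = [[1, 1], [0, 1]], μ(Q) = 1/√2 ≈ 0.707, so only κ = 1 is guaranteed by (7).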

It is due to (7) that the prevailing approaches to optimal sensing matrix design are all based on mutual coherence-related properties of the equivalent dictionary A.

The Gram of a matrix Q is defined as the product Q^T Q. Note that the off-diagonal entries of Q^T Q are closely related to the coherence factors of Q. This suggests that the coherence behavior of Q can be studied via its Gram. An averaged mutual coherence was proposed in [17] as

μ_av(A) ≜ (Σ_{(i,j)∈S_av} |Ḡ(i,j)|) / N_av,        (9)

where Ḡ(i,j) is the (i,j)th element of the Gram of Ā = A S_sc, with S_sc a diagonal scaling matrix such that ||Ā(:,k)||_2 = 1 ∀ k, S_av ≜ {(i,j): μ̄ ≤ |Ḡ(i,j)| < 1} with 0 ≤ μ̄ < 1 a prescribed parameter, and N_av the number of elements in the index set S_av. Such a measure is a good performance indicator but is difficult to minimize with respect to Φ.
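Assuming S_av collects the index pairs whose Gram magnitude lies in [μ̄, 1), the averaged mutual coherence of [17] can be evaluated as follows (function name ours):

```python
import numpy as np

def mu_av(A, mu_bar):
    """Averaged mutual coherence: mean of |G_bar(i,j)| over off-diagonal
    pairs whose magnitude lies in [mu_bar, 1)."""
    An = A / np.linalg.norm(A, axis=0)    # A_bar = A S_sc, unit-norm columns
    G = An.T @ An
    L = G.shape[1]
    vals = [abs(G[i, j]) for i in range(L) for j in range(L)
            if i != j and mu_bar <= abs(G[i, j]) < 1.0]
    return sum(vals) / len(vals) if vals else 0.0
```

With μ̄ = 0 this averages all off-diagonal coherence factors below one, which is why it is a smoother indicator than the worst-case μ(A).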

Most of the existing approaches to optimal sensing matrix design can be unified as

{Φ̃, H̃} = argmin_{Φ,H} ||H − A^T A||_F^2  s.t.  H ∈ S_tar,  A = ΦΨ,        (10)

where S_tar is a non-empty subset of symmetric matrices with diagonal elements all equal to one, containing a collection of target Grams which have some desired properties.

The very first target Gram used for optimal sensing matrix design is I_L, the identity matrix of dimension L (see [5], [18], [19], [20]). Another popular target Gram is based on the concept of equiangular tight frames (ETFs). The set of columns of a (column-normalized) matrix Q is said to form an ETF if the |Q(:,i)^T Q(:,j)|, i ≠ j, are all equal, and such a matrix yields the smallest possible mutual coherence [21], [22]. Therefore, it is desirable to make the Gram of the equivalent dictionary as close as possible to that of an ETF. Since it is very difficult to characterize the set of ETF Grams, the latter is practically relaxed to

S_tar^etf ≜ {H ∈ ℜ^{L×L}: H = H^T, H(k,k) = 1 ∀ k, max_{i≠j} |H(i,j)| ≤ ξ},        (11)

where the parameter ξ > 0, called the prescribed coherence level, is a positive constant not less than the Welch bound μ̲, controlling the search space for the optimal H. When ξ is larger than the Welch bound μ̲ [21]:

μ̲ ≜ sqrt[(L − M) / (M(L − 1))] ≤ μ(Q) ≤ 1,        (12)

which holds for any Q of dimension M × L, the ideal ETF Grams of dimension L (with all possible ranks) are confined to S_tar^etf.
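The Welch bound is a one-liner, and a simple entrywise clipping gives one common way to move a Gram toward S_tar^etf; the clipping rule is a standard device in this literature, not necessarily the exact target-Gram update of the cited works:

```python
import numpy as np

def welch_bound(M, L):
    """Lower bound sqrt((L-M)/(M(L-1))) on mu(Q) for any M x L matrix."""
    return float(np.sqrt((L - M) / (M * (L - 1))))

def clip_gram(G, xi):
    """Push a Gram toward S_tar^etf: clip off-diagonal magnitudes to xi,
    symmetrize, and reset the diagonal to one."""
    H = np.clip(G, -xi, xi)
    H = 0.5 * (H + H.T)
    np.fill_diagonal(H, 1.0)
    return H
```

Any ξ chosen in [μ̲, 1) keeps the clipped Gram inside the relaxed set (11) while still bounding its worst-case coherence.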

When S_tar takes S_tar^etf, the corresponding optimal sensing matrix problem (10) is usually addressed using alternating minimization-based techniques [20], [23], [24], [25], [26], [27], among which one approach differs from another only in how the sensing matrix Φ is updated for a given H. As the problem is highly non-convex, numerical methods are usually applied. A QR factorization-based method was proposed in [23], while an iterative algorithm was given in [27].

Gradient-based numerical methods are a class of efficient algorithms popularly used in engineering design problems where closed-form solutions are difficult to obtain. Such an approach was adopted for optimal sensing matrix design in [18], [24].

It has been observed that the sensing matrix obtained from (10), whether with H = I_L or with H searched within S_tar^etf, performs well only when the sparse representation error E ≜ Y − ΨS is very small.

When Y is projected via the sensing matrix Φ, the corresponding measurements Z = ΦY take the form

Z = AS + ΦE,        (13)

where Ē ≜ ΦE is the sparse representation error of Z in A.

The first attempt at robust CS system design was given in [5], where the dictionary and sensing matrix are alternately updated in an iterative procedure. In that procedure, the dictionary is obtained by minimizing a linear combination of ||E||_F^2 and ||Ē||_F^2 for a given sensing matrix, while the sensing matrix update is done with a different measure. Following the same lines, an alternative method was proposed in [25] with refined algorithms for designing both the dictionary and the sensing matrix. The resultant CS systems from both approaches demonstrate a very good performance against the sparse representation error.

Though Y0 ≜ ΨS is a satisfactory approximation of Y, Ē can be very large if Φ is not properly chosen. In order to reconstruct Y0 with higher accuracy, E should be taken into account in designing the sensing matrix. Therefore, it is desirable to choose Φ as suggested by (10) and, at the same time, to reduce Ē as much as possible. To deal with this multi-objective problem, Li et al. proposed an approach to robust sensing matrix design in [26] based on the measure

ϱ_e ≜ ||H − A^T A||_F^2 + α ||ΦE||_F^2,        (14)

where H belongs to a subset S_tar defined before, A = ΦΨ is the equivalent dictionary, and α ≥ 0 is a parameter controlling the coupling of the measurement sparse representation error Ē into the cost function.
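The measure ϱ_e is straightforward to evaluate for candidate designs; a minimal sketch (function name ours):

```python
import numpy as np

def rho_e(Phi, Psi, H, E, alpha):
    """rho_e = ||H - A^T A||_F^2 + alpha * ||Phi E||_F^2, with A = Phi Psi."""
    A = Phi @ Psi
    return (np.linalg.norm(H - A.T @ A, 'fro') ** 2
            + alpha * np.linalg.norm(Phi @ E, 'fro') ** 2)
```

Setting α = 0 recovers the pure Gram-mismatch objective of (10), while larger α penalizes amplification of the sparse representation error by Φ.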

The optimal robust sensing matrix problem under this framework was formulated in [26] as

{Φ̃, H̃} = argmin_{Φ,H} ϱ_e  s.t.  H ∈ S_tar,        (15)

where Ψ and E are given. Both S_tar = {I_L} and S_tar = S_tar^etf were considered. Experiments showed that the obtained sensing matrix is very robust against the sparse representation error, comparable to [5] and [25].

The main problems and corresponding contributions in this paper are as follows.

  • Traditionally, sparsifying dictionary design amounts to solving (6). In such an approach, the main concern is to minimize the sparse representation error, and the atoms of the obtained dictionary may be very coherent. It is very difficult to make the equivalent dictionary A have a small mutual coherence μ(A) if the dictionary Ψ is very coherent.1 Therefore, it is important to design the sparsifying dictionary Ψ such that the sparse representation error is minimized while the coherence is kept low. This is the first problem investigated in this paper. A new approach to incoherent sparsifying dictionary design is proposed for that purpose, and a gradient descent-based algorithm is derived to obtain the corresponding optimal sparsifying dictionary.

  • Denote Q̄ ≜ Q D_Q, where D_Q is a diagonal matrix such that ||Q̄(:,k)||_2 = 1 ∀ k. Q̄ is said to be the normalized version of the matrix Q. It follows from (8) that Q has the same coherence properties as its normalized version Q̄. Note that the term ||H − A^T A||_F^2 in the classical mutual coherence-based approaches, specified in (10) and (14), is intended to measure the coherence difference between the target Gram and that of the equivalent dictionary A = ΦΨ. As H is assumed to have its diagonal elements all equal to one, this term has the assigned physical meaning only when A is normalized. This means that the optimal sensing matrix design problems (10) and (14) should be solved subject to ||A(:,k)||_2 = 1 ∀ k. The second problem considered in this paper is to investigate the optimal robust sensing matrix design problem (10) but with A replaced by its normalized version Ā. Based on a parametric technique, a gradient descent-based algorithm is derived to solve such a problem.
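Column normalization Q̄ = Q D_Q is a one-line operation, and the Gram of the normalized matrix is exactly the matrix of coherence factors of Q; a sketch (function name ours):

```python
import numpy as np

def normalize_columns(Q, eps=1e-12):
    """Q_bar = Q D_Q, with D_Q diagonal, so that ||Q_bar(:,k)||_2 = 1 for all k."""
    return Q / np.maximum(np.linalg.norm(Q, axis=0), eps)
```

Since normalization leaves each column's direction unchanged, the off-diagonal entries of the Gram of Q̄ coincide with the coherence factors in (8), which is exactly why ||H − Ā^T Ā||_F^2 carries the intended physical meaning.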

The outline of this paper is as follows. A novel framework for designing an incoherent dictionary is proposed in Section 2, where an alternating minimization-based algorithm is derived to solve for the optimal dictionary Ψ_d. Section 3 is devoted to designing a robust sensing matrix, with a problem of the same form as (10) but with A replaced by its normalized version Ā. Such a problem is considered both for H searched within S_tar^etf and for H fixed to the Gram of the dictionary Ψ_d. Experiments are carried out in Section 4 to demonstrate the advantages of the proposed approaches over the existing ones. Some concluding remarks are given in Section 5.

Section snippets

A novel framework for designing incoherent sparsifying dictionary

As argued in the previous section, in order to enhance the performance of a CS system characterized by (Φ, Ψ), the equivalent dictionary A = ΦΨ should be as incoherent as possible. Since it is very difficult to make the equivalent dictionary have a good coherence behavior by adjusting Φ alone if Ψ is very coherent, due to the dimensionality relationship M < N < L, it is desirable to design the dictionary Ψ with its coherence behavior taken into account. This is the motivation and objective of this section.

Revisit of the mutual coherence-based optimal sensing matrix

In this section, we consider the problem of designing sensing matrix with a sparsifying dictionary Ψ given, say Ψ=Ψd.

Experimental results

In this section, we evaluate the performance of the proposed algorithms and compare them with some existing works.

Concluding remarks

In this paper, we investigated the problem of designing robust CS systems, in terms of optimizing the sparsifying dictionary and the sensing matrix. The two problems were addressed using an alternating minimization-based approach, where both the dictionary and sensing matrix are updated via a gradient descent-based algorithm. The expressions for the corresponding derivatives were derived. The validity of the proposed approaches was clearly demonstrated with experiments.

In the proposed CS

Acknowledgement

This work was supported by the National Natural Science Foundation of China (grant numbers 61571174, 61503339), and Zhejiang Provincial Natural Science Foundation of China (grant number LY15F010010).

References (28)

  • D.L. Donoho et al.

    Optimally sparse representation in general (nonorthonormal) dictionaries via l1 minimization

    Proc. Natl. Acad. Sci.

    (2003)
  • J.A. Tropp

    Greed is good: algorithmic results for sparse approximation

    IEEE Trans. Inf. Theory

    (2004)
  • J.A. Tropp et al.

    Signal recovery from random measurements via orthogonal matching pursuit

    IEEE Trans. Inf. Theory

    (2007)
  • E.J. Candes et al.

    An introduction to compressive sampling

    IEEE Signal Process. Mag.

    (2008)