
Signal Processing

Volume 137, August 2017, Pages 287-297

Stability and robustness of the l2/lq-minimization for block sparse recovery

https://doi.org/10.1016/j.sigpro.2017.02.012

Highlights

  • We give a new sufficient condition to exactly recover block sparse vectors via the l2/lq-minimization.

  • We investigate the instance optimality of the encoder-decoder pairs (A,Δl2/lq).

  • We establish stability and robustness estimates for the l2/lq-minimization.

Abstract

This paper focuses on block sparse recovery via the l2/lq-minimization for 0 < q ≤ 1. We first give the lq stable block Null Space Property (NSP), a new sufficient condition for exact recovery of block sparse signals via the l2/lq-minimization, which is weaker than the block Restricted Isometry Property (RIP). Second, we propose the lp,q (0 < q ≤ p) robust block NSP and generalize the instance optimality and the quotient property to the block sparse case. Furthermore, we show that Gaussian random matrices and random matrices whose columns are drawn uniformly from the sphere satisfy the block quotient property with high probability. Finally, based on the robust block NSP, we obtain a stability estimate for the decoder Δ_{l2/lq}^ϵ applied to y = Ax + e under the a priori bound ‖e‖_2 ≤ ϵ. In addition, for arbitrary measurement error, we obtain a robustness estimate for the decoder Δ_{l2/lq} applied to y = Ax + e that does not require knowledge of the noise level, a practical advantage when estimates of the measurement noise level are absent. The results demonstrate that the l2/lq-minimization performs well for block sparse recovery, and remains both stable and robust when reconstructing noisy signals, provided the measurement matrices satisfy the robust block NSP and the block quotient property.

Introduction

Compressed sensing [4], [6], [14] is a scheme which shows that some signals can be reconstructed from far fewer measurements than the classical Nyquist-Shannon sampling method requires. Given observed data y ∈ R^n, one wishes to recover a signal x ∈ R^N from the linear system

$$y = Ax + e,$$

where A ∈ R^{n×N} (n < N) is a real matrix, called the measurement matrix, and e ∈ R^n is a noise vector. To extract the information x, one applies a decoder Δ to y, which is, typically, a nonlinear map from R^n to R^N. The vector x* = Δ(Ax + e) is viewed as an approximation to x. The central question of compressed sensing is: what are the good encoder-decoder pairs (A, Δ) [2]? It is natural to hope that there exists an encoder-decoder pair (A, Δ) such that the reconstruction error ‖x − Δ(Ax + e)‖_q is as small as possible, where

$$\|x\|_q = \begin{cases} \left(\sum_{i=1}^{N} |x_i|^q\right)^{1/q}, & 0 < q < \infty; \\ \max_{i=1,\dots,N} |x_i|, & q = \infty. \end{cases}$$
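
To make this setup concrete, here is a minimal numerical sketch (ours, not the paper's; the dimensions, sparsity level and noise scale are arbitrary choices) of the measurement model and the lq quasi-norm:

```python
import numpy as np

def lq_norm(x, q):
    """lq (quasi-)norm ||x||_q for 0 < q < inf; the max-norm for q = inf."""
    if np.isinf(q):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** q) ** (1.0 / q)

rng = np.random.default_rng(0)
n, N, s = 40, 100, 5                              # arbitrary sizes with n < N
A = rng.standard_normal((n, N)) / np.sqrt(n)      # Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)  # s-sparse signal
e = 0.01 * rng.standard_normal(n)                 # noise vector
y = A @ x + e                                     # observed data
print(lq_norm(x, 0.5), lq_norm(x, 1.0), lq_norm(x, np.inf))
```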

To measure the performance of an encoder-decoder pair (A, Δ), one uses the Gelfand width [2], [13] to characterize the degree of approximation ‖x − Δ(Ax)‖ in the noise-free case. Then, for a general matrix A ∈ R^{n×N}, does there exist a decoder Δ such that x = Δ(Ax)? In fact, in the classic compressed sensing problem, there exists an essential decoder Δ_0:

$$\Delta_0(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_0 \quad \text{s.t.} \quad y = Ax, \qquad (4)$$

which yields x = Δ_0(Ax) for all s-sparse vectors x ∈ R^N whenever s < n/2 [16]. Here, ‖x‖_0 denotes the number of non-zero entries of the vector x, and an s-sparse vector x ∈ R^N is one satisfying ‖x‖_0 ≤ s ≪ N.

However, the l0-minimization (4) is a nonconvex and NP-hard optimization problem [32]. To overcome this difficulty, the decoder Δ_q, or lq-minimization, was proposed for 0 < q ≤ 1 [5], [6], [15], [17], [23], [27], [39]:

$$\Delta_q(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_q \quad \text{s.t.} \quad y = Ax. \qquad (5)$$

When q = 1, [3] proved that the solutions to (5) are equivalent to those of (4) provided that the measurement matrices satisfy the Restricted Isometry Property (RIP) [5] with a suitable Restricted Isometry Constant (RIC) δ_s ∈ (0, 1); here δ_s is defined as the smallest constant satisfying

$$(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2$$

for all s-sparse vectors x ∈ R^N. For 0 < q < 1, [34] recently proved that the solutions to (5) are equivalent to those of (4) as long as q is smaller than a definite constant q_0 < 1, where q_0 depends on A and y. The lq (0 < q < 1)-minimization is a natural bridge between the l1-minimization and the l0-minimization because, compared to the l1-norm ‖x‖_1, the quantity ‖x‖_q^q for 0 < q < 1 approximates ‖x‖_0 more closely while still inducing sparsity. Besides, it was shown that the lq (0 < q < 1)-minimization often needs less restrictive RIP requirements than the l1-minimization, yet still guarantees perfect recovery for smaller q [8], [29], [36]. Numerical experiments [7] also showed that the lq (0 < q < 1)-minimization recovers sparse signals from fewer linear measurements than the l1-minimization does. Of course, there exist other nonconvex surrogates for the l0-minimization (4), notably the smoothly clipped absolute deviation (SCAD) [21], the minimax concave penalty (MCP) [44] and the capped l1-norm [45]. These penalties can often induce better sparsity and reduce bias, and from the algorithmic point of view they are relatively easy to implement compared to the lq (0 < q < 1)-minimization. But they need theoretical guarantees to determine an appropriate regularization parameter for the corresponding iterative thresholding algorithm [42]. The lq (0 < q < 1)-minimization strategy, by contrast, offers more theoretical advantages in compressed sensing than the nonconvex relaxed models mentioned above. Therefore, this paper mainly focuses on the lq (0 < q < 1)-minimization in what follows.
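
The approximation claim is easy to verify numerically: as q → 0, ‖x‖_q^q = ∑_i |x_i|^q tends to the number of nonzero entries ‖x‖_0. A tiny illustrative check with an arbitrary test vector:

```python
import numpy as np

x = np.array([0.0, 2.0, 0.0, -0.5, 3.0])   # arbitrary test vector, ||x||_0 = 3
for q in (1.0, 0.5, 0.1, 0.01):
    # ||x||_q^q = sum_i |x_i|^q counts the nonzeros ever more sharply as q -> 0
    print(q, np.sum(np.abs(x) ** q))
```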

In addition to recovering sparse vectors from error-free measurements, one requires that the decoder be robust to noise and stable with respect to the compressibility of x ∈ R^N [25]. That is, for y = Ax + e with ‖e‖_2 ≤ ϵ, the decoder reads

$$\Delta_q^{\epsilon}(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_q \quad \text{s.t.} \quad \|y - Ax\|_2 \le \epsilon,$$

which admits a stability estimate of the form (see [3] for q = 1 and [23] for a general statement with 0 < q < 1)

$$\|x - \Delta_q^{\epsilon}(Ax + e)\|_2 \le C\, s^{1/2 - 1/q}\, \sigma_s(x)_q + D\epsilon, \qquad (8)$$

where σ_s(x)_q denotes the best s-term approximation error of x ∈ R^N in the lq (quasi-)norm with q > 0, i.e.,

$$\sigma_s(x)_q := \inf_{\|z\|_0 \le s} \|x - z\|_q.$$

Here and in the rest of the present paper, C and D denote positive absolute constants whose values may change from instance to instance.
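
Although σ_s(x)_q is defined by an infimum, it is attained explicitly by keeping the s largest-magnitude entries of x and zeroing the rest, so it is cheap to compute. A small sketch (our own helper, not from the paper):

```python
import numpy as np

def best_s_term_error(x, s, q):
    """sigma_s(x)_q: lq error left after keeping the s largest-magnitude entries."""
    mags = np.sort(np.abs(x))        # magnitudes in ascending order
    tail = mags[: len(x) - s]        # the N - s smallest entries form the error
    return np.sum(tail ** q) ** (1.0 / q)

x = np.array([5.0, -0.1, 0.02, 3.0, -0.3])
print(best_s_term_error(x, 2, 0.5))  # error from all but the two largest entries
```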

The inequality (8) was obtained based on the RIP. However, using the Null Space Property (NSP) [25], [26] (see [25] for q = 1 and [26] for 0 < q < 1), one obtains results similar to (8). Furthermore, if we set e = 0, then the inequality (8) reads

$$\|x - \Delta_q(Ax)\|_2 \le C\, s^{1/2 - 1/q}\, \sigma_s(x)_q.$$

This motivates the concept of instance optimality. Let A_{n,N} denote the set of all encoder-decoder pairs (A, Δ), that is,

$$\mathcal{A}_{n,N} = \{(A, \Delta) \mid A \in \mathbb{R}^{n \times N},\ \Delta : \mathbb{R}^n \to \mathbb{R}^N\}.$$

Then instance optimality is defined as follows.

Definition 1

Let 0 < qp, for s[N]={1,2,,N} and for all xRN, there exist a pair (A,Δ)An,N and a constant C > 0, if xΔ(Ax)pCs1/q1/pσs(x)q,then the encoder-decoder pair (A, Δ) is called the (p, q) instance optimality of order s.

From Definition 1, we conclude that an s-sparse signal x ∈ R^N can be exactly recovered if there exists a pair (A, Δ) that satisfies the instance optimality. We can also see from (8) that the encoder-decoder pair (A, Δ_q) satisfies the (2, q) instance optimality when the matrix A has the RIP or the NSP for the corresponding q ≤ 1.

The stability result (8) above requires a proper estimate of the level ϵ of the measurement error; however, in many practical scenarios an a priori bound ‖e‖_2 ≤ ϵ is not available, and one would prefer not to estimate the bound on ‖e‖_2 at all. One therefore hopes to improve the result (8); for this purpose, the following quotient property was proposed.

Definition 2

Given a matrix A ∈ R^{n×N}, if there is a constant α > 0 such that

$$\alpha B_2^n \subseteq A(B_q^N),$$

then the matrix A is said to satisfy the lq quotient property with constant α, where B_q^N denotes the unit ball of the lq norm (q ≥ 1) or quasi-norm (0 < q < 1), that is, B_q^N := {x ∈ R^N : ‖x‖_q ≤ 1}.

Using the quotient property, one can obtain the following robustness estimate (see [25], [43] for q = 1 and [35] for 0 < q < 1):

$$\|x - \Delta_q(Ax + e)\|_2 \le C\, s^{1/2 - 1/q}\, \sigma_s(x)_q + D\|e\|_2. \qquad (12)$$

Obviously, (12) indicates that the lq-minimization can perform well for arbitrary measurement error without estimating the upper bound of ‖e‖_2.

However, classic compressed sensing as described above only considers the sparsity of the reconstructed signal and does not take any further structure into account. In many practical applications, the reconstructed signal is not only sparse but also structured: its non-zero entries are aligned in blocks rather than spread arbitrarily throughout the vector. Such signals are called block sparse signals and arise in several areas of signal processing and machine learning, for example, color imaging [31], DNA microarrays [33], equalization of sparse communication channels [11], multi-response linear regression [37], and image annotation [28].

To define block sparsity, it is necessary to introduce some further notation. Suppose that x ∈ R^N is split into m blocks, x[1], x[2], …, x[m], of lengths d_1, d_2, …, d_m, respectively, that is,

$$x = [\underbrace{x_1, \dots, x_{d_1}}_{x[1]},\ \underbrace{x_{d_1+1}, \dots, x_{d_1+d_2}}_{x[2]},\ \dots,\ \underbrace{x_{N-d_m+1}, \dots, x_N}_{x[m]}]^T, \qquad (13)$$

with N = ∑_{i=1}^m d_i. A vector x ∈ R^N is called block s-sparse over I = {d_1, d_2, …, d_m} if x[i] is nonzero for at most s indices i [19]. Obviously, when d_i = 1 for each i, block sparsity reduces to the conventional definition of a sparse vector.

Furthermore, we introduce the following notation:

$$\|x\|_{2,q} = \begin{cases} \left(\sum_{i=1}^{m} \|x[i]\|_2^q\right)^{1/q}, & 0 < q < \infty; \\ \max_{i=1,\dots,m} \|x[i]\|_2, & q = \infty; \\ \sum_{i=1}^{m} I(\|x[i]\|_2), & q = 0, \end{cases}$$

where I(x) is the indicator function that takes the value 1 if x > 0 and 0 otherwise. Thus a block s-sparse vector x can be defined by ‖x‖_{2,0} ≤ s, and ‖x‖_0 = ∑_{i=1}^m ‖x[i]‖_0. As with ‖x‖_q, for 1 ≤ q ≤ ∞, ‖x‖_{2,q} is a norm, while for 0 < q < 1 it is a quasi-norm and satisfies the q-triangle inequality:

$$\|x + y\|_{2,q}^q \le \|x\|_{2,q}^q + \|y\|_{2,q}^q, \quad \forall\, x, y \in \mathbb{R}^N.$$

Obviously, for an m-block signal x ∈ R^N structured as in (13), we have ‖x‖_{2,2} = ‖x‖_2, and for 0 < q ≤ p,

$$\|x\|_{2,p} \le \|x\|_{2,q} \le m^{1/q - 1/p}\, \|x\|_{2,p},$$

and

$$\|x\|_{2,q} \le \|x\|_q,$$

in particular ‖x‖_{2,1} ≤ ‖x‖_1. Let Σ_s denote the set of all block s-sparse vectors: Σ_s = {x ∈ R^N : ‖x‖_{2,0} ≤ s}. Similar to the definition of σ_s(x)_q, we use σ_s(x)_{2,q} to denote the best block s-term approximation error of x ∈ R^N in the l2/lq (quasi-)norm with q > 0, i.e.,

$$\sigma_s(x)_{2,q} := \inf_{z \in \Sigma_s} \|x - z\|_{2,q}.$$

It is clear that σ_s(x)_{2,q} = 0 for all x ∈ Σ_s.
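
For concreteness, the mixed (quasi-)norm ‖x‖_{2,q} and the best block s-term error σ_s(x)_{2,q} can be computed directly; as in the scalar case, the infimum is attained by keeping the s blocks of largest l2 norm and zeroing the rest. A short sketch (our own helpers; the block lengths d_i are passed as a list):

```python
import numpy as np

def split_blocks(x, d):
    """Split x into blocks of lengths d = [d_1, ..., d_m]."""
    return np.split(np.asarray(x), np.cumsum(d)[:-1])

def mixed_norm(x, d, q):
    """||x||_{2,q}: lq (quasi-)norm of the vector of block l2 norms."""
    b = np.array([np.linalg.norm(blk) for blk in split_blocks(x, d)])
    if q == 0:
        return np.count_nonzero(b)            # block sparsity ||x||_{2,0}
    if np.isinf(q):
        return b.max()
    return np.sum(b ** q) ** (1.0 / q)

def best_block_s_term_error(x, d, s, q):
    """sigma_s(x)_{2,q}: zero all but the s blocks of largest l2 norm."""
    b = np.array([np.linalg.norm(blk) for blk in split_blocks(x, d)])
    tail = np.sort(b)[: len(b) - s]           # norms of the m - s smallest blocks
    return np.sum(tail ** q) ** (1.0 / q)

x = np.array([1.0, 2.0, 0.0, 0.0, 0.1, 3.0, 0.0])
d = [2, 2, 2, 1]                              # block lengths, so N = 7, m = 4
print(mixed_norm(x, d, 0), mixed_norm(x, d, 1.0),
      best_block_s_term_error(x, d, 2, 0.5))
```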

To recover a block sparse signal, in analogy with the standard l0-minimization, one seeks the sparsest block sparse vector via the following l2/l0-minimization [18], [19], [20]:

$$\Delta_{\ell_2/\ell_0}(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_{2,0} \quad \text{s.t.} \quad y = Ax.$$

But the l2/l0-minimization problem is also NP-hard. It is natural to replace it with the l2/l1-minimization [18], [19], [20], [38]:

$$\Delta_{\ell_2/\ell_1}(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_{2,1} \quad \text{s.t.} \quad y = Ax.$$

To characterize the performance of this method, Eldar and Mishali [19] proposed the block Restricted Isometry Property (block RIP).
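
The l2/l1 problem is a second-order cone program, so any generic convex solver handles it. The sketch below uses the cvxpy modeling package, which is our implementation choice rather than anything prescribed by the paper, and assumes equal-length blocks for brevity:

```python
import cvxpy as cp
import numpy as np

def l2l1_decoder(A, y, d):
    """Solve min ||x||_{2,1} s.t. y = Ax, for equal block length d."""
    N = A.shape[1]
    x = cp.Variable(N)
    cost = sum(cp.norm(x[i:i + d], 2) for i in range(0, N, d))
    cp.Problem(cp.Minimize(cost), [A @ x == y]).solve()
    return x.value

rng = np.random.default_rng(1)
d, m, n = 4, 25, 40                           # 25 blocks of length 4, N = 100
A = rng.standard_normal((n, m * d)) / np.sqrt(n)
x0 = np.zeros(m * d)
x0[:2 * d] = rng.standard_normal(2 * d)       # block 2-sparse ground truth
x_hat = l2l1_decoder(A, A @ x0, d)
print(np.linalg.norm(x_hat - x0))             # expected to be near zero
```

For these dimensions a Gaussian matrix typically satisfies the recovery conditions discussed below, so the reported error should be at solver precision, though that is a probabilistic expectation rather than a guarantee for any single draw.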

Definition 3 Block RIP

Given a matrix A ∈ R^{n×N}, if there exists a constant 0 < δ_{s|I} < 1 such that

$$(1 - \delta_{s|I})\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_{s|I})\|x\|_2^2$$

for every block s-sparse x ∈ R^N over I = {d_1, d_2, …, d_m}, then the matrix A satisfies the s-order block RIP over I.

Obviously, the block RIP is an extension of the standard RIP, but it is a less stringent requirement than the standard RIP [1], [19]. Eldar et al. [19] proved that the l2/l1-minimization exactly recovers any block s-sparse signal when the measurement matrix A satisfies the block RIP with δ_{2s|I} < 0.414. The block RIP constant can also be improved; for example, Lin and Li [30] relaxed it to δ_{2s|I} < 0.4931 and established another sufficient condition, δ_{s|I} < 0.307, for exact recovery.
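
Certifying δ_{s|I} exactly is combinatorial in the number of block supports, but a Monte Carlo probe over random block s-sparse vectors yields an empirical lower bound on it. A rough sketch under these assumptions (equal block lengths; the trial count is arbitrary):

```python
import numpy as np

def block_rip_probe(A, d, s, trials=2000, seed=0):
    """Monte Carlo lower bound on the block RIP constant delta_{s|I}
    (equal block length d); an empirical probe, not a certificate."""
    rng = np.random.default_rng(seed)
    m = A.shape[1] // d
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(A.shape[1])
        for i in rng.choice(m, s, replace=False):    # s random active blocks
            x[i * d:(i + 1) * d] = rng.standard_normal(d)
        ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
        worst = max(worst, abs(ratio - 1.0))         # deviation from isometry
    return worst

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 100)) / np.sqrt(60)     # unit column norm in expectation
print(block_rip_probe(A, d=4, s=2))
```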

Based on the performance of the lq (0 < q < 1)-minimization [7], [12], [23], it is natural to extend the lq-minimization to the setting of block sparse recovery; this motivates us to consider the l2/lq-minimization, defined by [31], [40], [41]

$$\Delta_{\ell_2/\ell_q}(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_{2,q} \quad \text{s.t.} \quad y = Ax,$$

or

$$\Delta_{\ell_2/\ell_q}^{\epsilon}(y) := \arg\min_{x \in \mathbb{R}^N} \|x\|_{2,q} \quad \text{s.t.} \quad \|y - Ax\|_2 \le \epsilon$$

for the inaccurate measurement y = Ax + e with ‖e‖_2 ≤ ϵ. Note that Δ_{l2/lq}^0 = Δ_{l2/lq}.
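
The paper analyzes the decoder Δ_{l2/lq} abstractly and does not prescribe an algorithm; one common heuristic for such nonconvex programs is iteratively reweighted least squares (IRLS), sketched below for equal-length blocks with an arbitrary smoothing schedule (our own illustrative implementation, not the authors' method):

```python
import numpy as np

def irls_block_lq(A, y, d, q=0.5, iters=50, eps=1.0):
    """IRLS heuristic for min ||x||_{2,q} s.t. y = Ax (equal block length d).
    Each iteration solves a weighted least-squares problem in closed form."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]   # least-norm initialization
    for _ in range(iters):
        blk = x.reshape(-1, d)
        # block weights (||x[i]||_2^2 + eps^2)^(q/2 - 1), one per block
        w = (np.sum(blk ** 2, axis=1) + eps ** 2) ** (q / 2.0 - 1.0)
        Winv = np.repeat(1.0 / w, d)           # diagonal of W^{-1}, per entry
        # minimizer of sum_j W_jj x_j^2 s.t. Ax = y:
        #   x = W^{-1} A^T (A W^{-1} A^T)^{-1} y
        x = Winv * (A.T @ np.linalg.solve((A * Winv) @ A.T, y))
        eps = max(0.7 * eps, 1e-8)             # anneal the smoothing parameter
    return x
```

With the synthetic (A, x0) from the l2/l1 sketch above, irls_block_lq(A, A @ x0, d=4) typically returns a vector close to x0, though, as with any heuristic for a nonconvex program, convergence to the global minimizer is not guaranteed.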

Like the lq (0 < q < 1)-minimization, the l2/lq (0 < q < 1)-minimization has superior properties compared to the l2/l1-minimization. Numerical experiments demonstrated that fewer measurements are needed for exact recovery with the decoder Δ_{l2/lq} when 0 < q < 1 than when q = 1; see [31], [40], [41]. Moreover, [41] studied exact recovery conditions and gave a stability estimate for the l2/lq (0 < q < 1)-minimization based on the block restricted q-isometry property. That result, however, requires a proper estimate of the level ϵ of the measurement error, and such an estimate may not be available a priori in some settings. Thus, a question arises: can the decoder Δ_{l2/lq} perform well for arbitrary measurement error when estimates of the measurement noise level are absent? Besides, another interesting question is whether the block RIP can be weakened while still guaranteeing exact recovery of block sparse vectors via the l2/lq-minimization. The purpose of this paper is to investigate these two questions. To achieve this goal, we propose the lq stable block NSP, which weakens the block RIP. We also propose the lp,q robust block NSP for 0 < q ≤ p, the block quotient property and the block instance optimality, which are crucial for characterizing the stability and robustness of the decoder Δ_{l2/lq} for arbitrary noise e without the need to estimate ‖e‖_2.

The remainder of the paper is organized as follows. In Section 2, based on the lq block NSP, we first give the lq stable block NSP, derive an equivalent form of it, and show that it is weaker than the block RIP. Then we further consider the lq robust block NSP and the lp,q robust block NSP for 0 < q ≤ p, respectively. Using these properties, we effectively characterize reconstruction results for the l2/lq-minimization when the observations are corrupted by noise. In Section 3, we give the block instance optimality, the block quotient property and the simultaneous block quotient property. These properties, combined with the lp,q robust block NSP, yield two important lemmas, which are the core of the proof of the main results. In addition, we also show that Gaussian random matrices satisfy the block quotient property with high probability. In Section 4, we give our two main results, Theorems 7 and 8. Section 5 is a discussion and Section 6 contains some conclusions. Finally, we relegate the proofs of the main results, lemmas and proposition, i.e., Theorems 1–5, Theorems 7 and 8, Lemmas 1 and 2, and Proposition 1, to Appendix A.

Section snippets

lq Block null space property

Suppose that x ∈ R^N is an m-block signal structured as in (13). We take S ⊆ {1, 2, …, m} and write S^C for the complement of S with respect to {1, 2, …, m}, i.e., S^C = {1, 2, …, m} \ S. Let x_S denote the vector that equals x on the block index set S and is zero elsewhere, so that x = x_S + x_{S^C}. In [26], we extended the lq NSP for traditional sparse signals to the block sparse case.

Definition 4

(lq block NSP) Given a matrix A ∈ R^{n×N}, for any set S ⊆ {1, 2, …, m} with card(S) ≤ s and for all v ∈ Ker A \ {0}, if

$$\|v_S\|_{2,q} < \|v_{S^C}\|_{2,q},$$

then

Instance optimality and quotient property

In this section, we introduce the block instance optimality and the block quotient property for the l2/lq-minimization. These properties, combined with the lp,q robust block NSP, yield Theorem 8. In addition, we discuss which matrices satisfy the block quotient property.

Definition 8

((p, q) block instance optimality) Let 0 < q ≤ p and s ∈ [N] = {1, 2, …, N}. If there exist a pair (A, Δ) ∈ A_{n,N} and a constant C > 0 such that

$$\|x - \Delta(Ax)\|_{2,p} \le C\, s^{1/p - 1/q}\, \sigma_s(x)_{2,q}$$

for all x ∈ R^N, then the encoder-decoder pair (

Main results

In this section, we give our main results. The lp,q robust block NSP is required for both theorems, while the lq block quotient property is an additional requirement for Theorem 8. We defer the proofs of the results to Appendix A.

Theorem 7

Given 1 ≤ p ≤ 2 and 0 < q ≤ 1, suppose that the matrix A ∈ R^{n×N} satisfies the lp,q robust block NSP of order s with constants 0 < τ < 1 and γ > 0. Then, for all x ∈ R^N and e ∈ R^n with ‖e‖_2 ≤ ϵ, there exist positive constants C_1 and D_1 such that ‖x − Δ_{l2/lq}^ϵ(Ax + e)‖_{2,p} ≤ C_1 s^{1

Discussion

In this section, we first discuss the tightness of the bounds in our results when e=0. Second, we describe the impact of some coefficients in Inequalities (43) and (46) as shown in Figs. 1 and 2. Third, we analyze several parameters relevant to the above desired properties.

A natural question is whether the bounds given in the main results are tight. Here, we only consider the case e = 0. Suppose that x ∈ B_{2,q}^N and s = cs* = cn/(ln(N/n) + 1); according to (46), we have ‖x − Δ_{l2/lq}(Ax)‖_{2,p} ≤ C((ln(N/n) + 1)/n)^{1/q − 1/p}

Conclusions

This paper focuses on block sparse recovery with the l2/lq-minimization. We generalized the null space property, the instance optimality and the quotient property to the block sparse case, and gave the lq stable block NSP together with an equivalent form of it, a new sufficient condition to exactly recover block sparse signals via the l2/lq-minimization that is weaker than the block RIP proposed by Eldar and Mishali. Because the instance optimality is important to evaluate the performance of the

Acknowledgements

The authors would like to thank Prof. Jianjun Wang and Dr. Wendong Wang, Southwest University, China, and Dr. Feng Zhao, University of Lincoln, UK, for a beneficial discussion of the manuscript. Moreover, the authors also thank the Editor and anonymous Reviewers for their insightful comments and valuable suggestions, which led to a significant improvement of this paper. This work was supported by the National Natural Science Foundation of China (NSFC) (11131006), Science Research Project of the

References (45)

  • R. Baraniuk et al., Model-based compressive sensing, IEEE Trans. Inf. Theory (2010)
  • R. Baraniuk et al., A simple proof of the restricted isometry property for random matrices, Constr. Approx. (2008)
  • E. Candès et al., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory (2006)
  • E. Candès et al., Decoding by linear programming, IEEE Trans. Inf. Theory (2005)
  • E. Candès et al., Near-optimal signal recovery from random projections: universal encoding strategies, IEEE Trans. Inf. Theory (2006)
  • R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett. (2007)
  • R. Chartrand et al., Restricted isometry properties and nonconvex compressive sensing, Inverse Probl. (2008)
  • X. Chen, Stability of compressed sensing for dictionaries and almost sure convergence rate for the Kaczmarz algorithm (2012)
  • A. Cohen et al., Compressed sensing and best k-term approximation, J. Am. Math. Soc. (2009)
  • S. Cotter et al., Sparse channel estimation via matching pursuit with application to equalization, IEEE Trans. Commun. (2002)
  • M. Davies et al., Restricted isometry constants where lp sparse recovery can fail for 0 < p ≤ 1, IEEE Trans. Inf. Theory (2009)
  • D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory (2006)