
1 Introduction

Image reconstruction problems arising in medical imaging, such as fast MRI and low-dose CT, are mathematically ill-posed inverse problems. We often consider a linear imaging system with a forward operator A, for example the partial 2D Fourier transform for MRI and the X-ray transform for CT. In the perfect noise-free case the measurement y is given as \(y=Ax\), where x is the underlying image. The linear operator A is ill-posed for most applications; therefore statistical priors are necessary to make these problems invertible. Sparsity priors such as total variation (TV) [1] and wavelet tight frames [2] have been among the most popular regularizers and have been studied extensively in the literature. In practice, the measurements are often corrupted by noise, i.e.

$$\begin{aligned} y=Ax+\epsilon \end{aligned}$$
(1)

where \(\epsilon \) is assumed to be i.i.d. additive Gaussian noise. Derived from a maximum-likelihood estimate of the physical process and a sparsity prior distribution on the original image, it is common to solve the following unconstrained model

$$\begin{aligned} \min \limits _{x}\frac{1}{2} \Vert A{x}-y \Vert _2^2+ \mu \Vert \mathcal {D} {x} \Vert _1, \end{aligned}$$
(2)

where \(\mathcal {D}\) is a sparsity transform, for example the spatial gradient operator \(\nabla \) mentioned above (total variation) or a tight frame transform \(\mathcal {W}\).

However, sparsity regularization also introduces side effects caused by the predefined sparsity transform; for example, TV introduces staircase artifacts. Deep networks have been successfully applied to many image restoration tasks such as image denoising, inpainting, and super resolution [3,4,5]. These works show that delicately designed deep networks achieve state-of-the-art performance on such image processing problems. Despite their superior performance, however, it is still challenging to adapt these networks to medical image reconstruction problems, as they are specifically designed for particular forward operators. Most of the emerging deep learning based medical image reconstruction methods are built on sparsity optimization algorithms such as primal-dual methods and the alternating direction method of multipliers (ADMM). For example, ADMM-net [6] and the learned variational network in [7] mimic the optimization algorithms for solving the sparse regularization model (2) and build a network to learn the sparsity transform \(\mathcal {D}\). In [6, 8, 9], analytic solutions are obtained for the inversion layers and a proximal operator is learned for the denoising/anti-artifact layers. In [10, 11], the authors carefully designed an MRI reconstruction network to enhance data consistency. These networks achieve state-of-the-art reconstruction results, but they are usually more complicated than common neural networks, especially with respect to derivative computation.

Because of the intractability of inverting an ill-posed operator with partial and corrupted measurements, we do not intend to learn an end-to-end inversion mapping from the measurements to the reconstructed image as in previous work. Inspired by regularization based image reconstruction methods, we propose to split the task of inverting a known forward operator from learning an image representation network. In order to feed inputs into the networks implicitly, we establish a data consistency constrained network loss function and then apply ADMM to split the tasks of solving the inversion and learning a network. The problem is solved through simple iterations of existing techniques: conventional inversion and standard image representation/denoising network training. We note that our method is different from ADMM-net, which treats the output of the sparsity optimization algorithm ADMM as the network output and learns the sparsity transform \(\mathcal {D}\) as network parameters. Our method does not intend to design a new network structure but integrates existing ones into the ADMM algorithm to solve the proposed model. The prior of the to-be-reconstructed images is captured by the learned network, which can then be easily used in the inference process. Finally, data consistency is maintained through the iterations of the reconstruction, which is usually not the case for most learning based reconstruction methods.

2 Our Approach

The Stacked Auto-Encoder (SAE) deep network has been shown to be a useful image representation method for denoising [12], where a greedy layerwise approach is proposed for pretraining. The Stacked Convolutional Auto-Encoder (SCAE) [13] was further proposed to preserve frequently repeated local image features, and further improvement has been achieved by the Stacked Non-Local Auto-Encoder (SNLAE) [14] through a nonlocal collaborative stabilization. In recent years, more and more networks have emerged for image restoration problems. For example, it has been demonstrated that the generative adversarial network (GAN) model is powerful for medical or natural image restoration problems such as super-resolution [4] and deblurring [15].

In the following, we propose our image reconstruction learning model based on a denoising network or a GAN model. We denote the input dataset for a network by \(\varvec{x}=\{\varvec{x}_k\}_{k=1}^{m}\) with the corresponding ground truth \(\tilde{\varvec{x}}=\{\tilde{\varvec{x}}_k\}_{k=1}^{m}\), where m is the number of samples. For image reconstruction inverse problems, we denote the corresponding measurements by \(\varvec{y}=\{\varvec{y}_k\}_{k=1}^{m}\) with \(\varvec{y}_k=A\varvec{x}_k\), where A is a known forward operator. We use boldface to denote the vectors of all input and output images in the training procedure, and regular characters for their counterparts in the inference.

The learning procedure of a denoising network is designed to minimize a cost function \(L_{H} (\varvec{x}, {\theta })\), for example the quadratic function

$$\begin{aligned} \min _{{\theta }} L_{H} (\varvec{x}, {\theta }):= \Vert f(\varvec{x},{\theta })-\tilde{\varvec{x}}\Vert _{2}^2 \end{aligned}$$
(3)

where \(f(\varvec{x},{\theta })\) is the network output and \({\theta }\) is the set of network parameters. For the GAN model, the following min-max problem is considered

$$\begin{aligned} \max \limits _{{\theta _d}}\min \limits _{{\theta _g}} L_{G} (\varvec{x}, {\theta _g}, {\theta _d}) = \frac{1}{m}\sum _{i=1}^{m}[\log (D(\tilde{\varvec{x}}_i, {\theta _d}))]+\frac{1}{m}\sum _{i=1}^{m}[\log (1-D(G(\varvec{x}_i, {\theta _g}), {\theta _d}))] \end{aligned}$$
(4)

where \(\theta _g\) and \(\theta _d\) are the parameter sets for the generative and discriminative networks respectively, and \(G(\cdot ,\theta _g)\) and \(D(\cdot ,\theta _d)\) are the outputs of the two networks.
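
For concreteness, the following is a minimal sketch of how the two training objectives (3) and (4) could be written with PyTorch; `net`, `G`, and `D` are assumed to be user-defined modules, and the small constant `eps` is an illustrative numerical safeguard, not part of the original formulation.

```python
import torch

def denoising_loss(net, x, x_true):
    """Quadratic cost (3): ||f(x, theta) - x_tilde||_2^2."""
    return torch.sum((net(x) - x_true) ** 2)

def gan_objective(G, D, x, x_true, eps=1e-8):
    """Min-max objective (4), averaged over the m training samples.
    D is assumed to output probabilities in (0, 1)."""
    real_term = torch.log(D(x_true) + eps).mean()
    fake_term = torch.log(1.0 - D(G(x)) + eps).mean()
    return real_term + fake_term  # maximized over theta_d, minimized over theta_g
```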

Let

$$\begin{aligned} J(\varvec{x})=\eta (\sum _{i=1}^{m}\Vert A\varvec{x}_i-\varvec{y}_i \Vert _2^2+ \mu \Vert \mathcal {D} \varvec{x}_i \Vert _1) \end{aligned}$$
(5)

be the conventional data consistency term with sparse regularization. The formulation can easily be generalized to other data fidelity terms derived from maximum a posteriori estimation and to other regularization terms in \(J(\varvec{x})\). The regularization parameter \(\mu \) can be very small or even zero.
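
As a concrete illustration, the following is a minimal sketch of evaluating \(J(\varvec{x})\) in (5) for partial-Fourier MRI, assuming A is a 0/1 k-space undersampling mask composed with the 2D FFT and \(\mathcal {D}\) is the discrete image gradient (anisotropic TV); all function and variable names are illustrative.

```python
import numpy as np

def J(x_batch, y_batch, mask, eta=1.0, mu=1e-3):
    """Sketch of the functional (5): A is modeled as a k-space mask applied to the
    2D FFT and D as the discrete image gradient (TV). `x_batch` and `y_batch` are
    lists of 2D arrays; parameter values are illustrative."""
    val = 0.0
    for x, y in zip(x_batch, y_batch):
        residual = mask * np.fft.fft2(x) - y          # A x_i - y_i
        dx = np.diff(x, axis=0, append=x[-1:, :])     # vertical differences
        dy = np.diff(x, axis=1, append=x[:, -1:])     # horizontal differences
        val += np.sum(np.abs(residual) ** 2) + mu * np.sum(np.abs(dx) + np.abs(dy))
    return eta * val
```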

Motivated by the fact that many powerful networks are available for removing noise and artifacts, we propose to integrate a deep learning network into the image reconstruction. Our basic idea is to feed a variable \(\varvec{x}\) that implicitly meets the data consistency into the to-be-learned networks, by solving the following problems with a deep learning regularization

$$\begin{aligned} \min _{\varvec{x},{\theta }}L_{H} (\varvec{x}, {\theta })+ J(\varvec{x}), \end{aligned}$$
(6)

and

$$\begin{aligned} \max \limits _{{\theta _d}}\min \limits _{{\theta _g},\varvec{x}} L_{G} (\varvec{x}, {\theta _g},{\theta _d})+ J(\varvec{x}) \end{aligned}$$
(7)

where \(L_H\) and \(L_G\) are the cost functions of the denoising network and the GAN model, respectively.

The above two models can be solved by adapting the ADMM algorithm [16]. Taking (6) as an example, we reformulate it as

$$\begin{aligned} \begin{array}{l} \min \limits _{\varvec{x},{\theta },\varvec{z}}L_{H} (\varvec{x}, {\theta })+ J(\varvec{z}) \\ s.t. \quad ~ \varvec{x}=\varvec{z}. \end{array} \end{aligned}$$
(8)

The augmented Lagrangian for the problem (8) is given as

$$\begin{aligned} L_\rho (\varvec{x},{\theta },\varvec{z},\varvec{p})=L_{H} (\varvec{x}, {\theta })+ J(\varvec{z})+\varvec{p}^T(\varvec{x}-\varvec{z})+\frac{\rho }{2}\Vert \varvec{x}-\varvec{z}\Vert _2^2 \end{aligned}$$
(9)

for a parameter \(\rho >0\).

The idea of the ADMM algorithm for solving the optimization problem (8) is to alternatingly update the primal variables \(\varvec{x},{\theta },\varvec{z}\) by minimizing the augmented Lagrangian function (9) and update the dual variable \(\varvec{p}\) with a dual ascent step, which leads to the following scheme

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{z}^{k+1}= \arg \min \limits _{\varvec{z}} J(\varvec{z})+\frac{\rho }{2}\Vert \varvec{x}^{k}-\varvec{z}+\varvec{b}^k\Vert _2^2\\ {\theta }^{k+1}= \arg \min \limits _{{\theta }} L_{H} (\varvec{x}^k, {\theta })\\ \varvec{x}^{k+1}= \arg \min \limits _{\varvec{x}} L_{H} (\varvec{x}, {\theta }^{k+1})+\frac{\rho }{2}\Vert \varvec{x}-\varvec{z}^{k+1}+\varvec{b}^k\Vert _2^2\\ \varvec{b}^{k+1}=\varvec{b}^{k}+(\varvec{x}^{k+1}-\varvec{z}^{k+1}) \end{array} \right. \end{aligned}$$
(10)

for \(\varvec{p}^k={\rho }{\varvec{b}^k}\). The variables \(\varvec{x}^{0}\) and \(\varvec{z}^{0}\) are initialized by

$$\begin{aligned} \varvec{x}^{0}=\varvec{z}^{0}= \arg \min \limits _{\varvec{z}} J(\varvec{z}). \end{aligned}$$
(11)

The first subproblem in (10) is a conventional reconstruction problem that can be solved with a classical reconstruction method, for example ADMM again if a sparse regularization term is present in J. The second subproblem in (10) is a typical loss minimization for a deep learning network with input \(\varvec{x}^k\), to which a stochastic gradient descent method built into standard neural network toolboxes can be applied. For the third subproblem in (10), we can also use a stochastic gradient descent method.
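
The following is a minimal sketch of the alternating training scheme (10), assuming a helper `solve_J_prox(v)` that returns \(\arg \min _{\varvec{z}} J(\varvec{z})+\frac{\rho }{2}\Vert \varvec{v}-\varvec{z}\Vert _2^2\) via a classical solver, and a PyTorch module `net` for \(f(\cdot ,\theta )\) trained with the quadratic loss (3); the optimizer choice, learning rate, and iteration counts are illustrative.

```python
import torch

def train_reconstruction(net, x0, x_true, solve_J_prox, rho=1.0,
                         outer_iters=5, net_epochs=50, x_steps=20, lr=1e-4):
    """Sketch of scheme (10): alternate z-step, theta-step, x-step, dual ascent."""
    x = x0.clone()
    z = x0.clone()
    b = torch.zeros_like(x0)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for k in range(outer_iters):
        # z-step: classical, data-consistent reconstruction
        z = solve_J_prox(x + b).detach()
        # theta-step: train the network on the current iterate x^k
        for _ in range(net_epochs):
            opt.zero_grad()
            loss = torch.sum((net(x) - x_true) ** 2)      # L_H(x^k, theta)
            loss.backward()
            opt.step()
        # x-step: gradient descent on L_H(x, theta^{k+1}) + (rho/2)||x - z^{k+1} + b^k||^2
        x = x.detach().requires_grad_(True)
        for _ in range(x_steps):
            obj = (torch.sum((net(x) - x_true) ** 2)
                   + 0.5 * rho * torch.sum((x - z + b) ** 2))
            grad_x, = torch.autograd.grad(obj, x)
            x = (x - lr * grad_x).detach().requires_grad_(True)
        x = x.detach()
        # dual ascent step
        b = b + (x - z)
    return net, x
```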

A similar alternating scheme to (10) can be obtained for solving the GAN training model (7) by replacing the second step in (10) with

$$\begin{aligned} ({\theta _g^{k+1}}, {\theta _d^{k+1}})= \arg \max \limits _{{\theta _d}}\min \limits _{{\theta _g}} L_{G} (\varvec{x}^k, {\theta _g},{\theta _d}). \end{aligned}$$
(12)

Here we need to alternately apply gradient descent to update \({\theta _g^{k+1}}\) and gradient ascent to update \({\theta _d^{k+1}}\), as general GAN methods do.
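
The sketch below shows one way the alternating update (12) could be realized in PyTorch, with discriminator ascent and generator descent on the objective (4) for the fixed input \(\varvec{x}^k\); the optimizers, step count, and `eps` safeguard are illustrative choices.

```python
import torch

def gan_update(G, D, x_k, x_true, steps=100, lr=1e-4, eps=1e-8):
    """Sketch of update (12): alternating ascent on theta_d, descent on theta_g."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    for _ in range(steps):
        # discriminator ascent: maximize (4) by minimizing its negation
        opt_d.zero_grad()
        d_loss = -(torch.log(D(x_true) + eps).mean()
                   + torch.log(1.0 - D(G(x_k).detach()) + eps).mean())
        d_loss.backward()
        opt_d.step()
        # generator descent: minimize log(1 - D(G(x^k)))
        opt_g.zero_grad()
        g_loss = torch.log(1.0 - D(G(x_k)) + eps).mean()
        g_loss.backward()
        opt_g.step()
    return G, D
```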

After we obtain the network parameter set \({\theta }^*\), the learned network is ready to be used to map a given input image \(\hat{x}\) to an estimate x of the ground truth image via \(x=f(\hat{x},{\theta }^*)\). More precisely, given a measurement y, we obtain the reconstructed image x through the following scheme

$$\begin{aligned} \left\{ \begin{array}{l} {z}^{k+1}= \arg \min \limits _{{z}} J({z})+\frac{\rho }{2}\Vert {x}^{k}-{z}+{b}^k\Vert _2^2\\ \hat{x}^{k+1}= f ({x}^k, {\theta }^*)\\ {x}^{k+1}= \frac{1}{1+\rho }(\hat{x}^{k+1}+\rho ({z}^{k+1}-{b}^k))\\ {b}^{k+1}={b}^{k}+(\hat{x}^{k+1}-{z}^{k+1}). \end{array} \right. \end{aligned}$$
(13)
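
The third line of (13) can be read as the closed-form minimizer of a quadratic surrogate in which, at inference time, the training loss is replaced by a proximity term to the network output \(\hat{x}^{k+1}\) (this reading is our interpretation of the scheme):

$$\begin{aligned} {x}^{k+1}= \arg \min \limits _{{x}} \frac{1}{2}\Vert {x}-\hat{x}^{k+1}\Vert _2^2+\frac{\rho }{2}\Vert {x}-{z}^{k+1}+{b}^k\Vert _2^2 . \end{aligned}$$

Setting the gradient to zero gives \(({x}-\hat{x}^{k+1})+\rho ({x}-{z}^{k+1}+{b}^k)=0\), i.e. \({x}^{k+1}=\frac{1}{1+\rho }(\hat{x}^{k+1}+\rho ({z}^{k+1}-{b}^k))\).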

The initialization of \({x}^{0}\) and \({z}^{0}\) is performed similarly to (11). For the GAN based reconstruction model, we can use a scheme similar to (13), replacing \(f({x}^{k}, \theta ^*)\) with \(G({x}^{k}, \theta ^*_g)\) in the second step, to obtain the reconstructed image x from a measurement.
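
A minimal sketch of the inference scheme (13), assuming `x0` is the classical reconstruction from (11), `net` is a callable applying the trained network \(f(\cdot ,\theta ^*)\) to an image array, and `solve_J_prox(v)` returns \(\arg \min _{z} J(z)+\frac{\rho }{2}\Vert v-z\Vert _2^2\) for the given measurement; all names are illustrative.

```python
import numpy as np

def reconstruct(x0, net, solve_J_prox, rho=1.0, iters=5):
    """Sketch of the inference scheme (13)."""
    x = x0.copy()
    z = x0.copy()
    b = np.zeros_like(x0)
    for _ in range(iters):
        z = solve_J_prox(x + b)                       # data-consistent z-update
        x_hat = net(x)                                # network prediction f(x^k, theta*)
        x = (x_hat + rho * (z - b)) / (1.0 + rho)     # closed-form x-update
        b = b + (x_hat - z)                           # dual ascent
    return x
```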

3 Experiments

In this section, we perform experiments on MRI reconstruction from downsampled measurements. The MRI data are generated by the partial Fourier transform with Gaussian noise corruption, i.e. \(y=K \mathcal {F}(x+l*(\xi _1+\xi _2*i))\), where l is the noise level, \(\xi _1, \xi _2\) follow i.i.d. normal distributions, x is the ground truth image, and K is the downsampling operator. In our experiments, the MRI image dataset is from ADNI (Alzheimer's Disease Neuroimaging Initiative), of which 300 slices of size \(192\times 160\) are used for training and 21 slices are used for inference, and three different downsampling patterns with three downsampling rates are used to simulate the measurements. To speed up the training process and alleviate the ill-conditioning when the sampling rate is severely low, we use a TV term in the reconstruction functional J(x) (5), but its weight \(\mu \) decreases over the outer loops of our method. As \(\mu \) becomes smaller and smaller, the contribution of TV eventually becomes much smaller than what is used for sparsity regularized reconstruction. In order to demonstrate the flexibility of our approach, we implement three kinds of networks for MRI reconstruction, i.e. SCAE [13], SNLAE [14] and GAN [5]. The basic setup and training/test time on a PC with an Intel i7 CPU and an Nvidia GTX1060 GPU for the three networks SCAE, SNLAE and GAN are listed in Table 1. To demonstrate the convergence of the proposed method, the intermediate training results and inference results by SCAE on one slice are shown in Fig. 1, in which we can see that the images tend towards better quality and higher PSNR.
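
For reference, the following is a minimal sketch of the simulated measurement model \(y=K \mathcal {F}(x+l(\xi _1+i\xi _2))\); the mask, noise level, and random draws follow the description above, while implementation details such as assuming a standard normal distribution and the FFT normalization are our assumptions.

```python
import numpy as np

def simulate_measurement(x, mask, noise_level):
    """Sketch of y = K F(x + l*(xi_1 + i*xi_2)): `mask` is the 0/1 k-space
    downsampling pattern K and `noise_level` is l."""
    xi1 = np.random.randn(*x.shape)
    xi2 = np.random.randn(*x.shape)
    return mask * np.fft.fft2(x + noise_level * (xi1 + 1j * xi2))
```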

Table 1. Network setup and computation time
Fig. 1. Intermediate results \([f( x^{0}, {\theta }^{0}), f( x^{1}, {\theta }^{1}),\cdots , f( x^{5}, {\theta }^{5})]\) for MRI reconstruction with 25% radial downsampling by SCAE. (a) training step; (b) inference step.

Fig. 2. MRI reconstruction results. Sampling pattern and rate: 1D random with 25%. First row: noise free; second row: 10% noise.

In Fig. 2, we compare our methods with the zero-filling method (ZF) [17], TV regularization based reconstruction [18], and ADMM-net [6] for the 1D random downsampling pattern with 25% downsampling rate (results for the other sampling patterns and rates are provided in the Supplementary file). For the three downsampling patterns, we observe that in the noise-free case the images reconstructed by our proposed methods with SCAE, SNLAE and GAN, and by ADMM-net, have better spatial resolution. In the case of 10% noise and measurements with a very low sampling rate, our method with the three networks still achieves good performance. In particular, for the 1D random downsampling pattern, our methods alleviate both noise and artifacts. Visually, our methods generally produce cleaner images than ADMM-net for noisy data. To assess reconstruction image quality quantitatively, we further report PSNR and SSIM in Table 2. All of the best PSNR values are achieved by methods with learned regularization.

Table 2. MRI reconstruction quality (PSNR/SSIM)

4 Conclusion

We developed a variational image reconstruction method which integrates an image representation network with a classical image reconstruction method. The proposed model offers the flexibility of choosing both the classical reconstruction method and a powerful deep representation network. The application to MRI image reconstruction shows the effectiveness of the proposed method, and it is clear that the method can easily be extended to other applications.