
1 Introduction

In magnetic resonance imaging (MRI), k-space data are acquired using a variety of MR sequences consisting of different radiofrequency and gradient pulses. An MR image is then reconstructed by transforming the frequency information of the k-space data into spatial information. Generally, a 2D or 3D Fourier transform (FT) is used to reconstruct the image because k-space is typically scanned along a Cartesian trajectory. However, relying entirely on the FT may not be sufficient for certain MRI applications such as non-Cartesian or accelerated MRI. In such cases, iterative reconstruction techniques can be utilized to enhance the quality of MR images.

Recently, deep learning has also been applied in MRI to enhance image quality [1,2,3,4,5,6,7]. Because MRI must account for multiple sources of error arising from its complex nature, including field inhomogeneity, eddy current effects, phase distortions, and regridding, an end-to-end reconstruction using deep learning could provide a unified solution and simplify the MRI reconstruction process. Image reconstruction by domain-transform manifold learning (AUTOMAP) [1] is a state-of-the-art technique that performs end-to-end reconstruction. As each voxel of an MR image can be regarded as a weighted sum of the entire k-space, AUTOMAP is structured as multiple fully connected layers, requiring a large number of weights and biases. As a result, the maximum size of the input data is limited by the hardware's capacity to store and process these weights and biases. To resolve this issue, we adopt the bidirectional recurrent neural network (RNN) as an alternative to the fully connected layers [8, 9]. Because a recurrent cell has a memory that stores states, it allows previous information to be reflected in the output of the next time step without overloading the system.

In this work, the RNN extracts features for image reconstruction while sweeping the k-space both horizontally and vertically. In addition, by exploiting the sequential nature of a recurrent cell, the bidirectional RNN can decode the input k-space data into image data with fewer weights and biases than the fully connected layers. In the following sections, we present a detailed explanation of the proposed recurrent neural network architecture and experimental results.

Fig. 1. The proposed network architecture. The input is sensor domain data with a dimension of \(w \times h \times 2\) (2 for real and imaginary k-space). Dotted lines represent forward direction RNNs and solid lines represent reverse direction RNNs. Two RNN layers for horizontal and vertical sweeps are sequentially used to learn the relation of k-space data in the left-right, right-left, top-down, and bottom-up directions.

2 Method

2.1 Network Architectures

The proposed architecture was designed by benchmarking the ReNet architecture, which was proposed as an alternative to convolutional neural networks (CNNs) (Fig. 1). ReNet replaces the convolution and pooling layers of a deep CNN with four recurrent neural networks that sweep horizontally and vertically, in forward and backward directions, across the image. In ReNet [9], the relationship between the input patch \( P=\left\{ p_{i,j} \right\} \), the states \( z_{i,j}^{F}, z_{i,j}^{R} \), and the feature map \( V=\left\{ v_{i,j} \right\} _{i=1,\ldots ,I}^{j=1,\ldots ,J} \), where \(v_{i,j}\in \mathbb {R}^{2d} \) and d is the number of recurrent units, is defined as follows.

$$\begin{aligned} v_{i,j}^{F}&=f_{VFWD}\left( z_{i,j-1}^{F},p_{i,j} \right) , \quad \text {for } j=1,\ldots ,J \\ v_{i,j}^{R}&=f_{VREV}\left( z_{i,j+1}^{R},p_{i,j} \right) , \quad \text {for } j=J,\ldots ,1 \end{aligned}$$
(1)

In this work, we modified the original ReNet architecture in consideration of the characteristics of MR k-space data, thereby improving the stability and convergence of the network for MRI reconstruction. In MRI, most of the high-frequency values in k-space are close to zero, preventing the network from properly updating its weights and biases. Similarly, the marginal areas of MR images tend to have zero values, which is also inadequate for training. Thus, instead of single points, single columns (or rows) are used as the input to the recurrent layer. As the input format of the network, the complex-valued k-space data with a matrix size of \( {w\times h}\ (X\in \mathbb {C}^{w\times h} )\) is processed as two sets of real-valued data, denoted as \( X=\left\{ x_{i,j,k} \right\} \) for \( X\in \mathbb {R}^{w\times h\times 2} \), where w is the width (\({k_{x}}\)) and h is the height (\({k_{y}}\)). The output vectors can then be defined as follows,

$$\begin{aligned} o_{\cdot ,j}^{F}&=f_{VFWD}\left( s_{1,j-1}^{F},\ldots ,s_{n_{hidden},j-1}^{F},x_{1,j,\cdot },\ldots ,x_{I,j,\cdot } \right) , \quad \text {for } j=1,\ldots ,J\\ o_{\cdot ,j}^{R}&=f_{VREV}\left( s_{1,j+1}^{R},\ldots ,s_{n_{hidden},j+1}^{R},x_{1,j,\cdot },\ldots ,x_{I,j,\cdot } \right) , \quad \text {for } j=J,\ldots ,1 \end{aligned}$$
(2)

where \( s_{\cdot ,j} \) represents the hidden states of the recurrent layer at the \( j^{th} \) time step and \( n_{hidden} \) represents the number of hidden units in the recurrent layer. J and I represent the sizes of the time-step axis and its orthogonal axis, respectively, of the input to the recurrent layer. The output size of the recurrent layer is \( 2 \times n_{hidden} \) because of the bidirectionality.
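As a small illustration of this input format, the following sketch splits a complex k-space matrix \( X\in \mathbb {C}^{w\times h} \) into the two real-valued channels the network consumes (the random matrix is only a stand-in for acquired data):

```python
import numpy as np

# Example complex k-space; in practice this comes from an acquisition.
X = np.fft.fft2(np.random.rand(128, 128))

# Split into real and imaginary parts: network input of shape (w, h, 2).
X_in = np.stack([X.real, X.imag], axis=-1)
```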

In the proposed network, the input data first flows into a recurrent layer whose time-step axis runs bidirectionally along the horizontal axis. The output of this horizontal recurrent layer, \( O_{horizontal} \in \mathbb {R}^{w\times 2 \times n_{hidden} } \), is serialized to match the input size along the spatial and depth dimensions. The serialized output then flows into a second recurrent layer whose time-step axis runs bidirectionally along the vertical axis. Its output, \( O_{vertical} \in \mathbb {R}^{h\times 2 \times n_{hidden} } \), is likewise serialized to match the input size along the spatial and depth dimensions. Finally, a convolutional layer combines the data along the depth dimension into a single channel to generate the reconstructed image. We use the mean squared error between the regressed image and the ground truth image as the loss function and optimize with RMSProp [10] using a learning rate of 0.002, momentum of 0.0, and decay of 0.9. Additionally, learning rate annealing with step decay is applied: if the difference between the loss of the current epoch and that of three epochs earlier is smaller than a pre-determined threshold, the learning rate is reduced by a predefined ratio.
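The following is a minimal Keras sketch of this two-sweep design. The GRU cell, the value of \( n_{hidden} \), and the use of ReduceLROnPlateau as a stand-in for the step-decay annealing are our assumptions, not specifics from the paper:

```python
from tensorflow.keras import layers, models, optimizers, callbacks

w = h = 128                  # k-space matrix size (Cartesian case)
n_hidden = 256               # recurrent units per direction (assumed; 2*n_hidden must be divisible by h)
depth = 2 * n_hidden // h    # channel depth after reserializing the RNN output

inputs = layers.Input(shape=(w, h, 2))   # real/imaginary channels of k-space

# Horizontal sweep: each of the w time steps sees one full column (h*2 values).
x = layers.Reshape((w, h * 2))(inputs)
x = layers.Bidirectional(layers.GRU(n_hidden, return_sequences=True))(x)  # (w, 2*n_hidden)
x = layers.Reshape((w, h, depth))(x)     # serialize back to an image-like layout

# Vertical sweep: permute so that the h axis becomes the time-step axis.
x = layers.Permute((2, 1, 3))(x)
x = layers.Reshape((h, w * depth))(x)
x = layers.Bidirectional(layers.GRU(n_hidden, return_sequences=True))(x)  # (h, 2*n_hidden)
x = layers.Reshape((h, w, depth))(x)
x = layers.Permute((2, 1, 3))(x)

# Final convolution combines the depth dimension into a single-channel image.
outputs = layers.Conv2D(1, kernel_size=1)(x)

model = models.Model(inputs, outputs)
model.compile(
    optimizer=optimizers.RMSprop(learning_rate=0.002, rho=0.9, momentum=0.0),
    loss='mse')

# Step-decay annealing: reduce the learning rate when the loss has not improved
# by more than a threshold over three epochs; factor and min_delta are assumptions.
anneal = callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5,
                                     patience=3, min_delta=1e-4)
```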

2.2 Training Environment

2.2.1 Hardware.

The proposed architecture was implemented in Keras [12] with a TensorFlow [11] backend and run on an Intel Core i5-8400 2.80 GHz CPU and an NVIDIA GeForce GTX 1080 Ti GPU. Training the weights and biases takes approximately one hour per epoch for 50,000 samples.

2.2.2 Dataset.

50,000 image samples from ILSVRC2012 [13] and ten MPRAGE images (3D-array samples) from the HCP [14] dataset were used for training, validation, and testing. In addition to the MR dataset (i.e., the HCP images), we included the ILSVRC2012 images as a pre-training dataset to expose the network to inputs with more diverse image characteristics. The T1-weighted MPRAGE images in the HCP dataset were acquired with the following parameters: TR = 2,530 ms, TE = 1.15 ms, TI = 1,100 ms, FA = 7.0\(^\circ \), and BW = 651 Hz/Px on a Siemens Skyra 3T MRI scanner (Siemens Medical Solutions, Erlangen, Germany).

The dataset was preprocessed as follows. To convert the natural images of ILSVRC2012 into grayscale, we computed the luminance (Y) component from the RGB values of each pixel (Y = 0.2125 R + 0.7154 G + 0.0721 B) [15] using scikit-image, a well-known Python package for image processing [16]. After the conversion, the intensity values were normalized to the range 0 to 1. For consistency, the intensity of the HCP brain images was also normalized to the range 0 to 1. The image-domain data was then synthesized into k-space data using Cartesian or radial trajectories. For the radial k-space data, the numbers of radial views (\(n_{v}\)) and readout samples (\(n_{p}\)) were both 128, and synthesis was performed with the Berkeley Advanced Reconstruction Toolbox (BART) in MATLAB [17]. All images were resized to \( {128\times 128} \) to synthesize the k-space data \( X \in \mathbb {C}^{w\times h}\) with \( w=128 \) and \( h=128 \) for Cartesian sampling, or \( X \in \mathbb {C}^{n_{v}\times n_{p}}\) with \( n_{v}=128 \) and \( n_{p}=128 \) for radial sampling.
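A sketch of this pipeline for the Cartesian case is shown below, assuming numpy and scikit-image (the radial synthesis was done with BART in MATLAB and is not reproduced here; the file path and function name are placeholders):

```python
import numpy as np
from skimage import color, io, transform

def image_to_kspace(path, size=128):
    img = io.imread(path)
    if img.ndim == 3:                 # RGB -> luminance: Y = 0.2125R + 0.7154G + 0.0721B [15]
        img = color.rgb2gray(img)
    img = img.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)   # normalize to [0, 1]
    img = transform.resize(img, (size, size), anti_aliasing=True)
    # Synthesize centered Cartesian k-space with a 2D FFT.
    kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(img)))
    # Split into real/imaginary channels: network input of shape (w, h, 2).
    return np.stack([kspace.real, kspace.imag], axis=-1)
```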

In addition, a low-pass-filtered (LPF) k-space dataset was generated from the HCP MR images and used as an input to the proposed network. This dataset was not included in the training data; it serves to expose the characteristics of the network and to allow comparison with the 2D fast FT (FFT). The LPF k-space data was generated by masking out the high frequencies of the original k-space with a central rectangular mask of size \({w/2 \times h/2}\).
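A minimal sketch of this masking step, assuming a centered k-space array:

```python
import numpy as np

def low_pass_filter(kspace):
    """Keep only the central w/2 x h/2 block of k-space and zero the rest.

    kspace: complex array of shape (w, h) with the DC component centered.
    """
    w, h = kspace.shape
    mask = np.zeros_like(kspace)
    mask[w // 4 : 3 * w // 4, h // 4 : 3 * h // 4] = 1
    return kspace * mask
```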

2.3 Quantitative Evaluation

For quantitative evaluation, we use the normalized mean square error (nMSE) and the structural similarity index (SSIM) [18], defined as follows.

$$\begin{aligned} nMSE = \frac{ \sum _{i=1}^{I} \sum _{j=1}^{J} \left[ x_{i,j}^{GT} - x_{i,j}^{PRED} \right] ^{2} }{\sum _{i=1}^{I} \sum _{j=1}^{J} \left[ x_{i,j}^{GT} \right] ^{2}}, \end{aligned}$$
(3)
$$\begin{aligned} SSIM=\frac{ \left( 2\mu _{x}\mu _{y}+C_{1} \right) \left( 2\sigma _{xy}+C_{2} \right) }{ \left( \mu _{x}^{2}+\mu _{y}^{2}+C_{1} \right) \left( \sigma _{x}^{2}+\sigma _{y}^{2}+C_{2} \right) }, \end{aligned}$$
(4)

where \(x_{i,j}^{GT} \) is the ground truth image, \(x_{i,j}^{PRED} \) is the predicted image, and \(\mu \), \(\sigma ^{2}\), and \(\sigma _{xy}\) denote the means, variances, and covariance of the two images being compared. The small constants \(C_{1}\) and \(C_{2} \) are included to avoid instability when the denominator is very close to zero.
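A hedged sketch of the evaluation, with Eq. (3) implemented directly and scikit-image's SSIM used in place of a hand-rolled Eq. (4):

```python
import numpy as np
from skimage.metrics import structural_similarity

def nmse(gt, pred):
    """Normalized mean square error, Eq. (3)."""
    return np.sum((gt - pred) ** 2) / np.sum(gt ** 2)

def evaluate(gt, pred):
    """Return (nMSE, SSIM) for a ground-truth/prediction pair."""
    return nmse(gt, pred), structural_similarity(
        gt, pred, data_range=gt.max() - gt.min())
```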

3 Results

In Fig. 2, selected images from the dataset are presented for comparison: the left column shows the ground truth images and the middle column shows the corresponding predictions of the proposed network. The difference map between the ground truth and the predicted image is presented in the right column with intensity amplified (\({\times 5}\)) for viewing purposes.

Fig. 2. Comparison of ground truth, prediction, and difference images. Upper and lower rows show the 135th and 185th slices of selected HCP MR data, respectively. Left: ground truth images. Middle: prediction images. Right: difference between ground truth and predicted image, with magnitude amplified by a factor of five.

Fig. 3. Reconstruction of the LPF dataset. Upper and lower rows show the 135th and 185th slices, respectively. Left: ground truth images. Middle: inverse FFT images of the LPF dataset. Right: prediction results for the LPF dataset.

As the proposed network is trained to perform the inverse FFT, we reconstructed images from the LPF k-space data to indirectly show the characteristics of the network. Figure 3 shows the ground truth images (left column), the images reconstructed by inverse FFT (middle column), and the images reconstructed by the proposed network (right column) from the LPF k-space data. Although LPF data was not included in the training dataset, the predicted images show blurring similar to that in the inverse FFT reconstructions. We therefore speculate that the proposed network performs a task similar to the inverse FFT.

The mean quantitative evaluation values calculated over all slices of a single subject were nMSE = 0.3399 and SSIM = 0.9583. The 135th slice presented in the previous figure shows an nMSE of 0.4053 and an SSIM of 0.9543, meaning that this slice is of below-average quality relative to the rest of the data used in this experiment. For the 185th slice, an nMSE of 0.2238 and an SSIM of 0.9758 were measured, representing above-average image quality.

To verify the feasibility of the proposed network for non-Cartesian trajectories, we trained and tested it with synthesized radial data. As shown in Fig. 4, the images reconstructed by the proposed method are blurred, and the edges between white matter (WM) and gray matter (GM) are ambiguous. Nevertheless, this result suggests that, with further improvements, the proposed method may be applicable to end-to-end reconstruction of non-Cartesian k-space data. Further investigation of the training process and network optimization should therefore be conducted to increase the quality of the reconstructed images and to reduce blurring artifacts.

4 Discussion

In many deep learning techniques for image processing and understanding, CNNs are preferred for the weight-sharing characteristics of the convolution kernel, not to mention the availability of highly optimized CNN implementations. Especially in image-to-image tasks, the convolution kernel with its locally bounded receptive field can be used efficiently. However, a CNN may not be appropriate for end-to-end MR reconstruction because each k-space sample is an integrated sum of MR signals from the entire excitation volume. For this reason, we proposed an end-to-end MR image reconstruction method based on ReNet [9], which utilizes recurrent neural networks. In the proposed network, the recurrent cell collects and holds the forward and reverse scan information using bidirectional sweeps; combining the information from both directional sweeps, it merges the current input with the hidden states to produce the output. As the 2D Fourier transform can be performed independently along each orthogonal axis, we designed our architecture to perform the reconstruction by sequentially applying two RNNs. In the original ReNet architecture, the recurrent cell takes its input from the corresponding point or patch. However, point-by-point input in k-space may disturb the learning process because the magnitude of k-space data is highly unbalanced and the information is clustered in the central k-space. We therefore modified the original ReNet architecture so that the recurrent cell takes as input the entire vector along the axis orthogonal to the time-step axis.

With respect to computational load, the number of weight and bias parameters in the proposed architecture is much smaller than in AUTOMAP. In the custom-built version of the AUTOMAP [1] architecture with an input dimension of \(X\in \mathbb {R}^{2 \times n^{2}} \) for \({n=80}\), the total number of parameters was 163,982,336. If the input dimension is increased from 80 to 128 (as in the paper), the number of parameters between the input and the first fully connected layer alone is 536,887,296. Considering that this is only the first layer of the whole architecture, it is a demanding requirement. In contrast, the proposed network requires 79,725,073 parameters for a \({128 \times 128} \) Cartesian k-space input \(X\in \mathbb {C}^{128 \times 128} \) and 79,733,777 parameters for a \({128 \times 128} \) radial input \(X\in \mathbb {C}^{128 \times 128} \).
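This first-layer count can be verified directly, under the assumption (implied by the quoted figure) that the first fully connected layer has \(128^{2}\) output units fed by the \(2 \times 128^{2}\) real-valued k-space inputs:

$$\begin{aligned} (2 \times 128^{2}) \times 128^{2} + 128^{2} = 536{,}870{,}912 + 16{,}384 = 536{,}887{,}296. \end{aligned}$$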

As shown in Figs. 2 and 3, the predicted images show horizontal and vertical stripe artifacts as well as dark areas in the frontal regions. We speculate that the stripe artifacts were caused by the recurrent cell used in our architecture, and that the dark areas were caused by an insufficient amount of training data and a lack of augmentation. Increasing the amount of data, improving the augmentation algorithm, and using more advanced activation functions, such as leaky ReLU [19], may therefore reduce these artifacts.

Fig. 4. Reconstructed images from radial sampling. Upper and lower rows show the 135th and 185th slices, respectively. Left: ground truth images. Middle: prediction results for the radial dataset. Right: difference between ground truth and predicted image, with magnitude amplified by a factor of five.

In conclusion, we proposed a new end-to-end MR image reconstruction technique based on a recurrent deep learning architecture. Because the ReNet-based network requires fewer training parameters, it can be applied to the reconstruction of higher-resolution images. In this study, preliminary results were presented to show the feasibility of ReNet for end-to-end MR image reconstruction. Further investigation should be conducted to improve image quality and to extend the proposed method to under-sampled k-space data and other non-Cartesian trajectories.