Abstract
Recently, an end-to-end MR image reconstruction technique, called AUTOMAP, was introduced to simplify the complicated reconstruction process of MR image and to improve the quality of reconstructed MR images using deep learning. Despite the benefits of end-to-end architecture and superior quality of reconstructed MR images, AUTOMAP suffers from the large amount of training parameters required by multiple fully connected layers. In this work, we propose a new end-to-end MR image reconstruction technique based on the recurrent neural network (RNN) architecture, which can be more efficiently used for magnetic resonance (MR) image reconstruction than the convolutional neural network (CNN). We modified the RNN architecture of ReNet for image domain data to reconstruct an MR image from k-space data by utilizing recurrent cells. The proposed network reconstructs images from the k-space data with a reduced number of parameters compared with that of fully connected architectures. We present a quantitative evaluation of the proposed method for Cartesian trajectories using nMSE and SSIM. We also present preliminary images reconstructed from k-space data acquired in the radial trajectory.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
In magnetic resonance imaging (MRI), k-space data is acquired using a variety of MR sequences consisting of different radiofrequency and gradient pulses. Then, by transforming the frequency information of the k-space data into spatial information, an MR image can be reconstructed. Generally, 2D or 3D Fourier transform (FT) is used for reconstruction of an MR image because the k-space is typically scanned in a Cartesian trajectory. However, relying entirely on the FT may not be sufficient for certain applications of MRI such as non-Cartesian MRI, accelerated MRI, etc. In such cases, other iterative reconstruction techniques can be utilized to enhance the quality of MR images.
Recently, deep learning has been also applied in MRI to enhance the image quality [1,2,3,4,5,6,7]. Considering that MRI needs to consider multiple sources of errors due to its complex nature, including field inhomogeneity, eddy current effects, phase distortions, regridding, etc., an end-to-end reconstruction using deep learning could provide a unified solution and simplify the reconstruction process of MRI. Image reconstruction by domain-transform manifold learning (AUTOMAP) [1] was a state-of-the-art technique that performed end-to-end reconstruction. As each voxel of an MR image can be regarded as a weighted sum of the entire k-space, AUTOMAP is structured as multiple fully connected layers, requiring a large amount of weights and biases. As a result, the maximum size of input data is limited by the hardware specification that determines the amount of processable weights and biases. To resolve this issue, we adopt the bidirectional recurrent neural network (RNN) as an alternative to the fully connected layers [8, 9]. As a recurrent cell has a memory that stores states, it allows the previous information to be reflected in the output of the next time step without overloading the system.
In this work, the RNN extracts features for image reconstruction while sweeping the k-space both horizontally and vertically. In addition, by utilizing the sequentiality of a recurrent cell, the bidirectional RNN can decode the input k-space data into image data with a reduced number of weights and biases than the fully connected layers. In the following sections, we present a detailed explanation of the proposed recurrent neural network architecture and experiment results.
2 Method
2.1 Network Architectures
The proposed architecture was designed by benchmarking the ReNet architecture, which was proposed as an alternative to the use of convolutional neural network (CNN) (Fig. 1). The ReNet replaces the convolution and pooling layer of the deep CNN with four recurrent neural networks that sweep both horizontally and vertically in forward and backward directions across the images. In ReNet [9], the relationship between input patch \( P=\left\{ p_{i,j} \right\} \), state \( z_{i,j}^{F}, z_{i,j}^{R} \), and feature map \( V=\left\{ v_{i,j} \right\} _{i=1,...,I}^{j=1,...,J} \), where \(v_{i,j}\in \mathbb {R}^{2d} \) and d is the number of recurrent units, are defined as follows.
In this work, we modified the architecture of the original ReNet with the consideration of the characteristics of MR k-space data, thereby improve stability and convergence of network for MRI reconstruction application. In MRI, most of the high frequency values in k-space are close to zero, preventing the network from properly updating weights and biases. Similarly, the marginal areas of MR images tend to have zero-values, which is also inadequate for training. Thus, instead of using single points as an input, single columns (or rows) are used as an input to the recurrent layer. For input format of the network, complex values of the k-space data having a matrix size of \( {w\times h}\ (X\in \mathbb {C}^{w\times h} )\) can be processed as two sets of real-value data, which can be denoted as \( X=\left\{ x_{i,j,k} \right\} \) for \( X\in \mathbb {R}^{w\times h\times 2} \), where w is size of width (\({k_{x}}\)) and h is size of height (\({k_{y}}\)). Then, the output vectors can be defined as follows,
where \( s_{\cdot ,j} \) represents hidden states of the recurrent layer in the \( j_{th} \) time step and \( n_{hidden} \) represents the number of hidden units in the recurrent layer. J and I represent the sizes of the time step axis and its orthogonal axis, respectively, of the input to the recurrent layer. The output size of the recurrent layer should be \( 2 \times n_{hidden} \) because of the bi-directionality.
In the proposed network, the input data initially flows into the recurrent layer which has a time step axis along the horizontal bi-directions. The output of the horizontal recurrent layer \( O_{horizontal} \in \mathbb {R}^{w\times 2 \times n_{hidden} } \) is serialized into the same size as the input and depth dimensions. Then, the serialized output of the recurrent layer with horizontal time step axis flows into the second recurrent layer, which has a time step axis along the vertical bi-directions. The output of the recurrent layer with vertical time step axis \( O_{vertical} \in \mathbb {R}^{h\times 2 \times n_{hidden} } \) is also serialized into the same size as the input and depth dimensions. Finally, the convolutional layer combines the data along the depth dimensions into a single data to generate a reconstructed image. We deploy a mean-squared-error between the regressed image and the ground truth image as a loss function of the proposed network and choose an optimizer RMSProp [10] with a learning rate of 0.002, momentum of 0.0, and decay of 0.9. Additionally, the learning rate annealing with a step decay is also applied. If a difference of loss values between the current epoch and the one before three epochs is smaller than the pre-determined threshold value, the learning rate is reduced to a predefined ratio.
2.2 Training Environment
2.2.1 Hardware.
The proposed architecture was implemented using Tensorflow [11] based Keras [12] and processed by Intel(R) Core(TM) i5-8400 2.80 GHz CPU and NVIDIA Geforce GTX 1080-Ti GPU. Training the weights and biases takes approximately one hour per each epoch for 50,000 samples.
2.2.2 Dataset.
50,000 image samples of ILSVRC2012Â [13] and ten MPRAGE images (3D-array samples) of HCPÂ [14] datasets were used for training, validating, and testing. In addition to MR dataset (i.e., HCP images), we also included images of ILSVRC2012 as pre-training dataset to train the network architecture with input images having more diverse image characteristics. The T1-weighted MPRAGE images in HCP dataset were acquired with following parameters. TRÂ =Â 2,530Â ms, TEÂ =Â 1.15Â ms, TIÂ =Â 1,100Â ms, FAÂ =Â 7.0\(^\circ \) and BWÂ =Â 651Â Hz/Px on a Siemens Skyra 3T MRI Scanner (Siemens Medical Solutions, Erlangen, Germany).
The preprocessing of dataset was conducted as follows. To convert natural images of ILSVRC2012 into grey scale images, we computed the luminance (Y) component from the RGB values of each pixel (Y = 0.2125 R + 0.7154 G + 0.0721 B) [15] using scikit-image, which is a well-known python package for image processing [16]. After the conversion, the intensity values were normalized into a range of 0 to 1. To guarantee consistency, the intensity of HCP brain images was also normalized into a range of 0 to 1. Then, data in the image domain was synthesized into k-space data using Cartesian or radial trajectories. To synthesize radial k-space data, the numbers of radial views (\(n_{v}\)) and readout samples (\(n_{p}\)) were 128 and 128, respectively. To synthesize radial k-space data, we used Berkeley Advanced Reconstruction Toolbox (BART) in MATLAB [17]. All images were resized as \( {128\times 128} \) to synthesize the k-space data \( X \in \mathbb {C}^{w\times h}\) with \( w=128 \) and \( h=128 \) in Cartesian or the k-space data \( X \in \mathbb {C}^{n_{v}\times n_{p}}\) with \( n_{v}=128 \) and \( n_{p}=128 \) in radial.
In addition, low pass filtered (LPF) k-space dataset was generated from the HCP MR images and used as an input of the proposed network, which was not included in the training dataset, to show the characteristics of the network and to compare it with 2D fast FT (FFT). LPF k-space data was generated by masking the high frequency of original k-space using the rectangular mask with a size of \({w/2 \times h/2}\).
2.3 Quantitative Evaluation
For quantitative evaluation, we deploy the normalized mean square error and structural similarity index [18] defined as follows.
where \(x_{i,j}^{GT} \) is the ground truth image and \(x_{i,j}^{PRED} \) is the predicted image. Small constants \(C_{1}\) and \(C_{2} \) are included to avoid instability when the denominator is very close to zero.
3 Results
In Fig. 2, selected images from the entire dataset are presented for comparison, where the left column shows the ground truth images and the middle column shows the predicted results of the corresponding images using the proposed network. The difference map between the ground truth and the predicted image is also presented in the right column with amplified intensity (\({\times 5}\)) for viewing purposes.
As the proposed network is trained to conduct inverse FFT, we performed image reconstruction from the LPF k-space data to indirectly show the characteristics of the proposed network. Figure 3 shows the ground truth images (left column), images reconstructed using the inverse FFT (middle column), and the images reconstructed by the proposed network (right column) from the LPF k-space data. Although LPF data was not included in the training dataset, the predicted images showed blurring, which is similarly shown in the images reconstructed by inverse FFT. Thus, we speculate that the proposed network conducted similar tasks as the inverse FFT.
The mean and standard deviation of the quantitative evaluation values calculated from the entire slices of a single subject are nMSEÂ =Â 0.3399 and SSIMÂ =Â 0.9583. The selected 135th slice image presented in the previous figure shows nMSE of 0.4053 and SSIM of 0.9543, meaning that the selected slice represents below-average quality than the rest of the data used in this experiment. For the 185th slice, nMSE of 0.2238 and SSIM of 0.9758 were measured, representing above-average image quality.
To verify the feasibility of the proposed network for non-Cartesian trajectories, we trained and tested our network with synthesized radial data. As shown in Fig. 4, the images reconstructed by the proposed method are blurred and the edges of WM and GM are ambiguous. However, this result suggest that the proposed method may be used to end-to-end reconstruction of non-Cartesian k-space with improvements. Thus, further investigation on training process and optimization of network should be conducted to increase the quality of reconstructed images and to reduce blurring artifacts.
4 Discussion
In many deep learning techniques for image processing and understanding, CNN is preferred for the weight sharing characteristics of convolution kernel, not to mention the availability of highly optimized CNN structures. Especially in image to image tasks, the convolution kernel with locally bounded receptive field can be efficiently used. However, CNN may not be appropriate for end-to-end MR reconstruction tasks because k-space signals are integrated sum of MR signals from the entire excitation volume in MRI. For this reason, we proposed an end-to-end MR image reconstruction method based on the ReNet [9], which utilized recurrent neural network. In the proposed network, the recurrent cell has a role of collecting and holding the forward and reverse scan information using bidirectional sweeps. By utilizing information from both directional sweeps, it combines the current input and the hidden states to produce the output. As the 2D Fourier transform can be independently performed for each orthogonal axis, we designed our architecture to perform the reconstruction by sequentially using two RNNs. In the original architecture of ReNet, the recurrent cell takes input from the corresponding point or patch. However, point-by-point input data in k-space may disturb the network learning process because the magnitude of k-space data is highly unbalanced and the information is clustered in the central k-space. Thus, we modified the original architecture of ReNet, so that the recurrent cell can take input from the entire vector of the orthogonal axis to time step axis.
With respect to computation loads, the number of parameters for weights and biases in the proposed architecture is much less compared to AUTOMAP. In the custom-built version of AUTOMAPÂ [1] architecture with an input dimension of \(X\in \mathbb {R}^{2 \times n^{2}} \) for \({n=80}\) the total number of parameters was 163,982,336. If the input dimension is increased from 80 to 128 (as in the paper), the number of parameters between the input and the first fully connected layer can be calculated as 536,887,296. Considering that this is only for the first layer of the whole architecture, it is a demanding requirement. On the other hand, the proposed network requires 79,725,073 parameters for \({128 \times 128} \) Cartesian k-space input of \(X\in \mathbb {C}^{128 \times 128} \) and 79,733,777 parameters for \({128 \times 128} \) radial input of \(X\in \mathbb {C}^{128 \times 128} \).
As shown in Figs. 2 and 3, predicted images show the horizontal and vertical stripe pattern artifacts, and dark regions in frontal regions. We speculate that the stripe pattern artifacts were caused by the recurrent cell used in our network architecture, and dark regions were caused by insufficient amount of training data and lack of augmentation. Thus, increasing the amount of data, improvement of augmentation algorithm, and use of more advanced activation functions, such as leaky ReLU [19] may reduce these artifacts.
In conclusion, we proposed a new end-to-end MR image reconstruction technique based on recurrent deep learning architecture. As ReNet can be used with a reduced number of training parameters, it can be used for reconstruction of images with higher resolution. In this study, preliminary results were presented to show the feasibility of ReNet for end-to-end MR image reconstruction tasks. Thus, further investigation should be conducted to improve image quality and to extend the proposed method for under-sampled k-space data and for other non-Cartesian trajectories.
References
Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-transform manifold learning. Nature 555(7697), 487 (2018)
Han, Y., Ye, J.C.: k-space deep learning for accelerated MRI. arXiv preprint arXiv:1805.03779 (2018)
Hammernik, K., et al.: Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6), 3055–3071 (2018)
Kwon, K., Kim, D., Park, H.: A parallel MR imaging method using multilayer perceptron. Med. Phys. 44(12), 6209–6224 (2017)
Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D.: A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Trans. Med. Imaging 37(2), 491–503 (2018)
Yang, G., et al.: DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction. IEEE Trans. Med. Imaging 37(6), 1310–1321 (2018)
Quan, T.M., Nguyen-Duc, T., Jeong, W.K.: Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss. IEEE Trans. Med. Imaging 37(6), 1488–1497 (2018)
Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 545–552 (2009)
Visin, F., Kastner, K., Cho, K., Matteucci, M., Courville, A., Bengio, Y.: ReNet: a recurrent neural network based alternative to convolutional networks. arXiv preprint arXiv:1505.00393 (2015)
Tieleman, T., Hinton, G.: Lecture 6.5-RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. OSDI 16, 265–283 (2016)
Chollet, F., et al.: Keras (2015)
Deng, J., Berg, A., Satheesh, S., Su, H., Khosla, A., Fei-Fei, L.: ImageNet large scale visual recognition competition 2012 (ILSVRC 2012). Google Scholar (2012)
Fan, Q., et al.: MGH-USC human connectome project datasets with ultra-high b-value diffusion MRI. NeuroImage 124, 1108–1114 (2016)
Poynton, C.: Frequently asked questions about color (1997). Accessed 19 June 2004
Van der Walt, S., et al.: scikit-image: image processing in python. PeerJ 2, e453 (2014)
Uecker, M., et al.: Berkeley advanced reconstruction toolbox. In: Proceedings of International Society for Magnetic Resonance in Medicine, vol. 23, p. 2486 (2015)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
Acknowledgment
This work was partly supported by the Brain Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017M3C7A1047228).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Oh, C., Kim, D., Chung, JY., Han, Y., Park, H. (2018). ETER-net: End to End MR Image Reconstruction Using Recurrent Neural Network. In: Knoll, F., Maier, A., Rueckert, D. (eds) Machine Learning for Medical Image Reconstruction. MLMIR 2018. Lecture Notes in Computer Science(), vol 11074. Springer, Cham. https://doi.org/10.1007/978-3-030-00129-2_2
Download citation
DOI: https://doi.org/10.1007/978-3-030-00129-2_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00128-5
Online ISBN: 978-3-030-00129-2
eBook Packages: Computer ScienceComputer Science (R0)