1 Introduction

Short axis cine (SAX) is the primary cardiac magnetic resonance imaging (cMRI) protocol for assessing cardiac function, but relies on electrocardiogram (ECG) gating and breath holds during acquisition. The protocol requires the subject to lie still, restricting it to acquisitions at rest, and to perform breath holds, which not all patients can do due to pathology or age. Free breathing non-gated real-time cine (RT) is an alternative protocol which has recently been proposed as a solution to the above problem [10, 11]. This method acquires images over multiple cardiac cycles at the same locations in the left ventricle (LV) as SAX, both eliminating the need for patient cooperation during breath holds and allowing imaging during movement and distorted ECG signals, eg. during exercise. A less demanding scanning sequence such as RT greatly expands the number of patients who can be scanned, and using RT to assess cardiac function during exercise has its own diagnostic potential [3]. Developing a methodology for improving the quality of RT imaging would therefore provide a new and effective tool for clinicians.

However the RT series suffers from poor image quality, in terms of greater noise and artefacts, compared to SAX sequences, as well as patient motion from respiration and minor body shifts. The images of this series are thus neither aligned in space nor in time as each plane starts acquisition at arbitrary points in the cardiac cycle. This significantly hinders clinical applicability, as through-plane motion of the heart results in significant variability in quantified volumes. In order to approach clinically acceptable standards, these issues need to be compensated for, and to calculate biomarkers with comparable accuracy to conventional cine, a motion corrected ECG gated cine image series should be reconstructed. Specifically images must be retained at end-expiration only to eliminate geometric and position variation, and these must be aligned so that the LV region in each image is correctly aligned in space.

In this paper we propose a workflow for accomplishing this reconstruction using convolutional neural networks (CNN) for automatic segmentation of the LV, manifold learning for estimating respiratory cycles, and an image processing pipeline which uses the information from these two sources to create a single-cycle image series. Our workflow uses automatically generated segmentations alone for determining cardiac cycle position for images. This contrasts with approaches in [1, 13] which estimate the R-R interval using the image frequency domain and then determine which images are end-diastole based on semi-automated segmentation, and other techniques including [5] which use stable ECG signals for reconstruction. Furthermore our method does not require complex k-space reconstruction techniques [4] which are not widely available in clinical practice. Analysis techniques used to assess SAX series can then be applied to this reconstructed series to calculate biomarkers.

Our novel contribution is a workflow which wholly automates the process of reconstructing a single-cycle image series from the input RT series, eliminating both the need for manual segmentation and inter-observer variability. This contrasts with previous work which involved manual input in the reconstruction pipeline [13]. The next sections will describe the steps in the workflow in detail, present results of applying the workflow to volunteer images, and then discuss the further work necessary to refine the technique.

Fig. 1.
figure 1

Outline of the image processing workflow

2 Methodology

The automated workflow shown in Fig. 1 transforms a RT series spanning multiple cardiac cycles, containing images neither aligned in space nor corrected for respiration, into a single-cycle series corrected for spatial alignment and containing only end-expiration images. The input image series is a 4-dimensional image volume storing 3D image volumes capturing cardiac geometry over time (3D+t). The output image is also 4D, but contains only a single cardiac cycle, removing the respiratory variability and correctly aligning the sequence to reconstruct an image similar to the standard SAX sequence. The steps in this workflow are outlined here in brief and expanded in the following subsections:

  1. 1.

    Data Import: Image data is imported from the Dicom files and converted to Nifti format using Eidolon [8]. This is the most convenient and reliable tool for this step although any Nifti file from other sources is acceptable.

  2. 2.

    Estimate Respiration: Manifold learning is used to estimate the position of each image in the respiratory cycle, and images not at end-expiration are rejected.

  3. 3.

    Segment Images: A neural network trained to segment the myocardium of the left ventricle is applied to each image, providing a myocardial mask.

  4. 4.

    Select Segments: Each mask is assessed for correctness and quality to remove malformed segmentations outside the LV region.

  5. 5.

    Assign Bins: All images not rejected in the previous steps are assigned to a position in the cardiac cycle. Firstly, images at end-diastole are identified by measuring the pool volumes of their myocardial masks. Other images are then assigned to bins representing phases of the cardiac cycle using the time differences between the identified end-diastole images at the start and end of that cardiac cycle.

  6. 6.

    Reconstruct Series: A reconstructed series is created by selecting an image from each cycle position for every slice position.

  7. 7.

    Calculate Volumes: The reconstruction is also applied to the myocardial mask associated with each image to produce a single-cycle segment series which can be used to calculate volume properties.

Fig. 2.
figure 2

Topology of the segmentation network.

Fig. 3.
figure 3

Example augmented image/mask pairs from the training dataset.

2.1 Deep Learning Segmentation

Our technique relies on a convolutional neural network trained to segment LV myocardial tissue in real-time images. The network architecture (Fig. 2) is based on U-Net [12, 16] using residual units [7] and is implemented in Tensorflow. Each numbered section of the encoding phase of the network is implemented using the residual unit definition on the left, where the number indicates the output number of filters. Upsampling in the decoding phase is done with a transpose convolution followed by a single set of batchnorm-prelu-conv2D operations.

Residual units use Parametric Rectified Linear Unit (PReLU) [6] for activation and are defined with 4 series of batchnorm-prelu-convolution operations. Downsampling in the encoding phase of the network is done using striding on the first convolution of the residual unit.

The training data for the network consists of RT series images from 10 volunteers segmented manually by a clinician resulting in a dataset of 3153 greyscale image/mask pairs. Figure 3 shows a selection of augmented image/mask pairs taken from the training dataset. During training, data augmentation [9, 15] is applied to the images to ensure the model does not overfit to the dataset. Specifically, a random selection of rotation, transposition, flip, shift, and zoom operations are applied to each image/mask pair of a batch. Segmenting RT imaging is particularly challenging as RT image quality is inferior to conventional gated cine. Our proposed approach and the network architecture have been specifically chosen and optimised for this task.

Figure 4 illustrates a set of segmentations for a real-time image series after malformed segmentations are removed in the selection stage. The selection process is necessary since every image in the RT series is used for inference with the segmentation network. This includes images above and below the LV which are not meant to be segmented and so are not represented in the training set. When the network attempts to segment these the results are not well-formed circular segmentations, and must be identified and filtered out. Secondly, during the respiration cycle the LV may deform or move through the image plane sufficiently to distort the image and produce poor segmentations. The expectation is that the RT image will capture enough cardiac cycles that there will be a well-segmented representative somewhere to reconstruct a single cycle.

Fig. 4.
figure 4

Example automatically generated segmentation

2.2 Manifold Learning

To account for respiratory motion we select the frames at end-expiration, which is the current standard for assessment for cardiac volumes. Here we use Laplacian Eigenmaps (LE) [2] to automatically estimate the respiratory state in each slice in the image stack. LE is a manifold learning technique which embeds the data into a low-dimensional space while preserving local neighbourhood relations by finding the most significant modes of variation along the manifold on which the original data lies. When applied to a sequence of cine cardiac images, we find that this variation consistently corresponds to the respiratory cycle.

Using each image’s coordinates in the low-dimensional embedding we assign a true or false value stating whether each image is within some specified margin of the end-respiration state. MR images, rather than myocardial masks, are used at this step as regions outside the heart also contain information about the respiratory state and help in this categorisation.

2.3 Image Processing

In addition to Eidolon and Tensorflow, our workflow uses Python with the Numpy, Scipy, and Nibabel libraries for array types, scientific functions, and Nifti loading respectively. Our workflow is implemented as a Jupyter notebook which guides the user through the process of applying the operations. This is available as open source on our repository at https://github.com/ericspod/RealtimeReconWorkflow.

Select Segments. In this stage each 2D image is analysed individually to filter out bad segmentations. Due to image quality and subject motion not all images present usable cardiac geometry and thus cannot be correctly segmented. A segmentation is rejected if it does not have exactly one cavity (i.e. the LV chamber), if it contains fewer than 100 pixels, or if the surface area of a convex hull enclosing the segmentation is at least 15% larger than the area of the segmentation including pool. This ensures that only a segmentation representing a relatively smooth annulus is accepted. The value of 100 pixels for minimal size has been found in our analysis to be a rational rejection criteria for images of 256\(\,\times \,\)256 pixels, and similarly the 15% size criteria has been found to reject highly irregular segmentations.

Assign Bins. At this stage images at end-expiration have been selected using manifold learning, these are then assigned to a position in the cardiac cycle based on the segmentation areas. Each image with a segmentation is assigned the surface area of their segmentation’s pools. Images with areas larger than its non-zero neighbours are assumed to be as ED, and so an identifiable cardiac cycle is a list of contiguous images in time, which all have surface areas, from one of these maximal images up to the but not including the next. These individual cycles must be a certain length to be accepted, if so each image in the cycle is assigned a value representing their position in the cycle from 0 to 1.

The cycles thus defined will vary in length and so the values assigned to each image do not all fall easily into obvious bins. A histogram is computed for the percentages to determine what the bins should be such that no bin is empty, which then allows a bin number to be assigned to each image. Each bin represents a frame of a single cardiac cycle, where bin 0 is the ED frame. Bins contain many images, however with the filtering done for segmentation quality and respiration they should all represent the same geometry at the same moment in the cardiac cycle.

Reconstruct Series. With bins assigned to each image (excepting those rejected for poor segmentation or respiratory motion), reconstructing a short-axis like image becomes a process of selecting an image from each bin for each image plane. The number of bins has been chosen to ensure that each bin is represented by at least one image from each image plane, however typically there will be many images to choose from at each plane per bin. Our current strategy is to select, for particular plane and bin, the image whose segmentation’s pool has a surface area closest to the mean of all those in its bin. The time offset between frames is taken from the original images and if this value correctly represents the per-image time since ED this will be preserved in the final output image.

Having selected the images for each slice at each timestep, the same process is applied to the segmentations to produce a second segmentation series. The average of the image centroids from this series is calculated, then each image in the segment series and its corresponding image in the reconstructed SAX-like image are shifted in the XY plane to be centred at this position.

This produces an image and a segmentation series which accounts for patient movement during scanning which is not respiratory in origin, although aligning all images does produce non-physiological results since the left ventricle is not entirely conical but flatter on the septal side. For the purposes of various 2D assessment criteria, for example estimating ejection fraction, this does not impact results but does aid in observing wall motion abnormalities.

Calculate Volumes. Pool volumes are calculated by summing up the number of pixels in the pools of the segmentations. Volumes are then summed in the depth dimension then multiplied by the voxel volume of the original image to give a total per timestep volume.

3 Results

We scanned seven participants using the real-time protocol twice with a rest interval between scans at St. Thomas’ Hospital, London, UK, using a 1.5T Philips Ingenia MR scanner (Philips, Best, Netherlands). All participants were healthy volunteers with ages ranging from 20 to 32, 4 males and 3 females. Due to the length of the scan sequence a much larger dataset of 100 frames was acquired for each subject. These frames are significantly dissimilar even within one patient due to breathing and exercise motion. Therefore these datasets represent a larger training dataset than the number of participants would seem to imply. We did not consider clinical patient participants for this study since their morphological details will not vary significantly from healthy volunteers as observed in the used RT sequences.

Figure 5 shows an example of reconstructed CINE-like sequence and segmentations at end-diastole for all the slices. We can see that the slices are correctly aligned and segmentations allow for accurate assessment of volumes.

For a further validation, we compared the volumes obtained with the proposed reconstruction pipeline cine with a previously validated, non-gated real-time imaging protocol (non-gated RT) that uses manual selection of respiratory state [10]. Table 1 summarises the ED volume (EDV) and end-systolic volume (ESV) for the two methods.

The results obtained with the proposed automatic pipeline correlated well with the manual results from the non-gated MRI sequence. Visual inspection of this sort is standard clinical practice and thus our results are comparable to assessments considered sufficiently accurate for diagnostic purposes. From a clinical perspective, the differences between results are within expected ranges between inter-observer measurements in the reference standard [10, 13], and so are diagnostically valid.

Fig. 5.
figure 5

Example reconstructed CINE-like image series for all slices at ED.

Table 1. Median and inter-quartile range (iqr) of EDV and ESV for the proposed method and the non-gated MRI method, and differences.

4 Conclusion and Future Work

We have in this paper demonstrated a workflow for reconstructing a single-cycle image series from a real-time series. This is in multiple ways an extension of previous work on reconstruction of respiratory and cardiac gated cine SAX [13], whose method relied on manual input to determine the location of the heart. Furthermore, the previous method for estimating temporal and spatial motions of the images were obtained from changes in image contrast, resulting in sub-optimal alignment and blurring of the images due to inaccurate motion estimation.

Other pipelines have been proposed for cine reconstruction from ungated, free-breathing MR. However, these methods often rely on complex, self-tailored k-space acquisition and reconstruction schemes [4]. Whereas these techniques benefit from utilising raw data instead of reconstructed RT images for reconstruction of gated cine MR, such frameworks have limited clinical applicability, as they are not available in most clinical MR facilities. Our technique is available as standalone Python software or integrated into Eidolon, a free cardiac MR analysis software, and can therefore be implemented using a standard personal computer.

Due to the high acceleration factors needed for RT imaging, image quality is lower compared to conventional cine. Although this did not result in a significant impact on measurements of the most frequently obtained parameters (cardiac volumes and function), specific details and features of boundaries are less distinct and can possibly impact more comprehensive cardiac assessment. Our future research goals include investigating the use of autoencoders [14] as a method of improving image quality.

Our process of assigning images to cardiac phase bins relies the assignment of an ED image using the local maximal pool volume. This is not strictly true and so there is some approximation in how images are assigned to cardiac cycle position. This does not optimally utilise the RT information, as the independency of acquisition and cardiac frequency likely results in different offsets of images from ‘true’ ED between the different cycles acquired per slice. Exploitation of this knowledge will make it possible to create smaller (and thus more) bins and therefore increase temporal resolution of the reconstructed images.

With this work, we have shown a feasible technique for a complete automatic pipeline for the reconstruction of cardiac and respiratory gated cine MR from RT imaging and subsequent quantification of key parameters of cardiac function. When extended to include the full heart, including the basal part of the ventricle and the atria, this technique can be of great benefit in imaging of subjects that are unable to hold their breath or lie still, such as children or severely ill patients. Furthermore, as demonstrated our work allows imaging during diagnostic scans employing physical exercise.