Medical Image Analysis

Volume 84, February 2023, 102686

Co-learning of appearance and shape for precise ejection fraction estimation from echocardiographic sequences

https://doi.org/10.1016/j.media.2022.102686

Highlights

  • A complete pipeline for precise EF estimation from echocardiographic sequences.

  • Accurate full-sequence segmentation using sparsely labeled sequences.

  • Evaluation of segmentation, tracking, parameter estimation, and temporal consistency.

Abstract

Accurate estimation of ejection fraction (EF) from echocardiography is of great importance for the evaluation of cardiac function. It is usually obtained by Simpson’s bi-plane method based on the segmentation of the left ventricle (LV) in two keyframes. However, obtaining accurate EF estimation from echocardiography is challenging due to (1) the noisy appearance of ultrasound images, (2) the temporal dynamic movement of the myocardium, (3) the sparse annotation of the full sequence, and (4) potential quality degradation during scanning.

In this paper, we propose a multi-task semi-supervised framework, denoted as MCLAS, for precise EF estimation from echocardiographic sequences of two cardiac views. Specifically, we first propose a co-learning mechanism to explore the mutual benefits of cardiac segmentation and myocardium tracking iteratively at the appearance level and the shape level, thereby alleviating the noisy appearance and enforcing the temporal consistency of the segmentation results. This temporal consistency, as shown in our work, is critical for precise EF estimation. Then we propose two auxiliary tasks for the encoder: (1) view classification, to help extract the discriminative features of each view and automate the whole pipeline of EF estimation in clinical practice, and (2) EF regression, to help regularize the spatiotemporal embedding of the echocardiographic sequence. Both auxiliary tasks can improve the segmentation-based EF prediction, especially for sequences of poor quality.

Our method is capable of automating the whole pipeline of EF estimation, from view identification and cardiac structure segmentation to EF calculation. Its effectiveness is validated in terms of segmentation, tracking, consistency analysis, and clinical parameter estimation. Compared with existing methods, our method shows clear superiority for LV volumes at the ED and ES phases and for EF estimation, with Pearson correlations of 0.975, 0.983 and 0.946, respectively. This is a significant improvement for echocardiography-based EF estimation and strengthens the potential of automated EF estimation in clinical practice. Moreover, our method obtains accurate and temporally consistent segmentation for the in-between frames, which makes it suitable for evaluating dynamic cardiac function.

Introduction

Left ventricle (LV) ejection fraction (EF) represents the percentage of blood pumped out of the LV with each contraction, and is one of the most commonly measured cardiac metrics for the diagnosis of cardiovascular disease. Imaging techniques can be used to estimate EF by calculating the ratio of volume change during the contracting stage of a cardiac cycle. In clinical practice, B-Mode ultrasound is usually the first imaging technique used to estimate EF due to its advantages of being real-time, radiation-free, and low-cost (Chen et al., 2020). The standard process of manual EF estimation from 2D ultrasound images includes the following steps: (1) view identification of the apical two-chamber (A2C) and four-chamber (A4C) sequences; (2) segmenting the endocardium (Endo) on the end-diastole (ED) and end-systole (ES) frames of both views; (3) calculating the LV volumes at the ED and ES phases (EDV and ESV) following Simpson’s biplane method (Folland et al., 1979), where the LV is assumed to consist of multiple elliptical disks whose two axes are the transverse diameters of Endo on the A2C and A4C views; (4) obtaining EF as (EDV − ESV)/EDV. The pipeline of EF prediction is shown in Fig. 1.
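To make steps (3) and (4) concrete, the following is a minimal Python sketch of the biplane (method-of-disks) computation described above. The function names, the diameter arrays, and the choice of discretization are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def biplane_volume(diam_a2c, diam_a4c, long_axis):
    """Simpson's biplane rule: model the LV as a stack of N elliptical
    disks whose two axes are the transverse Endo diameters measured at
    matching levels in the A2C and A4C views (N = len(diam_a2c), commonly
    20). `long_axis` is the LV long-axis length; using the same length
    unit for all inputs yields a volume in that unit cubed."""
    a = np.asarray(diam_a2c, dtype=float)
    b = np.asarray(diam_a4c, dtype=float)
    disk_height = long_axis / len(a)
    # volume of one elliptical disk: pi/4 * a_i * b_i * h
    return float(np.sum(np.pi / 4.0 * a * b) * disk_height)

def ejection_fraction(edv, esv):
    """EF = (EDV - ESV) / EDV, reported as a percentage."""
    return (edv - esv) / edv * 100.0

# Hypothetical usage with diameters measured on the ED and ES frames:
# edv = biplane_volume(d_a2c_ed, d_a4c_ed, l_ed)
# esv = biplane_volume(d_a2c_es, d_a4c_es, l_es)
# ef  = ejection_fraction(edv, esv)
```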

Manual EF calculation on cardiac ultrasound images requires contouring of the endocardium and is not only time-consuming but also less reliable (compared to MRI) due to the heavy noise and low contrast of ultrasound images. The reported inter-observer and intra-observer agreements are only 0.801 and 0.896, respectively (Leclerc et al., 2019b). Therefore, exploring fully automatic and effective EF estimation methods is of great importance, not only for obtaining reliable results, but also for improving the efficiency of EF estimation in clinical practice.

Great challenges exist for these tasks, which hinder the development of automatic EF estimation. Firstly, in ultrasound images, the boundary between cardiac structures and the background is fuzzy and of low contrast. The presence of papillary muscle also poses additional difficulty for the identification of the endocardium. Robust spatial feature embedding and prior information are required for accurate segmentation of the cardiac structures. Secondly, due to the movement of the myocardium, experts usually need to visually inspect several consecutive frames to contour the border of the middle frame. Segmenting the keyframes independently may lead to inconsistent segmentations and thus inferior EF estimation. The temporal dynamics of myocardium movement need to be well explored to obtain temporally consistent results, where the trend of LV size across consecutive frames matches the movement of the myocardium (dilating or contracting): during the systolic phase, the LV size should decrease monotonically, and vice versa. Thirdly, since EF mainly depends on the keyframes and annotating full sequences imposes a heavy workload, echocardiographic sequences are often only sparsely labeled. An automatic model needs to learn accurate segmentation from these sparsely labeled sequences. Fourthly, non-standard scanning procedures and improper imaging parameters may degrade the quality of echocardiographic sequences.
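As a concrete instance of the temporal-consistency property just described, the following hypothetical helper (a sketch for illustration, not one of the three consistency measurements proposed later in this paper) checks whether the segmented LV size contracts monotonically over a systolic sequence.

```python
import numpy as np

def is_monotonic_systole(lv_areas, tol=0.0):
    """Return True if the segmented LV area (e.g., Endo pixel count per
    frame, ordered from ED to ES) never increases by more than `tol`
    between consecutive frames, i.e., the sequence contracts
    monotonically as expected during systole."""
    diffs = np.diff(np.asarray(lv_areas, dtype=float))
    return bool(np.all(diffs <= tol))

# A temporally inconsistent segmentation shows up as a violation:
# is_monotonic_systole([4200, 4100, 4350, 3900])  # -> False
```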

Automated EF estimation can be completed by a two-stage procedure: LV segmentation from the apical two-chamber (A2C) and four-chamber (A4C) views, followed by EF estimation with Simpson’s biplane method (Folland et al., 1979). According to whether temporal information is employed, we categorize existing methods into spatial feature modeling and spatial–temporal modeling.

Previous studies on LV segmentation include traditional approaches such as active contour models (Kass et al., 1988, Barbosa et al., 2011), edge detection (Dollár and Zitnick, 2014), level sets (Li et al., 2005, Yang et al., 2017), and atlases (Zhuang et al., 2010, Dong et al., 2020), as well as those based on deep neural networks (Kabani and El-Sakka, 2017, Abdelmaguid et al., 2018, Zhang et al., 2018, Zheng et al., 2018, Jafari et al., 2019, Leclerc et al., 2020, Leclerc et al., 2019b, Zhou et al., 2018, Oktay et al., 2017, Ronneberger et al., 2015, Newell et al., 2016, Leclerc et al., 2019a, Oktay et al., 2018). Jafari et al. (2019) proposed an FCN for multi-task learning of LV segmentation and landmark detection. Tan et al. (2016) used two convolutional neural networks to localize the LV and determine the endocardial radius, respectively. Zheng et al. (2018) utilized an FCN to propagate spatial information for 3D consistent ventricle segmentation. There are also works that directly regress EF from hand-crafted features of cardiac images (Wang et al., 2014, Zhen et al., 2014, Zhen et al., 2017, Gu et al., 2018), or from their feature embeddings in deep neural networks (Zhen et al., 2016, Xue et al., 2017, Luo et al., 2017, Luo et al., 2020). The regression models are based on random forests (Zhen et al., 2014, Zhen et al., 2016), a Bayesian classifier (Wang et al., 2014), and neural networks (Xue et al., 2017, Luo et al., 2017, Luo et al., 2020, Xue et al., 2021).

Of these methods, the traditional ones cannot capture effective representations of cardiac structures from ultrasound images that are noisy and of low contrast. Those based on deep neural networks usually process each frame independently and thus cannot model the temporal dependency between neighboring frames, which is important for learning effective features responsible for the expansion and contraction of the myocardium. We have found that a prerequisite for precise EF estimation is to preserve the temporal consistency of the segmentation across all frames when only the ED and ES frames are annotated (Wei et al., 2020).

To improve segmentation with the temporal information in cardiac sequences, previous studies mainly use optical flow (OF) (Jafari et al., 2018, Qin et al., 2018, Yan et al., 2018, Chen et al., 2019) and recurrent neural networks (RNN) (Savioli et al., 2018, Du et al., 2019, Li et al., 2019) for temporal modeling. OF was utilized to extract the relative motion vector between successive frames. The learned motion vector was then used as a complementary input to the neural networks (Jafari et al., 2018, Yan et al., 2018), or to propagate labels (Pedrosa et al., 2017, Chen et al., 2019). Joint learning of motion estimation and segmentation with a shared encoder and separate heads was proposed in Qin et al. (2018), revealing the benefits of joint representation learning as a regularizer of the segmentation task. Besides OF, Li et al. (2019) utilized a hierarchical convolutional LSTM-based RNN for spatial–temporal feature embedding. Temporal information was also explored in the task of directly regressing EF and other related cardiac measurements via OF (Li et al., 2020), RNNs (Xue et al., 2017, Xue et al., 2018, Li et al., 2020), and 3D convolutions (Behnami et al., 2019).
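To illustrate the label-propagation idea used by the OF-based methods above, here is a minimal NumPy/SciPy sketch that warps a keyframe mask onto a neighboring frame with a dense flow field. The function name and flow convention are assumptions for illustration, not taken from any of the cited methods.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def propagate_mask(mask, backward_flow):
    """Warp a binary segmentation mask onto a neighboring frame.
    `backward_flow` has shape (2, H, W) and gives, for every pixel of
    the *target* frame, its (dy, dx) displacement back to the source
    frame; each target pixel bilinearly samples the source mask at
    (y + dy, x + dx)."""
    h, w = mask.shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([yy + backward_flow[0], xx + backward_flow[1]])
    warped = map_coordinates(mask.astype(float), coords, order=1, mode="nearest")
    return warped > 0.5
```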

Although temporal information helped improve segmentation/regression performance in these methods, the improvements are very limited. The underlying reasons are that (1) these methods do not explicitly enhance temporal consistency, either in the prediction results or in the embedded features; and (2) an unsupervised OF model learned from noisy, low-contrast echocardiographic sequences cannot provide accurate motion information.

To address the aforementioned challenges, we first propose a semi-supervised method based on Co-Learning of the segmentation and tracking tasks at the Appearance level and Shape level, denoted as CLAS, to fully explore the temporal dynamics of echocardiographic sequences and make use of the abundant unlabeled frames. The proposed co-learning strategy explores the mutual benefits of cardiac segmentation and tracking iteratively at the appearance and shape levels: the tracking task helps improve the temporal consistency of the segmentation results, while the segmentation helps improve tracking accuracy. Preliminary results were reported at MICCAI 2020 (Wei et al., 2020).
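For intuition, the following PyTorch-style sketch shows one way segmentation and tracking can be coupled at the shape level on the labeled keyframes: the estimated flow is asked to carry one keyframe’s label onto the other. All names, the flow convention, and the unweighted loss sum are illustrative assumptions under which the sketch is written; it is not the authors’ CLAS objective, which also operates at the appearance level and over the whole sequence.

```python
import torch
import torch.nn.functional as F

def warp(x, backward_flow):
    """Differentiably warp `x` (N, C, H, W) with a pixel-unit backward
    flow (N, 2, H, W) in (dx, dy) channel order, via grid_sample on a
    normalized sampling grid."""
    n, _, h, w = x.shape
    yy, xx = torch.meshgrid(torch.arange(h, device=x.device),
                            torch.arange(w, device=x.device), indexing="ij")
    base = torch.stack([xx, yy]).float()                  # (2, H, W), (x, y)
    coords = base.unsqueeze(0) + backward_flow            # pixel positions to sample
    grid = torch.stack([2 * coords[:, 0] / (w - 1) - 1,   # x normalized to [-1, 1]
                        2 * coords[:, 1] / (h - 1) - 1],  # y normalized to [-1, 1]
                       dim=-1)                            # (N, H, W, 2)
    return F.grid_sample(x, grid, align_corners=True)

def shape_colearning_loss(seg_ed, seg_es, label_ed, label_es,
                          flow_es_to_ed, flow_ed_to_es):
    """Supervised segmentation on the labeled ED/ES keyframes plus a
    shape-level tracking term. `flow_es_to_ed` is defined on the ES grid
    and points each ES pixel back to its ED position (symmetrically for
    `flow_ed_to_es`), so warping one keyframe's label should reproduce
    the other: better tracking yields more temporally consistent
    segmentation, and vice versa. `seg_*` are sigmoid probabilities."""
    l_seg = F.binary_cross_entropy(seg_ed, label_ed) + \
            F.binary_cross_entropy(seg_es, label_es)
    l_track = F.l1_loss(warp(label_ed, flow_es_to_ed), label_es) + \
              F.l1_loss(warp(label_es, flow_ed_to_es), label_ed)
    return l_seg + l_track
```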

In this work, we further advance our preliminary work by (1) providing an automatic solution for the whole pipeline of EF estimation, from automatic view identification, to full-sequence segmentation of the endocardium, epicardium and LA, and then to EF calculation using the bi-plane method; (2) further regularizing the feature embedding with two auxiliary tasks, view classification and single-view EF regression, resulting in the multi-task version of CLAS (MCLAS), to improve the accuracy of EF estimation; and (3) providing a more comprehensive evaluation of the methods and validating their advantages in temporal consistency and clinical parameter estimation.

To summarize, the contributions of this work include:

  • We propose a novel semi-supervised co-learning mechanism for segmentation and tracking at the appearance and shape levels, denoted as CLAS, for sparsely labeled sequences. This mechanism makes use of the unlabeled intermediate frames and iteratively improves the segmentation and tracking of the cardiac structures, thereby enhancing the temporal consistency of the whole-sequence segmentation, which is critical for precise EF estimation.

  • We introduce view classification and single-view EF regression into CLAS, resulting in the multi-task method MCLAS, which can (1) provide an automatic solution for the whole pipeline of EF estimation; and (2) further improve the performance of EF estimation by regularizing the feature embedding of the encoder.

  • We conduct comprehensive experiments to show the effectiveness of our method in terms of myocardium tracking, temporal consistency, segmentation, and EF estimation. We design three novel consistency measurements to quantitatively analyze the segmentation results of the whole sequence.

  • We validate MCLAS on a public dataset of 500 patients and show its advantages over existing methods in terms of cardiac structure segmentation, clinical parameter estimation, and temporal consistency. MCLAS achieves the best performance for the estimation of EDV, ESV and EF, with Pearson correlations of 0.975, 0.983 and 0.946, respectively.

The rest of this paper is organized as follows. Section 2 describes the proposed MCLAS framework, including the network architecture, CLAS for temporal-consistency enhancement, and the two auxiliary tasks. Section 3 presents the details of the dataset and experiment configurations. Section 4 presents the experimental results on segmentation, tracking, and EF estimation, and provides a comprehensive analysis of how our method works. Section 5 concludes the paper.


Method

In this work, we aim to achieve precise estimation of EF from two echocardiographic sequences of the A2C and A4C views. An important prerequisite for this is accurate and temporally consistent segmentation of Endo on the ED and ES frames. In this section, we first present the network architecture of our framework. Then, the appearance-level and shape-level co-learning mechanism is described. After that, we introduce the two auxiliary tasks: view classification and direct single-view EF regression.

Dataset

We use the open dataset/challenge CAMUS (Leclerc et al., 2019b) to evaluate the proposed method. CAMUS was acquired at the University Hospital of St Etienne (France) using GE Vivid E95 ultrasound scanners with a GE M5S probe. It consists of 500 subjects’ 2D echocardiographic sequences of the apical two-chamber (A2C) and four-chamber (A4C) views from ED to ES, with each sequence containing 10–40 frames. The ED and ES phases were selected as the frames where the LV size was largest and smallest, respectively. The dataset was split

Results and analysis

In this section, the performance of the proposed MCLAS is validated in the following aspects: (1) the segmentation of multiple cardiac structures, i.e., LV Endo, LV epicardium (Epi) and LA; (2) the tracking of the motion field; (3) the temporal consistency of the segmentation across the whole sequence; (4) the estimation of EDV, ESV and EF; (5) single-view EF regression; and (6) view classification.

Conclusion

In this paper, we proposed a co-learning mechanism, CLAS, to enhance the temporal consistency of cardiac segmentation in echocardiographic sequences given only the labels of the ED and ES frames for training. CLAS is capable of leveraging the mutual benefits of segmentation and tracking at the appearance and shape levels of the heart images, thereby enhancing the temporal consistency of the segmentation results and clearly improving the estimation of the clinical parameters. CLAS is further improved with two auxiliary tasks, view classification and single-view EF regression, resulting in MCLAS.

CRediT authorship contribution statement

Hongrong Wei: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing – Original Draft. Junqiang Ma: Validation. Yongjin Zhou: Writing – review & editing. Wufeng Xue: Conceptualization, Methodology, Resources, Writing – review & editing, Supervision, Funding acquisition. Dong Ni: Resources, Writing – review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the Natural Science Foundation of China under Grants 61801296 and 62171290, the Shenzhen Basic Research Program, China (JCYJ20190808115419619), the Shenzhen Peacock Plan, China (Nos. KQTD2016053112051497, KQJSCX20180328095606003), and the Medical Scientific Research Foundation of Guangdong Province, China (No. A2021370).

References (48)

  • Zhen, X., et al. Direct and simultaneous estimation of cardiac four chamber volumes by multioutput sparse regression. Med. Image Anal. (2017)

  • Abdelmaguid, E., et al. Left ventricle segmentation and volume estimation on cardiac MRI using deep learning. (2018)

  • Balakrishnan, G., et al. VoxelMorph: a learning framework for deformable medical image registration. IEEE Trans. Med. Imaging (2019)

  • Barbosa, D., et al. B-spline explicit active surfaces: an efficient framework for real-time 3-D region-based segmentation. IEEE Trans. Image Process. (2011)

  • Behnami, D., et al. Dual-view joint estimation of left ventricular ejection fraction with uncertainty modelling in echocardiograms.

  • Chen, S., et al. TAN: Temporal affine network for real-time left ventricle anatomical structure analysis based on 2D ultrasound videos. (2019)

  • Dollár, P., et al. Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. (2014)

  • Folland, E., et al. Assessment of left ventricular ejection fraction and volumes by real-time, two-dimensional echocardiography. A comparison of cineangiographic and radionuclide techniques. Circulation (1979)

  • Jafari, M.H., et al. A unified framework integrating recurrent fully-convolutional networks and optical flow for segmentation of the left ventricle in echocardiography data.

  • Jafari, M.H., et al. Automatic biplane left ventricular ejection fraction estimation with mobile point-of-care ultrasound using multi-task learning and adversarial training. Int. J. Comput. Assist. Radiol. Surg. (2019)

  • Kabani, A., et al. Ejection fraction estimation using a wide convolutional neural network.

  • Kass, M., et al. Snakes: Active contour models. Int. J. Comput. Vis. (1988)

  • Leclerc, S., et al. RU-Net: A refining segmentation network for 2D echocardiography.

  • Leclerc, S., et al. LU-Net: a multi-stage attention network to improve the robustness of segmentation of left ventricular structures in 2D echocardiography. IEEE Trans. Ultrason. Ferroelectr. Freq. Control (2020)