Abstract
Heart disease is the global leading cause of death. A key predictor of heart failure and the most commonly measured cardiac parameter is left ventricular ejection fraction (LVEF). Despite available segmentation technologies, experienced cardiologists often rely on visual estimation of LVEF for a swift assessment. In this paper, we present a direct dual-channel LVEF estimation approach that mimics cardiologists’ visual assessment for detecting patients with high risk of systolic heart failure. The proposed framework consists of various layers for extracting spatial and temporal features from echocardiography (echo) cines. A data set of 1,186 apical two-chamber (A2C) and four-chamber (A4C) echo cines was used in this study. LVEF labels were assigned based on risk of heart failure: high-risk for \(\text {LVEF}\le 40\%\) and low-risk for \(40\%<\text {LVEF}\le 75\%\). We validated the proposed framework on 237 clinical exams and achieved a success rate of 83.1% for risk-based LVEF classification. Our experiments suggest that fusing the two apical views improves performance over single-view networks, particularly A2C alone. The proposed solution is promising for segmentation-free detection of high-risk LVEF. Direct LVEF estimation eliminates the need for ventricle segmentation and can hence be a useful tool for formal echo and point-of-care cardiac ultrasound.
D. Behnami and C. Luong—Joint first authors.
P. Abolmaesumi and T. Tsang—Joint senior authors.
1 Introduction
Heart disease is the leading cause of death globally, claiming the lives of over 8.5 million people in 2015 alone [17]. Left ventricular ejection fraction (LVEF) is an important cardiac parameter and the key predictor of prognosis in most cardiac conditions, including valve disease, coronary artery disease, and heart failure [3]. Formally, LVEF is defined as the ratio between the amount of blood pumped out of the left ventricle (LV) during each systole and the maximum amount of blood in the LV at the end of diastole. The most common imaging modality for measuring LVEF is echocardiography (echo) [3]. Echo is non-ionizing, accessible, low-cost, and real time, and is therefore ideal for studying cardiac anatomy and function. In 2D echo, LVEF is conventionally quantified using the biplane method of disks, a.k.a. Simpson’s rule [3]. This method calculates LVEF through LV volume estimation in end-systolic (ES) and end-diastolic (ED) frames, from apical two-chamber (A2C) and apical four-chamber (A4C) views. This segmentation-based routine is time-consuming and challenging in the presence of noise and unclear endocardial boundaries. Furthermore, studies suggest that manual measurement of LVEF suffers from intra- and inter-user variability, especially among novice cardiologists [2, 5]. To assist with the automation of LV segmentation, several solutions have become commercially available [19]. A number of research groups have also proposed semi-automatic and automatic LV segmentation techniques, including recent machine learning and deep learning approaches [6, 8, 14, 15, 21]. Though promising for LV volume estimation in a given frame, these methods can lack robustness for LVEF prediction, because LVEF depends on accurate LV tracing in both the ED and ES frames.
Clinically, LVEF is often measured through direct visual estimation [13]. Experienced cardiologists can eyeball LVEF from echo cine loops based on wall motion and atrio-ventricular plane displacement [13]. Studies suggest that direct visual estimation of LVEF correlates closely with quantitative segmentation-based techniques [10]. Though it is the preferred choice of experts for quick LVEF assessment, visual estimation is a highly reader-dependent technique, and inexperienced imagers are often hesitant to use it [3, 13]. Moreover, eyeballing LVEF is not a reliable option for other clinicians with limited echo training.
Direct estimation of LV volume and LVEF in cardiac magnetic resonance (MR) images has been explored by several groups [9, 12, 20, 22]. Nevertheless, to the best of our knowledge, direct LVEF assessment has not been previously investigated in echo images. It is worth noting that LVEF estimation in echo is inherently a much more difficult problem compared to MR for several reasons. First, variability in acquiring standard echo imaging planes introduces greater variance in the appearance of the LV anatomy in 2D echo images. Moreover, the short-axis (SAX) view, used for LVEF estimation in MR (Fig. 1(a)), captures a much simpler cardiac motion and field-of-view compared to the views used in echo (Fig. 1(b) and (c)). Other challenges in echo include variable image quality and image settings, which also add to the complexity of a machine learning-based solution for direct LVEF assessment.
In this paper, we introduce a deep network that mimics the clinicians’ eye-balling technique in echo to help classify exams as high-risk (\(\text {LVEF}\le 40\%\)) or low-risk (\(40\%<\text {LVEF}\le 75\%\)). The following contributions are made: (1) Our approach directly estimates LVEF from echo cine loops, eliminating the need for LV segmentation and detection of key cardiac frames. LV segmentation can be challenging due to the high variability in echo image quality and image settings, as well as variability in the operator’s experience in obtaining the correct echo standard views; (2) We propose a dual-stream framework for A2C and A4C views, consisting of view-specific spatial feature extraction blocks as well as shared recurrent neural network (RNN) layers; (3) We report the performance of several state-of-the-art networks and empirically show that, across all tested configurations, the dual-view framework performs as well as or better than either single apical view in classifying low-risk vs. high-risk LVEF.
2 Material
LVEF Labels: Our objective is to distinguish between the low-risk and high-risk LVEF classes. Let \(\mathbf {Y}_{\text {Simpson's}}\) and \(\mathbf {Y}_{\text {Binary}}\) denote the Simpson’s rule-based gold standard LVEF measurement and derived risk-based binary labels, respectively. We define \(\mathbf {Y}_{\text {Binary}}\) such that \(\mathbf {Y}_{\text {Binary}}=1\) for \({\mathbf {Y}}_{\text {Simpson's}}\le 40\%\), and \(\mathbf {Y}_{\text {Binary}}=0\) for \({40\%<\mathbf {Y}}_{\text {Simpson's}}\le 75\%\). Figure 2 visualizes the clinical labels in the database (\({\mathbf {Y}}_{\text {Simpson's}}\) and \({\mathbf {Y}}_{\text {Eyeballed}}\)) and the derived risk-based binary labels used in the present classification network (\(\mathbf {Y}_{\text {Binary}}\)). Cases with \({\mathbf {Y}}_{\text {Simpson's}}>75\%\) are excluded from this study due to the very limited number of samples.
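The label derivation above can be sketched in a few lines. This is an illustrative reimplementation of the stated thresholds, not code from the paper; the function name is ours:

```python
def lvef_to_risk_label(lvef_percent):
    """Map a Simpson's-rule LVEF value (%) to the binary risk label.

    Returns 1 (high-risk) for LVEF <= 40%, 0 (low-risk) for
    40% < LVEF <= 75%, and None for LVEF > 75% (excluded cases).
    """
    if lvef_percent <= 40.0:
        return 1
    if lvef_percent <= 75.0:
        return 0
    return None  # hyperdynamic cases are excluded from the study

labels = [lvef_to_risk_label(v) for v in (25.0, 40.0, 55.0, 80.0)]
# -> [1, 1, 0, None]
```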
Database: Ethics approval was obtained from our local regulatory authority to access a database of clinical echo exams and corresponding diagnostic reports at a tertiary care center. We searched the report database for echo exams that satisfied the following criteria: (1) The segmentation-based (\(\mathbf {Y}_{\text {Simpson's}}\)) and segmentation-free (\(\mathbf {Y}_{\text {Eyeballed}}\)) LVEF labels are recorded in the report database, and in agreement; (2) Correspondences can be found between the echo cines and diagnostic report based on the study identification information; (3) A2C and A4C views are both available. Also, in this paper, we focus on studies acquired using the same family of ultrasound machines (Philips iE33). A total of 1,186 samples satisfying the above criteria were gathered: 541 high-risk and 645 low-risk cases. The dataset was divided in a 4:1 ratio for training and test.
Echo Data and Preparation: 2D frames of \(800\times 600\) pixels are cleaned using a binary beam-shaped mask, cropped around the beam area, and downsized to \(128 \times 128\) pixels. Temporally, frames are sampled from one full visible cycle in each cine loop \(\text {AXC}\), where \(\text {AXC}\in \{\text {A2C}, \text {A4C}\}\). To extract one cycle from each \(\text {AXC}\) cine, we locate the R peaks in its available electrocardiogram (ECG) and trim the cine to the frames between \(R_1^{\text {AXC}}\) and \(R_2^{\text {AXC}}\). An equal number of \(F=25\) frames are uniformly sampled from each sequence (Fig. 3).
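The uniform temporal sampling between the two R peaks can be sketched as follows. This is an assumed implementation consistent with the description above; the function name and toy cine dimensions are ours:

```python
import numpy as np

def sample_cycle_frames(cine, r1, r2, num_frames=25):
    """Uniformly sample `num_frames` frames from one cardiac cycle.

    `cine` is an array of shape (T, H, W) of preprocessed frames;
    `r1` and `r2` are the frame indices of two consecutive R peaks
    located from the ECG trace.
    """
    indices = np.linspace(r1, r2, num_frames).round().astype(int)
    return cine[indices]

# toy example: a 120-frame preprocessed cine with R peaks at 10 and 82
cine = np.zeros((120, 128, 128), dtype=np.float32)
cycle = sample_cycle_frames(cine, r1=10, r2=82)
assert cycle.shape == (25, 128, 128)
```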
3 Methods
We propose the network in Fig. 4 for binary LVEF classification. The network consists of spatial feature extraction (FE) blocks as well as RNN-based layers for temporal learning.
Dual-view Spatial Feature Learning: We rely on CapsuleNet [18] and DenseNets [11] for frame feature extraction (FE), as they have recently proven successful in spatial feature learning. Initially, sampled synchronous A2C and A4C frames are fed into FE blocks. The flattened output of an FE block for a frame t is a feature vector \(\mathbf {X}_{m,t}^{\text {AXC}}\) of length \(M\times 1\); \(m=1:M\). In the dual-view framework, \(\mathbf {X}_{m,t}^{\text {A2C}}\) and \(\mathbf {X}_{m,t}^{\text {A4C}}\) are then concatenated to form a dual-view feature vector \(\mathbf {X}_{m,t}^{\text {A2C+A4C}}\) of length \(2M\times 1\). For an exam with two streams and a sequence length of F frames, a feature matrix \(\mathbf {X}_{m,t}^{\text {A2C+A4C}}\) of size \(2M\times F\) is constructed, where \(t=1:F\). \(\mathbf {X}_{m,t}^{\text {A2C+A4C}}\) is a dense representation of the cardiac cycle based on the two views.
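The frame-wise view fusion amounts to stacking the two feature matrices along the feature axis. A minimal sketch with random stand-ins for the FE outputs (the value of M here is an assumption for illustration):

```python
import numpy as np

M, F = 256, 25  # assumed feature length per view; frames per cycle

# stand-ins for the per-frame outputs of the two view-specific FE blocks
x_a2c = np.random.rand(M, F).astype(np.float32)
x_a4c = np.random.rand(M, F).astype(np.float32)

# frame-wise concatenation yields the dual-view feature matrix
x_dual = np.concatenate([x_a2c, x_a4c], axis=0)  # shape (2M, F)
assert x_dual.shape == (2 * M, F)
```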
RNNs for Temporal Encoding: The other key components in the network are the RNN blocks, which enable sequential and temporal learning. We investigated various RNNs, including cascades of uni- and bi-directional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers. The RNN blocks take in \(\mathbf {X}_{m,t}^{\text {A2C+A4C}}\) at F separate time steps and output an array of learned sequential features. This output is then passed to a cascade of two fully connected (FC) layers, with ReLU and Softmax activation functions, respectively.
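To make the temporal encoding concrete, a single GRU step over the dual-view frame features can be written out explicitly. This is a generic numpy sketch of the standard GRU update, not the paper's Keras implementation; all sizes and parameter names here are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU time step: new hidden state from input x and state h."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(0)
D, H, F = 512, 64, 25  # input dim (2M), hidden size, frames (assumed)
params = [rng.normal(scale=0.1, size=s)
          for s in [(D, H), (H, H), (D, H), (H, H), (D, H), (H, H)]]
h = np.zeros(H)
for t in range(F):                # unroll over the F sampled frames
    x_t = rng.normal(size=D)      # stand-in for the t-th dual-view vector
    h = gru_step(x_t, h, params)
# the final hidden state summarizes the cycle; the two FC layers then
# map it to the two-class Softmax output
```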
Training: The proposed architecture is implemented in Python using Keras with TensorFlow backend [4]. Dropout and batch normalization layers are used after FE blocks to prevent overfitting. The start points of the sampled frames are selected at random within the range \(R_1^{\text {AXC}}\) to \(R_2^{\text {AXC}}\). Augmented data is created on the fly via randomly generated transforms, including rotation, scaling, cropping and gamma transformation on the intensities.
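One of the on-the-fly augmentations mentioned above, a random gamma transformation of the intensities, can be sketched as follows. The gamma range is an assumption for illustration and is not specified in the paper:

```python
import numpy as np

def random_gamma(frames, rng, low=0.7, high=1.3):
    """Intensity augmentation: apply a randomly drawn gamma curve to a
    cine of frames scaled to [0, 1]. The (low, high) range is assumed."""
    gamma = rng.uniform(low, high)
    return np.clip(frames, 0.0, 1.0) ** gamma

rng = np.random.default_rng(42)
cine = rng.random((25, 128, 128)).astype(np.float32)  # toy sampled cycle
aug = random_gamma(cine, rng)
assert aug.shape == cine.shape
```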
5 Discussion and Conclusion
In this paper, we introduced a new framework based on DenseNet, CapsuleNet and RNN layers for estimating LVEF from echo cines in the A2C and A4C standard echo views. Our results suggest that A2C alone is a less reliable view for LVEF estimation, while A4C alone appears to be a much more robust option with the current framework. However, the most accurate results are achieved by combining both apical views. This observation is also aligned with anecdotal clinical evidence: A2C views are more difficult to obtain than A4C and are more likely to be foreshortened [16], hence LVEF estimation from A2C can be less reliable. LSTM and GRU often performed equivalently, although the highest accuracy was obtained using GRU blocks. The results also consistently suggest that bidirectional recurrent layers are equivalent to or better than unidirectional ones. The optimal deep model, consisting of a DenseNet with bidirectional GRU layers, achieved a success rate of 83.1% on the test set for detecting high-risk LVEF. We observed that DenseNet achieved a higher accuracy compared to CapsuleNet. Given the performance of CapsuleNet on public data sets [18], this was inconsistent with our initial expectations. However, we suspect that this is due to the small size of our training set for learning such a complex, yet subtle, problem. DenseNets have been proven effective for learning spatial features from relatively small training sets [11]. It is worth mentioning that based on our analysis of the main diagnostic report database, only approximately 70.1% of the \(\mathbf {Y}_{\text {Simpson's}}\) and \(\mathbf {Y}_{\text {Eyeballed}}\) labels agree. While the disagreeing cases were excluded from the presented study, we suspect that the accuracy of the clinical ground truth labels may be similarly compromised to some extent.
A key pattern recognized from the results is the link between model performance, the quality of apical images, and view synchronization (Fig. 6). Misclassified images generally have unclear LV boundaries, which causes a great deal of variance in the appearance of the heart and its motion. Also, despite the automatic and manual view classification, confusion between the four apical views (A2C, three-chamber, A4C and five-chamber) appears to remain a challenge and a potential source of error (e.g., Fig. 6(c)). Thus, a bottom-up approach for improving LVEF accuracy can be through improving the quality of the input data. Abdi et al. recently proposed a deep-learning solution for automatic estimation of echo quality [1], which can be used to provide feedback to ultrasound operators for improving the quality of data acquisition.
A resolvable limitation of the proposed solution is its dependence on ECG for phase detection and synchronization; ECG is typically not available at the point of care. Moreover, visual inspection of the results revealed a correlation between misclassification and apparent improper synchronization (see, e.g., Fig. 6(d), which shows asynchronous A2C and A4C views based on the valve state). We believe improving the phase detection can contribute to achieving more accurate results. Alternatively, a cine-based cardiac phase detection can be implemented into the network. A possible solution has been proposed by Dezaki et al. [7] for A4C images, which can be similarly extended to A2C. This method is capable of automatically identifying ES and ED, which could be used to achieve potentially richer temporal sampling of the systolic and diastolic phases.
One possible option to eliminate phase dependence altogether is to use two separate RNN streams, one per A2C and A4C view. This decouples the two views from one another, enabling the use of potentially informative cines in full. However, this architecture substantially increases the network size, and was still less successful for LVEF estimation in our experiments thus far. This is most likely because the inputs of the RNN blocks, i.e., the frame feature vectors, are denser and richer when constructed from two complementary views, allowing for more effective temporal learning. This may change should we increase our training set.
While a binary risk-based LVEF classification tool could assist with immediate decision making in point-of-care, it suffers from a flaw: it imposes a sharp boundary on the true regression labels (\(\mathbf {Y}_{\text {Simpson's}}\)). This can be amended by adding a medium-risk class, or more classes of \(\mathbf {Y}_{\text {Eyeballed}}\). We plan to include exams from other ultrasound machines to obtain enough data for this multi-class classification.
Given that LV localization appears to be the key step in some LVEF estimation approaches proposed for cardiac MR [12], another question worth exploring is whether LV localization helps with LVEF accuracy in echo. While the motion of the atria and right ventricle can contain subtle information about LVEF, excluding them decreases variance from the neighbouring chambers. Existing encoder-decoder segmentation networks can be modified and used to localize, track and accordingly crop the LV throughout the cine.
References
Abdi, A.H., et al.: Quality assessment of echocardiographic cine using recurrent neural networks: feasibility on five standard view planes. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 302–310. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_35
Bresser, P., De Beer, J., De Wet, Y.: A study investigating variability of left ventricular ejection fraction using manual and automatic processing modes in a single setting. Radiography 21(1), e41–e44 (2015)
Cameli, M., Mondillo, S., Solari, M., et al.: Echocardiographic assessment of left ventricular systolic function: from ejection fraction to torsion. Heart Fail. Rev. 21(1), 77–94 (2016)
Chollet, F., et al.: Keras (2015). https://github.com/keras-team/keras
Cole, G.D., Dhutia, N.M., Shun-Shin, M.J., et al.: Defining the real-world reproducibility of visual grading of left ventricular function and visual estimation of left ventricular ejection fraction: impact of image quality, experience and accreditation. Int. J. Cardiovasc. Imaging 31(7), 1303–1314 (2015)
Deo, R.C., Zhang, J., Hallock, L.A., et al.: An end-to-end computer vision pipeline for automated cardiac function assessment by echocardiography. CoRR (2017). http://arxiv.org/abs/1706.07342
Dezaki, F.T., et al.: Deep residual recurrent neural networks for characterisation of cardiac cycle phase from echocardiograms. In: Cardoso, M., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 100–108. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_12
Dong, S., Luo, G., Sun, G., et al.: A left ventricular segmentation method on 3D echocardiography using deep learning and snake. In: 2016 Computing in Cardiology Conference (CinC), pp. 473–476. IEEE (2016)
Gu, B., Shan, Y., Sheng, V.S., et al.: Sparse regression with output correlation for cardiac ejection fraction estimation. Inf. Sci. 423, 303–312 (2018)
Gudmundsson, P., Rydberg, E., Winter, R., et al.: Visually estimated left ventricular ejection fraction by echocardiography is closely correlated with formal quantitative methods. Int. J. Cardiol. 101(2), 209–212 (2005)
Huang, G., Liu, Z., Weinberger, K.Q., et al.: Densely connected convolutional networks. In: IEEE CVPR (2017)
Kabani, A.W., El-Sakka, M.R.: Ejection fraction estimation using a wide convolutional neural network. In: Karray, F., Campilho, A., Cheriet, F. (eds.) ICIAR 2017. LNCS, vol. 10317, pp. 87–96. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59876-5_11
Kim, C., Hur, J., Kang, B.S., et al.: Can an offsite expert remotely evaluate the visual estimation of ejection fraction via a social network video call? J. Dig. Imaging 30(6), 718–725 (2017)
Leclerc, S., Grenier, T., Espinosa, F., Bernard, O.: A fully automatic and multi-structural segmentation of the left ventricle and the myocardium on highly heterogeneous 2D echocardiographic data. In: 2017 IEEE International Ultrasonics Symposium (IUS), pp. 1–4. IEEE (2017)
Ngo, T.A., Lu, Z., Carneiro, G.: Combining deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance. Med. Image Anal. 35, 159–171 (2017)
Nosir, Y., Vletter, W.B., Boersma, E., et al.: The apical long-axis rather than the two-chamber view should be used in combination with the four-chamber view for accurate assessment of left ventricular volumes and function. Eur. Heart J. 18(7), 1175–1185 (1997)
World Health Organization: Global health observatory (GHO) data (2017). http://www.who.int/gho/mortality_burden_disease/causes_death/top_10/en/
Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems, pp. 3859–3869 (2017)
Wood, P.W., Choy, J.B., Nanda, N.C., et al.: Left ventricular ejection fraction and volumes: it depends on the imaging method. Echocardiography 31(1), 87–100 (2014)
Xue, W., Lum, A., Mercado, A., Landis, M., Warrington, J., Li, S.: Full quantification of left ventricle via deep multitask learning network respecting intra- and inter-task relatedness. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 276–284. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_32
Zhang, J., Gajjala, S., Agrawal, P., et al.: A web-deployed computer vision pipeline for automated determination of cardiac structure and function and detection of disease by two-dimensional echocardiography. arXiv:1706.07342 (2017)
Zhen, X., Wang, Z., Islam, A., et al.: Multi-scale deep networks and regression forests for direct bi-ventricular volume estimation. Med. Image Anal. 30, 120–129 (2016)
© 2018 Springer Nature Switzerland AG
Behnami, D. et al. (2018). Automatic Detection of Patients with a High Risk of Systolic Cardiac Failure in Echocardiography. In: Stoyanov, D., et al. (eds.) Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA ML-CDS 2018. Lecture Notes in Computer Science, vol. 11045. Springer, Cham. https://doi.org/10.1007/978-3-030-00889-5_8