
1 Introduction

The unquestioned cardiomyogenic potential of human embryonic stem cells [1] and the well-established protocols for their isolation and maintenance [2] make them one of the most promising sources of cardiomyocytes (CMs) for applications such as cell-based cardiac repair [3] and drug screening [4]. However, their use is still hampered by our limited understanding of the phenotypic traits of human embryonic stem cell-derived cardiomyocytes (hESC-CMs) and of how these traits relate to the phenotypes of adult CMs [3]. To characterize the phenotype of hESC-CMs, prior work [5] studies the expression of specific genes and ion channel-encoding subunits. Alternatively, [6, 7] apply thresholding to simple features extracted from the cell's action potential (AP). However, simple classification methods based on handcrafted features and subjective criteria not only discard most of the information contained in the AP, but are also hard to transfer to new datasets.

Recently, automatic methods have been proposed to analyze the heterogeneity of hESC-CM APs using the whole AP as input. For instance, the existence of different clusters was studied via a spectral grouping-based algorithm in [8], and the metamorphosis distance proposed in [9] was adapted in [10] to classify embryonic APs by computing their distances to adult APs of known phenotype. While the metamorphosis-based method yields better interpolation and clustering results [11], it is too computationally intensive to be applicable to large-scale datasets.

In this paper we propose a new method for classifying hESC-CM APs based on recurrent neural networks (RNNs) with long short-term memory (LSTM) units [12]. LSTMs have recently regained popularity for time series classification owing to their strong performance in speech recognition [13] and activity recognition [14]. However, while LSTMs have also been successfully applied to the analysis of physiological signals [15, 16], standard LSTMs are not directly applicable to the classification of hESC-CMs because labels for embryonic APs are lacking. In this context, the contribution of this paper is a semi-supervised approach that exploits the abundance of labels for adult APs, which can be obtained by simulating electrophysiological models of the typical adult phenotypes (atrial, ventricular, etc.). The proposed approach trains an LSTM with a novel loss function that combines a classification loss for adult APs (supervised part) and a contrastive loss for embryonic APs (unsupervised part). For the supervised part we use synthetic APs obtained from computational models of adult atrial [17] and ventricular [18] CMs, while for the unsupervised part we compute similarities between APs, making efficient use of Euclidean and metamorphosis distances.

Experiments on a dataset of 6940 hESC-CM APs show that our semi-supervised approach produces smooth clustering results that are comparable to those of [11] in terms of the Davies-Bouldin Index (DBI), and confirm the benefit of integrating information from both adult and embryonic APs. Furthermore, the semi-supervised approach uses the Euclidean metric far more effectively than previous methods, considerably outperforming the 1-nearest-neighbor scheme (\(87.88\%\) vs. \(62.90\%\) agreement with the best result published in [11]). When the metamorphosis distance is used, our method achieves \(94.73\%\) agreement with the best results published in [11], while being significantly less computationally expensive on new data.

2 Methods

Let the sequence \(\mathbf {x}_j^e=\{x_j^e (k) \in \mathbb {R}\}_{k=1}^{T}\), where T is the total number of samples in one cycle length, represent the jth embryonic AP. Let \(\mathbf {x}_i^a\) be the ith adult AP and let \(y_i^a\in \{0,1\}\) be its ground truth label, where \(y_i^a=0\) denotes atrial and \(y_i^a=1\) denotes ventricular. We consider the problem of assigning a label \(\hat{y}_{j}^e\) to each \(\mathbf {x}_j^e\), where \(\hat{y}_{j}^e=0\) denotes atrial-like and \(\hat{y}_{j}^e=1\) denotes ventricular-like.

A simple approach is to use a 1-nearest-neighbor (1NN) classifier with the Euclidean distance \({d}_{E}({\mathbf {x}}_{j}^e,{\mathbf {x}}_{i}^a)=\frac{1}{{\sigma }_{M}}\sqrt{\sum _{k=1}^{T}{\left( {x}_{j}^e(k)-{x}_{i}^a(k)\right) }^{2}}\), where \(\sigma _M\) is a normalization parameter. However, the Euclidean distance can be affected by nuisance factors such as changes in AP shape induced by the maturation process.
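
For concreteness, below is a minimal sketch of this 1NN baseline in Python; the function names are ours, and we reuse \(\sigma _M = 0.3\) from Sect. 3.3.

```python
import numpy as np

def euclidean_distance(x_e, x_a, sigma_m=0.3):
    """Normalized Euclidean distance d_E between two APs on a common time grid."""
    return np.sqrt(np.sum((x_e - x_a) ** 2)) / sigma_m

def classify_1nn(embryonic_aps, adult_aps, adult_labels):
    """Label each embryonic AP with the label of its nearest adult template."""
    labels = np.empty(len(embryonic_aps), dtype=int)
    for j, x_e in enumerate(embryonic_aps):
        dists = [euclidean_distance(x_e, x_a) for x_a in adult_aps]
        labels[j] = adult_labels[int(np.argmin(dists))]
    return labels
```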

An alternative approach is to use 1NN classification with the metamorphosis distance, which generates an interpolation path \(x(k,s)\) between an embryonic AP, \(x(k,0) = x_j^e(k)\), and an adult AP, \(x(k,S) = x_i^a(k)\), that minimizes the amount of deformation between the two, as measured through a velocity field \(\mathbf {v}\):

$$\begin{aligned} d^2_M(\mathbf {x}_j^e,\mathbf {x}_i^a) = \min _{\mathbf {x},\mathbf {v}} \sum _{s=0}^{S-1} \Vert v(k,s)\Vert _V^2 + \tfrac{1}{\sigma ^2_M}\Vert x(k+v(k,s),s+1) - x(k,s)\Vert _2^2, \end{aligned}$$
(1)

where \({\Vert \cdot \Vert }_{V}^{2}\) is a Sobolev norm and \({\sigma }_{M}^2\) is a balancing parameter (see [10, 11]). However, the metamorphosis distance is computationally intensive to evaluate.

2.1 Classifier Architecture

Long Short-Term Memory (LSTM) units [12] are recurrent blocks whose key elements are an input gate i(k), a forget gate f(k), and an output gate o(k), which modulate the evolution of the cell state c(k) and the output h(k) at time k as follows:

$$\begin{aligned} i(k)&=\sigma \left( {W}_{i}x(k)+{U}_{i}h(k-1)+{b}_{i}\right) \in {\mathbb {R}}^{p}\\ f(k)&=\sigma \left( {W}_{f}x(k)+{U}_{f}h(k-1)+{b}_{f}\right) \in {\mathbb {R}}^{p}\\ o(k)&=\sigma \left( {W}_{o}x(k)+{U}_{o}h(k-1)+{b}_{o}\right) \in {\mathbb {R}}^{p}\\ c(k)&=f(k)\circ c(k-1)+i(k) \circ \tanh \left( {W}_{c}x(k)+{U}_{c}h(k-1)+{b}_{c}\right) \in {\mathbb {R}}^{p}\\ h(k)&=o(k) \circ \tanh \left( c(k)\right) \in {\mathbb {R}}^{p} \end{aligned}$$
(2)

where p denotes the layer dimension, \(\sigma (z)=\frac{1}{1+{e}^{-z}}\) is the sigmoid function, x(k) is the input sequence at time k, and \(\circ \) denotes the Hadamard product.
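
For reference, below is a direct NumPy transcription of Eq. (2) for a scalar input sequence; storing the parameters in dicts keyed by gate is our own choice, not the original implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_k, h_prev, c_prev, W, U, b):
    """One LSTM update following Eq. (2).

    W, U, b are dicts holding the parameters of each gate ('i', 'f',
    'o', 'c'); x_k is the scalar input at time k, while h_prev and
    c_prev are the previous output and cell state (dimension p).
    """
    i = sigmoid(W['i'] * x_k + U['i'] @ h_prev + b['i'])  # input gate
    f = sigmoid(W['f'] * x_k + U['f'] @ h_prev + b['f'])  # forget gate
    o = sigmoid(W['o'] * x_k + U['o'] @ h_prev + b['o'])  # output gate
    c = f * c_prev + i * np.tanh(W['c'] * x_k + U['c'] @ h_prev + b['c'])
    h = o * np.tanh(c)                                    # layer output
    return h, c
```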

Fig. 1. Network architecture: one hidden LSTM layer (\(p=3\)) and a sigmoid unit.

The proposed classifier architecture is depicted in Fig. 1: an RNN with one hidden LSTM layer of dimension \(p=3\) and one sigmoid unit as the output layer (64 parameters in total). The sigmoid unit operates only on the final output h(T), after the entire sequence x(k) has been processed by the LSTM layer.
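
A minimal sketch of this architecture in Keras (the library used in Sect. 3.3); the builder function name is ours.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_classifier(T):
    """RNN of Fig. 1: one LSTM layer of dimension p = 3, then a sigmoid unit.

    The input is an AP given as a length-T sequence of scalar voltage
    samples. Parameter count: 4 * (p * 1 + p**2 + p) = 60 in the LSTM
    plus p + 1 = 4 in the dense layer, i.e. 64 in total, as in the text.
    """
    model = Sequential([
        LSTM(3, input_shape=(T, 1)),    # returns only the final output h(T)
        Dense(1, activation='sigmoid')  # y_hat = sigma(h(T)^T W + b)
    ])
    return model
```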

2.2 Semi-supervised Objective Function

We use the binary cross-entropy loss \(\ell (y,\hat{y}) = -y\log (\hat{y})-(1-y)\log (1-\hat{y})\) to quantify how close the LSTM prediction \(\hat{y} = \sigma (h(T)^TW+b)\) is to the label y. More specifically, given \(N_a\) adult APs \(\{\mathbf {x}_i^a\}\) and their labels \(\{y_i^a\}\), our supervised loss is

$$\begin{aligned} \frac{1}{N_a} \sum _{i=1}^{N_a} \{ -y_i^a\log (\hat{y}_i^a)-(1-y_i^a)\log (1-\hat{y}_i^a) \}. \end{aligned}$$
(3)

Now, while we do not have labels for the embryonic APs \(\{\mathbf {x}_j^e\}\), we can still use \(\ell (\hat{y}_{j}^e,\hat{y}_{j'}^e)\) to compare the predicted labels of two different embryonic APs. Intuitively, we would like similar APs to have the same label, and dissimilar APs to have different labels. Therefore, given \(N_e\) embryonic APs, we use the contrastive loss

$$\begin{aligned} \frac{1}{N_e\left( N_e-1\right) } \sum _{j=1}^{N_e}\sum _{j' \ne j}{{s}_{(j,j')}\cdot \ell \left( \hat{y}_{j}^e,\hat{y}_{j'}^e\right) +\left( 1-{s}_{(j,j')}\right) \cdot \ell \left( (1-\hat{y}_{j}^e),\hat{y}_{j'}^e\right) }, \end{aligned}$$
(4)

where \({s}_{(j,j')}\) represents the similarity between APs \({\mathbf {x}}_{j}^e\) and \({\mathbf {x}}_{j'}^e\). We define the similarity between two APs based on their distance \(d\left( {\mathbf {x}}_{j}^e,{\mathbf {x}}_{j'}^e\right) \) (Euclidean or metamorphosis) as \({s}_{(j,j')}=\exp \left( -\frac{{d}^{4}\left( {\mathbf {x}}_{j}^e,{\mathbf {x}}_{j'}^e\right) }{{\sigma }_{s}^{4}}\right) \), where \({\sigma }_{s}\) is chosen so that \({\sigma }^{4}_{s} = \overline{{d}^{4}}\), the average of \(d^4\) over the pairs being compared.
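
A short sketch of this similarity computation (names are ours):

```python
import numpy as np

def pairwise_similarity(dists):
    """Similarity s_(j,j') from precomputed pairwise distances.

    sigma_s^4 is set to the mean of d^4 over the pairs, as in the text.
    """
    d4 = np.asarray(dists) ** 4
    sigma_s4 = d4.mean()
    return np.exp(-d4 / sigma_s4)
```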

After combining the supervised and unsupervised terms of the loss, we obtain

$$\begin{aligned} \frac{1-\lambda }{N_a} \sum _{i=1}^{N_a}{\ell \left( {y}_{i}^{a},\hat{y}_{i}^{a}\right) } + \frac{\lambda }{N_e-1} \sum _{j=2}^{N_e} \left[ {s}_{(j,j-1)}\cdot \ell \left( \hat{y}_{j}^{e},\hat{y}_{j-1}^{e}\right) +\left( 1-{s}_{(j,j-1)}\right) \cdot \ell \left( 1-\hat{y}_{j}^{e},\hat{y}_{j-1}^{e}\right) \right] , \end{aligned}$$
(5)

where \(\lambda \) is a balancing parameter between the supervised and unsupervised parts. Instead of making pairwise comparisons between all APs, we compare each AP \({\mathbf {x}}_{j}^e\) only with the previous one, \({\mathbf {x}}_{j-1}^e\), so that only \(N_e-1\) distance computations are needed instead of \(N_e(N_e-1)/2\).
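
Below is a TensorFlow sketch of the loss in Eq. (5), assuming the similarities \(s_{(j,j-1)}\) are precomputed from consecutive-pair distances and each batch stacks its adult and embryonic predictions separately; the wiring is ours, not the original implementation.

```python
import tensorflow as tf

def bce(y, y_hat, eps=1e-7):
    """Binary cross-entropy l(y, y_hat) from Sect. 2.2."""
    y_hat = tf.clip_by_value(y_hat, eps, 1.0 - eps)
    return -y * tf.math.log(y_hat) - (1.0 - y) * tf.math.log(1.0 - y_hat)

def semi_supervised_loss(y_a, yhat_a, yhat_e, s, lam=0.1):
    """Combined objective of Eq. (5).

    y_a, yhat_a: labels and predictions for the adult APs in the batch;
    yhat_e: predictions for the embryonic APs, ordered so that adjacent
    entries are the consecutive APs being compared; s: similarities
    s_(j,j-1) between consecutive embryonic APs (length N_e - 1).
    """
    supervised = tf.reduce_mean(bce(y_a, yhat_a))
    y_prev, y_next = yhat_e[:-1], yhat_e[1:]
    contrastive = tf.reduce_mean(
        s * bce(y_next, y_prev) + (1.0 - s) * bce(1.0 - y_next, y_prev))
    return (1.0 - lam) * supervised + lam * contrastive
```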

2.3 Clustering Quality Index

Since no ground truth labels are available for embryonic APs, the Davies-Bouldin Index (DBI) [19] is used as a measure of clustering quality. Let \({\varOmega }_{0}=\left\{ {\mathbf {x}}_{j}^e \mid \hat{y}_{j}^e < 0.5\right\} \) and \({\varOmega }_{1}=\left\{ {\mathbf {x}}_{j}^e \mid \hat{y}_{j}^e\ge 0.5\right\} \) be the two clusters, let \({S}_{y}\) be the mean distance from the elements of class y to the average signal of that class, \({\mu }_{y}(k)= \frac{1}{|{\varOmega }_{y}|} \sum _{{\mathbf {x}}_{j}^e\in {\varOmega }_{y}} {x}_{j}^e(k)\), and let \({M}_{01}\) be the distance between the averages \({\mu }_{0}(k)\) and \({\mu }_{1}(k)\). The DBI is defined as the ratio between the intra-cluster dispersion and the inter-cluster distance

$$\begin{aligned} DBI\left( {\varOmega }_{0},{\varOmega }_{1}\right) = \frac{{S}_{0}+{S}_{1}}{{M}_{01}}, \end{aligned}$$
(6)

and should be as small as possible. For computational reasons, and since the Euclidean distance \(d_E\) is a good approximation of the metamorphosis distance \(d_M\) for small distances, the intra-cluster dispersions \({S}_{0}\) and \({S}_{1}\) are computed using \(d_E\), whereas the distance between clusters \({M}_{01}\) is computed using \(d_M\).
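
A sketch of this mixed-distance DBI computation; the names are ours, and `dist_between` stands in for a metamorphosis distance routine, which is not reproduced here.

```python
import numpy as np

def davies_bouldin(aps, yhat, dist_between, sigma_m=0.3):
    """DBI of Eq. (6) for the two predicted clusters.

    Intra-cluster dispersions S_0, S_1 use the normalized Euclidean
    distance d_E; the inter-cluster term M_01 uses dist_between
    (e.g. a metamorphosis distance), as described in the text.
    """
    mu, S = [], []
    for y in (0, 1):
        cluster = aps[(yhat >= 0.5) == bool(y)]
        mu_y = cluster.mean(axis=0)  # average signal of the class
        S.append(np.mean(np.linalg.norm(cluster - mu_y, axis=1)) / sigma_m)
        mu.append(mu_y)
    return (S[0] + S[1]) / dist_between(mu[0], mu[1])
```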

3 Experiments

3.1 Adult CMs APs Data

A population of 2000 synthetic adult APs was generated using computational models. The O'Hara-Rudy (ORd) model [18] and the Nygren model [17] were paced at 1.5 Hz with 1000 random parameter sets each (varying between \(80\%\) and \(120\%\) of their nominal values) to generate adult ventricular and atrial CM APs, respectively. The varied parameters were the maximum conductances and permeabilities of ion channels (\({g}_{Na}\), \({g}_{NaL}\), \({g}_{to}\), \({g}_{Kr}\), \({g}_{Ks}\), \({g}_{K1}\), \({g}_{NCX}\), \({g}_{Kb}\), \({g}_{pCa}\), \({P}_{Ca}\), \({P}_{NaK}\), \({P}_{Nab}\), \({P}_{Cab}\) in the ORd model, and \({g}_{CaL}\), \({g}_{Ks}\), \({g}_{Kr}\), \({g}_{K1}\), \({g}_{Nab}\), and \({g}_{Cab}\) in the Nygren model). Each AP was normalized so that its maximum voltage and resting membrane potential are 1 and 0, respectively. The Sparse Modeling for Representatives Selection (SMRS) method [20] was then applied to select a subset of \(N_a = 300\) templates, shown in Fig. 2a.
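
A sketch of this sampling procedure under stated assumptions: `simulate` is a hypothetical placeholder for a paced (1.5 Hz) run of the ORd or Nygren model, which is not reproduced here, and taking the trace minimum as the resting potential for normalization is our simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_population(nominal_params, simulate, n=1000, lo=0.8, hi=1.2):
    """Generate APs by scaling each conductance/permeability in [80%, 120%].

    nominal_params maps parameter names to nominal values; simulate maps
    a parameter dict to one AP trace (assumed, not provided here).
    """
    aps = []
    for _ in range(n):
        scaled = {name: val * rng.uniform(lo, hi)
                  for name, val in nominal_params.items()}
        ap = simulate(scaled)
        # normalize: resting potential (trace minimum) -> 0, peak -> 1
        ap = (ap - ap.min()) / (ap.max() - ap.min())
        aps.append(ap)
    return np.stack(aps)
```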

Fig. 2. Action potentials: (a) 300 adult CMs, (b) 6940 hESC-CMs.

3.2 hESC-CMs Data

A population of \(N_e = 6940\) hESC-CM APs, recorded in [21] from 9 cell aggregates paced at 1.5 Hz and optically mapped at a sampling rate of 500 Hz, was used. The APs were averaged over beating cycles, denoised with a \(5\times 5\) boxcar spatial filter, and then normalized (see Fig. 2b). Only 1600 APs (a fixed subset coming from 2 cell aggregates) were used for training, but labels were predicted for the whole dataset.
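
A sketch of this preprocessing, assuming the beat-averaged optical maps are stored as a (rows, cols, T) array; the layout and names are our assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def preprocess(movie):
    """Beat-averaged optical maps -> normalized APs, per Sect. 3.2.

    movie is assumed shaped (rows, cols, T), one beat-averaged AP per
    pixel. A 5x5 boxcar filter is applied spatially, then each AP is
    rescaled to [0, 1].
    """
    smoothed = uniform_filter(movie, size=(5, 5, 1))  # spatial boxcar only
    flat = smoothed.reshape(-1, movie.shape[-1])
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    return (flat - lo) / (hi - lo)
```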

3.3 Implementation Details

The classifier was implemented in Keras [22] with the TensorFlow backend and trained with the RMSProp optimizer (learning rate 0.003) on batches of 19 APs (3 adult and 16 embryonic). 90 batches were used for training and 10 for validation, accounting for the \(N_a = 300\) adult and \(N_e = 1600\) embryonic APs in total. The metamorphosis parameter was set to \({\sigma }_{M}=0.3\).
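
One way to wire this batch structure to the loss of Eq. (5) is a custom training step, sketched below; it reuses `build_classifier` and `semi_supervised_loss` from the earlier sketches, and the details are ours rather than the original implementation.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.003)

@tf.function
def train_step(model, x_a, y_a, x_e, s, lam=0.1):
    """One update on a batch of 3 adult and 16 embryonic APs."""
    with tf.GradientTape() as tape:
        yhat_a = model(x_a, training=True)[:, 0]  # adult predictions
        yhat_e = model(x_e, training=True)[:, 0]  # embryonic predictions
        loss = semi_supervised_loss(y_a, yhat_a, yhat_e, s, lam)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```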

Three cases are studied: supervised learning with \(\lambda =0\) (Sup-LSTM), semi-supervised learning with \(\lambda =0.1\) and Euclidean distances (Semi-LSTM-E), and semi-supervised learning with \(\lambda =0.1\) and metamorphosis distances (Semi-LSTM-M). In each case the network was trained 5 times (100 epochs for Sup-LSTM and 200 epochs for the Semi-LSTM cases), and the average of the classification results at the last epoch is analyzed.

3.4 Experimental Results

The average classification results generated by the LSTM RNN in the three cases are shown in Fig. 3 for the 9 cell aggregates. In all cases the proposed classifier generates smooth classification regions and suggests heterogeneity in most of the cell aggregates, in agreement with previous findings [11, 21]. Note that the classification result produced by semi-supervised learning differs substantially from the one produced by supervised learning, and is significantly better in terms of DBI. This emphasizes that adult and embryonic APs intrinsically belong to different domains, so classifying embryonic APs with a network trained only on adult APs is inadequate.

Fig. 3. LSTM classification results (each pixel corresponds to one hESC-CM AP). Blue indicates atrial-like phenotype and red indicates ventricular-like phenotype.

Table 1 compares our results to those of the method presented in [11] (a 1NN classifier with \(N_a = 20\) synthetic adult AP templates). Supervised learning shows significantly higher DBI than the rest, which is expected since it does not consider hESC-CM data during training. On the other hand, the semi-supervised scheme outperforms the 1NN scheme when Euclidean distances are used (DBI 0.2458 vs. 0.2558). 1NN with Euclidean distances was also replicated with the same 300 adult AP templates used to train the network (see Table 1), confirming that the improvement in clustering quality observed in the semi-supervised scheme is attributable not to the number of templates but to the method itself: the Euclidean metric is a good approximation of the metamorphosis distance for small distances, so it performs better when distances are computed within the hESC-CM domain (proposed semi-supervised framework) than when they are computed between the hESC-CM and adult CM domains (1NN).

Table 1. Comparison of the proposed method (LSTM) with the results presented in [11]. Accuracy* is computed taking 1NN classification with the metamorphosis distance as ground truth (E: Euclidean, M: Metamorphosis).
Fig. 4. Accuracy* vs. DBI, with 1NN-M as ground truth (E: Euclidean, M: Metamorphosis).

The 1NN metamorphosis results from [11] show the best clustering quality (DBI 0.2297), followed by Semi-LSTM-M (DBI 0.2390). The classification accuracy, taking 1NN metamorphosis as ground truth, is plotted against the DBI in Fig. 4 (small dots represent single-trial results; squares represent the per-case averages). Using the metamorphosis distance in semi-supervised learning not only produces a lower DBI but also consistently yields better classification accuracy than using the Euclidean distance. Moreover, when 300 templates and only Euclidean distances are used, the semi-supervised scheme improves the classification accuracy over 1NN by 24.98 percentage points (from \(62.90\%\) to \(87.88\%\)) without any metamorphosis distance computation.

4 Conclusion

The proposed method not only successfully integrates labeled data from a different domain to solve the task, but also proves to be a powerful framework for improving the performance of Euclidean-based methods in the classification of hESC-CM APs. Moreover, it reaches \(94.73\%\) agreement with the state of the art while trading off accuracy against computational cost: whereas classifying a new sample with the state-of-the-art method requires solving 20 computationally intensive optimization problems (6.74 s/sample on two 8-core compute nodes with eight 2.3 GHz CPUs per node [11]), in the proposed method a new sample just needs a forward pass through a small RNN with fixed weights (under 6 s for the entire 6940-AP dataset on a single 2.2 GHz CPU with 2 cores and 4 threads).