

IEEE Trans Biomed Eng. Author manuscript; available in PMC 2008 Dec 10.
Published in final edited form as:
PMCID: PMC2597199
HALMS: HALMS333691
PMID: 18838359

Exploring time series retrieved from cardiac implantable devices for optimizing patient follow-up

Abstract

Current cardiac implantable devices (ID) are equipped with a set of sensors that can provide useful information to improve patient follow-up and to prevent health deterioration in the postoperative period. In this paper, data obtained from an ID with two such sensors (a transthoracic impedance sensor and an accelerometer) are analyzed in order to evaluate their potential application for the follow-up of patients treated with a cardiac resynchronization therapy (CRT). A methodology combining spatio-temporal fuzzy coding and multiple correspondence analysis (MCA) is applied in order to: i) reduce the dimensionality of the data and provide new synthetic indices based on the “factorial axes” obtained from MCA, ii) interpret these factorial axes in physiological terms and iii) analyze the evolution of the patient’s status by projecting the acquired data into the plane formed by the first two factorial axes named “factorial plane”. In order to classify the different evolution patterns, a new similarity measure is proposed and validated on simulated datasets, and then used to cluster observed data from 41 CRT patients. The obtained clusters are compared with the annotations on each patient’s medical record. Two areas on the factorial plane are identified, one being correlated with a health degradation of patients and the other with a stable clinical state.

Keywords: time-series, trajectories, monitoring, cardiac implantable devices, data mining

I. Introduction

Cardiac resynchronization therapy (CRT) is indicated for patients suffering from drug-refractory congestive heart failure (CHF) associated with intraventricular dyssynchrony [1]. CRT improves hemodynamic parameters, ejection fraction or the distance covered in the 6-minute walking test [2]. Furthermore, CRT has been shown to decrease hospitalizations for patients treated with implantable devices (ID). Although the efficiency of this treatment has been proven, 20 to 30% of patients show either no improvement or a worsening of their symptoms [3].

Individual follow-up of implanted patients is key to understanding the difference between responders and non-responders, and to preventing severe health degradation. Besides regular follow-up visits during the post-operative period, daily follow-up is possible with the new IDs recently developed for CRT. They offer an increased storage capability for data acquired by the ID, providing information on the ID itself (e.g. event counters of pacing and sensing activities), on the state of the patient (e.g. arrhythmias, electrograms) and on the patient’s activity [4]. Recorded data are very promising for the home monitoring of patients, the prediction of adverse events and the reduction of hospitalizations. However, this source of information is under-exploited because the data are large, multivariate, time-dependent and heterogeneous, and consequently difficult for caregivers to interpret.

The objective of the present study is to propose a methodology to process this amount of multivariate data, in order to i) evaluate and extract the information content of the time-dependent data downloaded from the pacemaker memory, ii) define synthetic indices which are easy to interpret and iii) characterize and compare different populations of patients. Given the dimensionality of the recorded data, methods of data reduction are investigated. Two previous studies have shown the value of a multidimensional analysis of the data recorded in the ID memory for objectively assessing the patients’ response to the therapy, as well as the validity of exploratory techniques for processing these data, using principal component analysis (PCA) [5] and multiple correspondence analysis (MCA) associated with a spatio-temporal fuzzy coding of the time-series [6]. The former method has been successfully used to differentiate a test population (patients with rate-responsive pacemakers) from a population of patients suffering from CHF by jointly exploiting a number of physiological variables and using a simple representation for the temporal dimension. Provided that its table of analysis is appropriately adapted, MCA has been successfully applied to the analysis of the evolution of time-series, and is therefore used here as well. MCA performs a reduction of the dimensionality of the data and provides synthetic indices called “factorial axes”. A plane formed by two factorial axes is called a “factorial plane”. Each patient is finally represented by trajectories on the factorial plane.

From a methodological point of view, several questions are raised: i) how to link the factorial axes with the variables acquired from the ID? ii) do patients with similar trajectories on the factorial plane have a similar clinical state, and if yes, how to cluster patients according to their evolution in the factorial plane? and iii) are the obtained clusters consistent with the clinical data available from the patients? The study addresses the clustering of trajectories with different numbers of points in the factorial plane, which implies the choice of appropriate distance measure and clustering method. This problem is related to temporal clustering (i.e. the clustering of time-series) [7], [8].

The paper is organized as follows. In section II, the clinical protocol and the ID data are presented. Then, in section III, the proposed methodology is described and the issues arising from the clustering of trajectories are presented and solutions are proposed. The proposed methodology is tested on simulated datasets and applied on the real recorded data in Section VI. The validity of the obtained clusters is evaluated in comparison with the medical records of patients participating in the protocol.

II. Data recorded by cardiac implantable devices

A. Patients

Forty-one patients (34 males, 7 females) participated in the present study. The mean age was 64 (minimum 38 and maximum 87). They suffered from refractory heart failure (RHF) associated with intraventricular dyssynchrony and presented a thin QRS complex (< 120 ms), a NYHA class from III to IV and a left ventricular ejection fraction (LVEF) = 25% (± 7). They were candidates for cardiac resynchronization therapy and were then implanted with cardiac implantable devices. The patients were informed about the research protocol and gave their fully informed consent for participating in this study.

B. Description of the follow-up time-series

Data stored in the ID memory are retrieved in individual records at the end of the third, the sixth and the twelfth postoperative months. Each record covers a three-month period. These data result from two sensors [9]: a transthoracic impedance sensor, which reflects the respiratory activity of the patient and the intensity of the patient’s effort, and a 1-D accelerometer, which is linked to the intensity of the physical activity of the patient.

By combining information from the two sensors, the activity level of the patients is classified automatically by the device into two states: exercise and rest. For each state, 24-hour cumulative values of a number of variables are computed and recorded in the ID memory over 30-day follow-up periods. More details concerning the two sensors can be found in [10], [11]. The final set of thirteen physiological variables is listed in Table I. It consists of seven recorded variables and six additional variables derived from the seven recorded ones.

TABLE I

List of 7 physiological variables recorded in the cardiac implantable devices and 6 variables computed from these recorded variables.

| Description | Names | Units |
|---|---|---|
| Total duration within the activity level | DurE ¹ | s |
| Cumulative values of acceleration | ACCE | m·s⁻² (g) |
| Cumulative values of impedance | ImpE, ImpR | millivolts (mV) |
| Cumulative number of ventilation cycles | NbBreathsE, NbBreathsR | NbVC |
| Cumulative number of cardiac cycles | NbCardCycE | NbCC |
| “Mean” ² activity intensity | ActInt | g·s⁻¹ |
| ImpE over ACCE | ImpOverAcc | mV·g⁻¹ |
| “Mean” heart rate | HeartRateE | Beats per minute (bpm) |
| “Mean” impedance minute ventilation | ImpMinVentE | mV·min⁻¹ |
| “Mean” ventilation frequency | VentFreqE | NbVC·min⁻¹ |
| ImpE over ImpR | ImpRate | none |

¹ Subscripts E and R are for Exercise and Rest, respectively.
² The duration of each Exercise and Rest period that occurs within 24 hours is unknown. Only the cumulative duration is known. Consequently, this “mean” is not the average of the variable values over 24 hours, except if all the periods are of the same duration.

III. A methodology for clustering multivariate time-series

An overview of the proposed methodology is provided in Figure 1. The analysis is first performed on a reference population to determine significant factorial axes and clusters of similar evolutions in the factorial planes. In a follow-up situation, new data would regularly be retrieved from patients’ implantable cardiac devices. The results of the analysis (i.e. factorial axes and clusters) would then be used by projecting the new data on the factorial planes and assigning the evolution of the patient to an existing cluster, in order to support the diagnosis (i.e. the patient is improving or degrading). In this paper, only the analysis is presented.

Figure 1. Overview of the proposed methodology. The analysis is performed on the reference population and leads to the determination of clusters of similar evolutions, according to an appropriately chosen dissimilarity measure, on the factorial plane defined by the smoothed multiple correspondence analysis (SMCA). A practical application of this methodology would be the projection of subsequent follow-up data as supplementary individuals on the factorial axes defined during the analysis and the assignment of the obtained trajectories to the “closest” cluster, in the sense of the chosen distance.

The analysis consists of 2 successive steps, namely fuzzy coding of the data and multidimensional analysis with smoothed multiple correspondence analysis (SMCA), described in the following subsections.

A. Space-time fuzzy coding

A coding of the recorded data is required, as MCA was originally conceived for categorical variables. MCA exploits disjunctive tables $Z = (z_{ij})_{(i,j) \in [1,R] \times [1,C]}$, where $z_{ij}$ is the membership value of the $i$th object to the $j$th modality. As a coding of the multivariate time-series, a fuzzy space-time windowing, defined by Loslever and Bouilland for characterizing and coding biomechanical temporal data [12], is proposed. Instead of an indicator matrix $Z$ (i.e. $z_{ij} \in \{0,1\}$), the MCA analyses a fuzzy version of $Z$ where $z_{ij} \in [0,1]$ with the condition $\sum_{j \in J_v} z_{ij} = 1$, $J_v$ being the set of modalities of the $v$th attribute (variable). In statistics, one modality of a variable is one possible level of this variable. In classical crisp coding, continuous variables are divided into several modalities (whose number depends on the distribution of the variable). For example, the variable “age” of a population can be split into 3 modalities “age ≤ 30”, “30 < age < 60” and “age ≥ 60”.

As depicted in Figure 2, the fuzzy space-time windowing divides the time domain of each variable into $N_T$ overlapping windows. The membership value of the $q$th time sample $t_q$ to the time window $T_j$ is denoted $\mu_{T_j}(t_q)$ and lies in $[0,1]$. The membership values of each data point in the $N_T$ time windows meet the condition $\sum_{j=1}^{N_T} \mu_{T_j}(t_q) = 1$. The amplitudes of each variable (referred to as the spatial domain) are coded in a similar manner with $N_A$ spatial fuzzy windows verifying the same properties.

Figure 2. Temporal and spatial (amplitude) fuzzy coding of a continuous signal.

From the membership values in the time and the spatial (amplitude) domains, the membership value of the space-time window $W_{i,j}^{n}$, for a given time-series (signal) $TS_k$ and the variable $V_n$, is defined as:

$$\mu_{W_{i,j}^{n}}(TS_k) = \frac{\sum_{q=1}^{Q} \mu_{T_j}(t_q) \cdot \mu_{A_{i,n}}(V_n(t_q))}{\sum_{q=1}^{Q} \mu_{T_j}(t_q)} \qquad (1)$$

where $V_n(t_q)$ is the value taken by the $n$th variable at time unit $t_q$, $\mu_{T_j}(t_q)$ is the membership value of the time window $T_j$ for the time unit $t_q$, $\mu_{A_{i,n}}(V_n(t_q))$ is the membership value of the $i$th space window $A_{i,n}$ for the $V_n(t_q)$ value, and $Q$ is the number of time units in $TS_k$. With this definition, $\mu_{W_{i,j}^{n}}(TS_k)$ is the weighted average of the space membership values with the time membership values as weights and verifies $\sum_{i=1}^{N_A} \mu_{W_{i,j}^{n}}(TS_k) = 1$. This property is required to maintain the statistical context and to allow $\mu_{W_{i,j}^{n}}(TS_k)$ to be interpreted as the frequency of appearance of the signal in the space-time window $W_{i,j}^{n}$.
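Equation 1 can be illustrated with a minimal Python sketch. The piecewise-linear Low/Medium/High partition built by `make_three_windows`, its breakpoints `lo`, `mid`, `hi`, and the triangular time weights are all hypothetical choices made for illustration; the source does not specify the shape of the membership functions. Only `space_time_membership` follows equation 1 directly.

```python
def make_three_windows(lo, mid, hi):
    """Hypothetical Low/Medium/High fuzzy partition (assumption, not from the paper).

    The three memberships of any value sum to 1 by construction.
    """
    def low(v):
        if v <= lo:
            return 1.0
        return (mid - v) / (mid - lo) if v < mid else 0.0

    def high(v):
        if v >= hi:
            return 1.0
        return (v - mid) / (hi - mid) if v > mid else 0.0

    def medium(v):
        return 1.0 - low(v) - high(v)

    return [low, medium, high]


def space_time_membership(values, time_weights, space_windows):
    """Equation 1 for one variable and one time window T_j.

    values        : V_n(t_q) for q = 1..Q
    time_weights  : mu_Tj(t_q) for q = 1..Q
    space_windows : one membership function per space window A_{i,n}
    Returns one membership value per space window i.
    """
    total = sum(time_weights)
    return [
        sum(w * mu(v) for w, v in zip(time_weights, values)) / total
        for mu in space_windows
    ]


# Illustrative data: five daily values and a triangular time window.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
time_weights = [0.0, 0.5, 1.0, 0.5, 0.0]
windows = make_three_windows(lo=1.0, mid=3.0, hi=5.0)
memberships = space_time_membership(values, time_weights, windows)
```

Because the space memberships of each sample sum to 1, the returned values also sum to 1, which is the property required for interpreting them as frequencies of appearance.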

B. Multiple correspondence analysis (MCA)

MCA is of great interest to explore the data recorded by ID, as it handles both quantitative and qualitative data and captures nonlinear relationships between variables. It deals with a two-way cross-table with the observations (also called statistical individuals) as rows and the variables (or attributes) characterizing these observations as columns in the table. In MCA, rows and columns of the cross-table play a symmetric role and can be represented on the same plot. Another advantage of this method is the possibility of displaying supplementary variables and individuals jointly with the variables and individuals of analysis. They are not involved in the MCA but their projection on the factorial plane: i) refines and enriches the interpretation of the MCA factorial axes by relating them to meaningful variables (e.g. age, sex, etc.) and ii) enables the characterization of supplementary individuals according to their location with respect to individuals of analysis. Consequently in MCA, data acquired from other patients’ ID can be represented on the factorial plane jointly with the reference population: it is possible to study their evolution with respect to the evolution of the patients of analysis.

In a follow-up context, the temporal information contained in the recorded data is essential and has to be taken into account. Inherently MCA does not exploit time, but the temporal dimension can be introduced artificially. A simple way consists in representing each time sample (or time window) of a time-series by one statistical individual (a row of the cross-table of analysis) and in applying MCA to the resulting table [12]. Data can then be organized in a table such as Table II, where a time-series is represented by as many rows as it has time samples or time windows. This method is simple and leads to a rather easy interpretation of the results. However, the temporal dimension is not explicitly exploited by the underlying model of data representation (e.g. the same factorial axes would be obtained by introducing the rows of the analysis table in any order). Details and examples on the computation and interpretation of MCA can be found in [13]. With this convention, each time-series (i.e., in this study, each three-month period of a given patient) is represented by a trajectory on the factorial plane.

TABLE II

Construction of the table Z for multiple correspondence analysis (MCA) applied to fuzzy coded data and implicitly exploiting the temporal dimension.

| Variables | $V_n$ |
|---|---|
| Fuzzy space windows | $A_{i,n}$ |
| $TS_1$ at fuzzy time window 1 | $\mu_{W_{i,1}^{n}}(TS_1)$ |
| $TS_1$ at fuzzy time window 2 | $\mu_{W_{i,2}^{n}}(TS_1)$ |
| $TS_k$ at fuzzy time window $j$ | $\mu_{W_{i,j}^{n}}(TS_k)$ |

$A_{i,n}$ is the fuzzy space window corresponding to the $i$th modality of the $n$th variable, $TS_k$ is the $k$th time-series and $\mu_{W_{i,j}^{n}}(TS_k)$ is the membership value of the $n$th variable to the fuzzy time-space window $W_{i,j}^{n}$ for the $k$th time-series.

One of our objectives is to cluster the patients according to the evolution of their trajectories in the factorial plane defined by MCA. To facilitate this clustering, it is possible to smooth the trajectories in the factorial plane by applying a weighted and smoothed temporal average on the table $Z$ analyzed by MCA. This method is named smoothed multiple correspondence analysis (SMCA) and has been introduced by Benali and Escofier [14]. The final table of analysis is $S = P \cdot Z$, where $P = (p_{ij})_{(i,j) \in [1,L]^2}$ is a proximity matrix defining the weighted and smoothed temporal average and is such that $\sum_{j=1}^{L} p_{ij} = 1$.

IV. Application of the fuzzy coding and multidimensional analysis on the recorded database

The first two steps of the proposed methodology, namely the fuzzy spatio-temporal coding and the SMCA, are applied to the time-series available in the recorded database. In this study, a fuzzy window set $T = \{T_1, \dots, T_j, \dots, T_{N_T}\}$, where each $T_j$ is 7 days long, is considered. The length of the fuzzy time windows has been chosen by considering that the patients’ behaviors are quite similar from one week to another. Each trajectory, representing a three-month period (i.e. around 13 seven-day windows), therefore links around 13 points.

For each variable $V_n$, $N_A = 3$ modalities are considered: “Low” corresponding to the spatial fuzzy window $A_{1,n} = [-\infty, \mathrm{median}(V_n)]$, “Medium” corresponding to $A_{2,n} = [\mathrm{prctile}(V_n, 2.5), \mathrm{prctile}(V_n, 97.5)]$ and “High” corresponding to $A_{3,n} = [\mathrm{median}(V_n), +\infty]$, where $\mathrm{prctile}(V_n, p)$ is the $p$th percentile of the variable $V_n$. The 3 spatial fuzzy windows are denoted with the suffixes “−H” for the higher level (modality), “−M” for the medium level and “−L” for the lower level.

The elements of the proximity matrix P are defined as:

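$$p_{ij} = p_{ji} = \begin{cases} 0.2 & \text{if } j = i+1 \\ 0.1 & \text{if } j = i+2 \\ 0.05 & \text{if } j = i+3 \\ 1 - \sum_{k=1,\, k \neq j}^{L} p_{ik} & \text{if } j = i \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

performing a temporal average of the statistical individuals and respecting the condition $\sum_{j=1}^{L} p_{ij} = 1$.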
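The construction of P in equation 2 and the smoothing S = P · Z can be sketched as follows in plain Python. The function names `proximity_matrix` and `smooth` are hypothetical, and the sketch assumes that the L rows of Z belong to a single time-series; how rows are handled at the boundary between two stacked time-series is not stated in the source.

```python
def proximity_matrix(L):
    """Build the L x L proximity matrix P of equation 2."""
    # Off-diagonal weights depend only on the lag |i - j| (p_ij = p_ji).
    weights = {1: 0.2, 2: 0.1, 3: 0.05}
    P = [[weights.get(abs(i - j), 0.0) for j in range(L)] for i in range(L)]
    for i in range(L):
        # The diagonal term completes each row so that it sums to 1.
        P[i][i] = 1.0 - sum(P[i])
    return P


def smooth(P, Z):
    """Return S = P . Z, the weighted temporal average of the rows of Z."""
    n_rows, n_cols = len(Z), len(Z[0])
    return [
        [sum(P[i][k] * Z[k][c] for k in range(n_rows)) for c in range(n_cols)]
        for i in range(len(P))
    ]


# Illustrative table: 5 time windows, one variable with 2 fuzzy modalities.
Z = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.7, 0.3], [0.9, 0.1]]
P = proximity_matrix(len(Z))
S = smooth(P, Z)
```

Since every row of P sums to 1, each row of S keeps the same total as the rows of Z, so the smoothed table remains a valid fuzzy coding.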

The protocol provided 58 records for 41 patients. SMCA is thus applied to 793 statistical individuals related to these 41 patients, i.e. to an array of 793 rows and 39 columns (13 variables with NA = 3 spatial fuzzy windows).

The 1st and 2nd factorial axes represent respectively 68.0% and 15.7% of the total variance, showing that the majority (83.7%) of the information is contained in the first factorial plane. So a great part of the variance of the initial data can be represented by only two factorial axes when 13 variables were initially considered. These first two factors define synthetic indices for patient follow-up but their interpretation, from physiological and functional points of view, has to be performed. Consequently, this study will focus on the clustering of trajectories on this factorial plane.

The projections of the variables and individuals of analysis on the first factorial plane of the SMCA are provided in Figures 3 and 4, respectively. The first axis is mainly defined by the lower (−L) and higher (−H) modalities of ImpE, DurE, NbCardCycE, NbBreathsE, ACCE and ImpRate, which reflect the time spent in exercise and the intensity of the efforts made by the patient. Consequently, in Figure 4, the further to the right of the plane the individuals are located, the less time they spend in exercise and the less intense their efforts are. The second axis is mainly defined by the medium (−M) and extreme (−H, −L) modalities of ImpMinVentE, ImpOverAcc, ImpR and HeartRateE, which define the ventilation activity in terms of amplitude, frequency and flow rate, especially at rest. This axis can be interpreted as an “axis of cardiovascular efficiency”. In Figure 4, the lower in the plane the individuals are located, the lower their ventilation at rest (i.e. their cardiovascular system is more “efficient”), independently of the daily activity duration and intensity (i.e. of the position along the first axis).

Figure 3. Variables of analysis represented on the first plane of the Smoothed Multiple Correspondence Analysis (SMCA). For each variable, only the first (Low, −L) and the third (High, −H) levels are labelled; unlabelled squares correspond to the second levels (Medium, −M).

Figure 4. Individuals of analysis represented on the first plane of the Smoothed Multiple Correspondence Analysis (SMCA). Each record of each patient (corresponding to a three-month period) is represented by one trajectory. The first point in time for each trajectory is marked by a circle.

As can be seen in Figure 4, trajectories present different locations and evolutions on the factorial plane, with substantial overlap.

V. Clustering trajectories on the factorial plane

The aim of the present study is to cluster patients according to their clinical state during the follow-up period, which, in terms of methodology, corresponds to the clustering of trajectories on the factorial plane according to their location and evolution. Considering the characteristics of the trajectories, the clustering methodology has to address the following issues:

  • Unsupervised: no a priori knowledge on the clusters is required.

  • Similarity measure: relevance in comparing trajectories on the factorial plane having different numbers of points and possibly subject to nonlinear spatio-temporal deformations.

  • Location and evolution: both locations and evolutions of the trajectories on the factorial plane have to be taken into account, as they are both informative.

Each of the previous three points is described in the following sections.

A. An appropriate clustering algorithm

Among the unsupervised clustering algorithms, k-means and agglomerative hierarchical clustering (AHC) are two classical methods.

The k-means algorithm implies the definition of centroids for each cluster, which in the present study is not trivial as the considered objects are trajectories possibly constituted of different numbers of points within the same cluster. The k-means method therefore does not seem relevant for this particular problem.

Agglomerative hierarchical clustering only requires the definition of the dissimilarity matrix between the objects (i.e. the trajectories) and of the aggregation link. It is chosen as the clustering technique in the present study, with the complete link as the aggregation link.
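As an illustration of why AHC only needs a dissimilarity matrix, here is a minimal plain-Python sketch of complete-linkage agglomeration. The function name `complete_linkage` and the toy one-dimensional example are hypothetical; this naive implementation is meant for small datasets and is not the authors' code.

```python
def complete_linkage(dm, k):
    """Agglomerative clustering with complete linkage on a dissimilarity matrix.

    dm : symmetric N x N matrix (list of lists) of pairwise dissimilarities
    k  : number of clusters at which merging stops (k >= 1)
    Returns a list of clusters, each a list of object indices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(dm))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete link: distance between the two farthest members.
                d = max(dm[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy example: four objects located at 0, 1, 10 and 11 on a line.
points = [0.0, 1.0, 10.0, 11.0]
dm = [[abs(p - q) for q in points] for p in points]
two_clusters = complete_linkage(dm, 2)
```

No centroid is ever computed, which is what makes the method applicable to trajectories with different numbers of points.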

B. A relevant similarity measure between trajectories

The main difficulty is the definition of a similarity measure suited to the trajectories on factorial planes. As mentioned above, these trajectories can potentially be subject to deformations and are constituted of different numbers of points. Consequently, the Euclidian distance is not suitable. Among the measures of dissimilarity, dynamic time warping (DTW) and longest common subsequence (LCSS) both allow stretching in time and comparing time-series of different lengths. DTW has been widely used as a measure of dissimilarity in time-series clustering, indexing and retrieval [15], and in speech or handwriting recognition [16]. LCSS has also been studied as a similarity measure for heterogeneous multivariate time-series or for multidimensional trajectories [17]. DTW has the advantage over LCSS of being non-parametric and thus seems more appropriate for unsupervised clustering.

The computation of the DTW vector can be adapted in two dimensions to deal with two trajectories instead of two time-series. Given one trajectory $Traj_k = \{(Traj_k(x_i), Traj_k(y_i))\}_{i \in [1,m]}$ constituted of $m$ data points and one trajectory $Traj_l = \{(Traj_l(x_j), Traj_l(y_j))\}_{j \in [1,n]}$ constituted of $n$ data points, the DTW vector is denoted $DTW_{k,l}(i,j)$. It is defined according to the equation:

$$DTW_{k,l}(i,j) = D_{k,l}(i,j) + \min\left[ DTW_{k,l}(i-1,j-1),\ DTW_{k,l}(i,j-1),\ DTW_{k,l}(i-1,j) \right] \qquad (3)$$

where $D_{k,l}(i,j)$ is the Euclidian distance between the coordinates $(Traj_k(x_i), Traj_k(y_i))$ and $(Traj_l(x_j), Traj_l(y_j))$. Then, the distance DTW between the two trajectories is $DTW_{k,l} = DTW_{k,l}(m,n)$.
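The recursion of equation 3 can be sketched in Python as below. The boundary initialization (zero cost at the origin, infinite cost on the rest of the border) is the usual DTW convention and is an assumption here, since the source does not state it; the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(traj_k, traj_l):
    """DTW distance between two 2-D trajectories given as lists of (x, y)."""
    m, n = len(traj_k), len(traj_l)
    inf = float("inf")
    # dtw[i][j] holds DTW_{k,l}(i, j); row 0 and column 0 are the border.
    dtw = [[inf] * (n + 1) for _ in range(m + 1)]
    dtw[0][0] = 0.0  # assumed boundary condition
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # D_{k,l}(i, j): Euclidean distance between the two points.
            d = math.dist(traj_k[i - 1], traj_l[j - 1])
            dtw[i][j] = d + min(dtw[i - 1][j - 1], dtw[i][j - 1], dtw[i - 1][j])
    return dtw[m][n]


# Two trajectories with the same path but different numbers of points.
short = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
longer = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
same_path = dtw_distance(short, longer)
```

In this example the warping absorbs the repeated point, so the two trajectories are at zero distance even though they have different lengths, which a point-by-point Euclidian distance could not express.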

C. A similarity measure considering both locations and evolutions of the trajectories in the factorial plane

The positions of the modalities on the factorial plane in Figure 3 indicate that two trajectories with similar shapes but located at different positions on the factorial plane are related to different modalities. This observation underlines the fact that both their location and their evolution, i.e. both their coordinates and their derivatives, in the factorial plane are informative to cluster similar trajectories and to compute the dissimilarity matrix. The DTW can then be used with the following modifications.

Given the $k$th trajectory $Traj_k$, its derivative is:

$$dTraj_k = \{(Traj_k(x_i) - Traj_k(x_{i-1}),\ Traj_k(y_i) - Traj_k(y_{i-1}))\}_{i \in [1,m]}. \qquad (4)$$

The DTW vector $dDTW_{k,l}(i,j)$ between the two derivatives $dTraj_k$ and $dTraj_l$ is defined according to equation 3 and the distance DTW between the two derivatives is $dDTW_{k,l} = dDTW_{k,l}(m,n)$.

Thus in this study, given $\overline{D^{euclid}_{k,l}} = \mathrm{euclid}(\mathrm{mean}(Traj_k), \mathrm{mean}(Traj_l))$ the Euclidian distance between the means of the coordinates of $Traj_k$ and $Traj_l$ in the factorial plane, the dissimilarity measure between the two trajectories $Traj_k$ and $Traj_l$ is defined as:

$$DM_{k,l} = dDTW_{k,l} + \overline{D^{euclid}_{k,l}}, \qquad (5)$$

where $dDTW_{k,l}$ takes into account the derivatives of the trajectories and $\overline{D^{euclid}_{k,l}}$ their locations.

The $N \times N$ dissimilarity matrix used for the AHC is then $DM = \{DM_{k,l}\}_{(k,l) \in [1,N]^2}$, $N$ being the number of trajectories to be clustered. After computation of the AHC with complete linkage and the dissimilarity matrix $DM$, a dendrogram is obtained.
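A minimal Python sketch of equations 4 and 5 is given below. It makes two assumptions that are not in the source: the derivative is computed as the m − 1 successive differences (the first difference of equation 4 has no predecessor), and the DTW border is initialized in the usual way. All function names are hypothetical.

```python
import math


def dtw(a, b):
    """DTW distance (equation 3) between two lists of 2-D points."""
    inf = float("inf")
    table = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    table[0][0] = 0.0  # assumed boundary condition
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            table[i][j] = d + min(table[i - 1][j - 1], table[i][j - 1], table[i - 1][j])
    return table[len(a)][len(b)]


def derivative(traj):
    """Successive differences of a trajectory (equation 4, m - 1 points)."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(traj, traj[1:])]


def centroid(traj):
    """Mean of the coordinates of a trajectory."""
    return (sum(p[0] for p in traj) / len(traj), sum(p[1] for p in traj) / len(traj))


def dissimilarity(traj_k, traj_l):
    """Equation 5: DTW on derivatives plus distance between mean locations."""
    evolution = dtw(derivative(traj_k), derivative(traj_l))
    location = math.dist(centroid(traj_k), centroid(traj_l))
    return evolution + location


def dissimilarity_matrix(trajectories):
    """N x N matrix DM used as input of the hierarchical clustering."""
    return [[dissimilarity(a, b) for b in trajectories] for a in trajectories]


base = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in base]   # same evolution, other location
reversed_base = base[::-1]                        # same shape, opposite evolution
DM = dissimilarity_matrix([base, shifted, reversed_base])
```

In this toy example, the shifted copy differs from the base trajectory only through the location term, while the time-reversed copy differs only through the derivative term, which illustrates how the measure separates location from evolution.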

D. Cluster validity criterion

In AHC, in order to choose the threshold at which the dendrogram is cut (i.e. the number of clusters) and to verify the validity of the clustering, cluster validity indices are used (for a review, see [18]). As no information on the data is available (unsupervised clustering), only internal validation indices are suitable. They are based on computing the properties of the resulting clusters, such as the intra- and inter-cluster distances. The internal validation index used in the study is the mean silhouette value, denoted S and described in [19]. S is in the interval [−1, +1], where values close to −1 indicate a wrong clustering and values close to +1 indicate a correct clustering.
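The mean silhouette value can be computed directly from a dissimilarity matrix. The sketch below uses the usual silhouette definition (for each object, a is the mean dissimilarity to its own cluster and b the smallest mean dissimilarity to another cluster), which is assumed to match the index described in [19]; scoring singleton clusters as 0 is also an assumed convention. The function name is hypothetical.

```python
def mean_silhouette(dm, clusters):
    """Mean silhouette value S for a partition into at least two clusters.

    dm       : symmetric matrix of pairwise dissimilarities
    clusters : list of clusters, each a list of object indices
    """
    scores = []
    for c_idx, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # assumed convention for singletons
                continue
            # a: mean dissimilarity to the other members of the same cluster.
            a = sum(dm[i][j] for j in members if j != i) / (len(members) - 1)
            # b: mean dissimilarity to the nearest other cluster.
            b = min(
                sum(dm[i][j] for j in other) / len(other)
                for o_idx, other in enumerate(clusters)
                if o_idx != c_idx
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: four objects located at 0, 1, 10 and 11 on a line.
points = [0.0, 1.0, 10.0, 11.0]
dm = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dm, [[0, 1], [2, 3]])
bad = mean_silhouette(dm, [[0, 2], [1, 3]])
```

The natural partition yields a value close to +1 and the mixed partition a negative value. Scanning the candidate numbers of clusters and keeping the one with the largest S gives the cut of the dendrogram.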

VI. Results obtained with the clustering

Before applying the proposed clustering method, namely agglomerative hierarchical clustering with complete linkage and the dissimilarity matrix based on dynamic time warping, it is tested on two simulated datasets, described in the following sections.

A. Tests on simulated datasets

The aim of the first test is to explore the capability of the proposed similarity measure to take into account both locations and evolutions of the trajectories on the factorial plane. The second test addresses the complete methodology described in Figure 1, from the fuzzy space-time coding to the clustering.


1. Test on the similarity measure

From two trajectories selected among the trajectories obtained with the real dataset (cf. Figure 4), three prototypes of trajectories are computed, the third being obtained by inverting the time samples of the first one.

Nine trajectories are simulated from the first prototype at different locations in the factorial plane, 6 from the second prototype and 12 from the third prototype. For each trajectory, a white noise is added to obtain slightly different trajectories for the same prototype. The simulated dataset of 27 artificial trajectories is represented in Figure 5. Figure 5 shows that the simulation reproduces the characteristics of the real dataset, namely the overlapping of several trajectories of different shapes at different locations on the factorial plane.

Figure 5. Simulated dataset of 27 trajectories obtained from three real trajectories and represented on the first factorial plane. The first point in time for each trajectory is marked by a circle.

The clustering method proposed above is applied to the simulated dataset with three dissimilarity measures based on the DTW which is alternately computed on: i) the coordinates of the trajectories, ii) the derivatives of the trajectories, iii) the derivatives with addition of the Euclidian distance between the means of the coordinates as proposed in equation 5.

Figure 6 illustrates the resulting dendrogram for each dissimilarity measure and provides the mean silhouette value against the number of clusters. A dendrogram is a tree-like plot where each step of hierarchical clustering is represented as a fusion of two branches of the tree into a single one. The branches represent clusters obtained on each step of hierarchical clustering. This representation eases the choice of the number of clusters. The dendrogram is cut at the threshold (horizontal dashed line) producing the number of clusters corresponding to the maximum mean silhouette value.

Figure 6. Dendrograms and mean silhouette value vs. the number of clusters, for three dissimilarity measures based on the DTW. The clustering is applied to the simulated dataset of 27 trajectories. The dendrogram is cut at the threshold producing the number of clusters that maximizes the mean silhouette value.

For each of the three measures, the mean silhouette value $S$ presents one global maximum, indicating a value for the number of clusters $K$ to be chosen, and leads to a different clustering. For the DTW on coordinates, $\max_K S$ is obtained for $K = 3$ and it tends to group the trajectories that are close in the sense of their location on the factorial plane. For the DTW on derivatives, $\max_K S$ is obtained for $K = 3$, but it groups trajectories only according to their evolution. For the DTW on derivatives with the addition of the Euclidian distance between coordinates, $\max_K S$ is obtained for $K = 9$ and each obtained cluster is composed of trajectories with similar evolutions and close locations. Despite the overlapping of several trajectories with different evolutions at the same location, the proposed dissimilarity measure is able to group similar trajectories. Moreover, it is able to distinguish between shapes (independent of any notion of time) and evolutions (with start- and end-points), as trajectories created with the first prototype and with the third prototype are assigned to two different clusters, even though they have similar shapes.

2. Test of the complete method

In this section, the noise robustness of the proposed methodology, described in Figure 1, is tested. Among the trajectories obtained with the real database (cf. Figure 4), seven trajectories with different evolutions and locations are selected. The corresponding time-series (i.e. the 13 variables of analysis) are retrieved. For each selected trajectory, white noise is added independently to each of its 13 variables with a given signal-to-noise ratio (SNR). The original time-series and 10 realizations of the noisy time-series are coded by fuzzy space-time coding and constitute the individuals of analysis for the SMCA. The 77 resulting trajectories are then clustered by AHC, using the DTW on derivatives with the addition of the Euclidean distance between coordinates as the dissimilarity measure (cf. equation 5). Figure 7 presents the mean silhouette value against the number of clusters for SNR = 3 dB, and the clusters obtained at the maximum of the silhouette value, which is reached for K = 7. Despite the noise, the correct number of clusters is determined by the internal validation indices, and the trajectories are grouped according to their evolution and location.
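The noise-addition step can be illustrated with a short sketch. It assumes Gaussian white noise and defines signal power as the mean square of the series; neither choice is specified in this section, and the sinusoidal test series is purely hypothetical.

```python
import math
import random


def add_white_noise(series, snr_db, rng):
    """Return the series plus Gaussian white noise scaled to a target SNR in dB.

    Signal power is taken as the mean square of the series (an assumption).
    """
    signal_power = sum(v * v for v in series) / len(series)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in series]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean series and its noisy version."""
    signal_power = sum(v * v for v in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the example is reproducible
clean = [math.sin(2 * math.pi * t / 50) for t in range(5000)]
# Ten noisy realizations of the same series at SNR = 3 dB, as in the test protocol.
realizations = [add_white_noise(clean, 3.0, rng) for _ in range(10)]
```

One original series plus ten noisy realizations gives 11 series per selected trajectory, which is consistent with the 77 trajectories obtained from the 7 selected ones.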

Figure 7. Test of the complete method on 77 trajectories computed from 7 selected trajectories of the real dataset. For each selected trajectory, white noise is added independently to each of its 13 variables with a given signal-to-noise ratio (SNR = 3 dB), and the resulting 77 trajectories are clustered. For each cluster, individuals of analysis (in gray) and individuals of the given cluster (in black) are represented in the first plane of the SMCA. The first point in time of each trajectory is marked by a circle.

It appears that above SNR = 0 dB the clustering is correct and is not disturbed by the noise added to the time-series, whereas below SNR = 0 dB, S is unable to provide the correct value of K. The method thus appears robust to noise on the variables of analysis, which can be explained by two of its steps: the time fuzzy coding and the smoothing performed during the SMCA. The mean silhouette value increases when the length of the time fuzzy windows increases, and it is higher with the SMCA than with the MCA. The averaging performed by the fuzzy coding and by the SMCA smooths the time-series and the trajectories, respectively, which improves the performance of the clustering.

The previous two tests, performed on datasets close to the real database, support the validity of the proposed approach in terms of clustering methodology, relevance of the dissimilarity measure for the processed trajectories, and noise robustness.

B. Performance of the clustering on the recorded database

In this section, the recorded database, consisting of 58 trajectories on the first factorial plane, is clustered with the proposed approach. Figure 8 provides the mean silhouette value S against the number of clusters K. The number of clusters is set to K = 10, the value for which S is maximal. The resulting clusters are also shown in Figure 8, where, for each cluster, individuals of analysis are in gray and individuals of the given cluster are in black. Trajectories within a given cluster have visually similar evolutions and close locations.
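The selection of K from the mean silhouette value can be sketched as follows. The silhouette definition follows Rousseeuw (1987); the toy dissimilarity matrix and the hand-written candidate labelings are hypothetical stand-ins for the partitions that would be obtained by cutting the dendrogram at different heights.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette value of a partition, from a dissimilarity matrix.

    Requires at least two clusters. Singletons contribute a silhouette of 0.
    """
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)

    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # silhouette of a singleton is 0 by convention
        # a: mean dissimilarity to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of another cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in groups.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy data (hypothetical): five points on a line.
points = [0.0, 0.2, 0.3, 5.0, 5.1]
dist = [[abs(p - q) for q in points] for p in points]

# Candidate partitions for K = 2, 3, 4 (written by hand for this sketch).
candidates = {
    2: [0, 0, 0, 1, 1],
    3: [0, 1, 1, 2, 2],
    4: [0, 1, 1, 2, 3],
}
scores = {k: mean_silhouette(dist, labs) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)
```

Here `best_k` is the number of clusters with the highest mean silhouette value, which is the selection rule applied to the recorded database.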

Figure 8. Mean silhouette value S against the number of clusters K. Ten clusters are determined with the proposed methodology for the recorded database (S is maximal for K = 10). For each cluster, individuals of analysis (in gray) and individuals of the given cluster (in black) are represented in the first plane of the SMCA. Each record of each patient (corresponding to a three-month period) is represented by one trajectory. The first point in time of each trajectory is marked by a circle.

To evaluate the methodology, the resulting clusters have to be compared with the physicians’ assessment of the evolution of the patients during each three-month period. For the present protocol, information is available on the global state of each patient, updated at the end of each three-month period of the follow-up, and each “adverse event” is reported (date and type). In the present study, records with adverse events other than cardiac events are discarded, as a non-cardiac event such as a fall or bronchitis can have very different effects on patients with RHF. As no daily report of the patients’ clinical state is available, the difficulty lies in defining the actual evolution of a given patient: does a patient who undergoes a cardiac adverse event at the beginning of his/her three-month period and recovers rapidly after hospitalization have a favorable or an unfavorable evolution? In this context, indices such as specificity or sensitivity are difficult to compute. Consequently, the present study can only focus on the relation between the different areas of the factorial plane defined by SMCA and the health of the patients as reported at the end of each follow-up period. More data would be necessary to study the evolutions of patients on this factorial plane.

According to the analysis of the position of the individuals of analysis relative to the position of the modalities of analysis (cf. Figures 3 and 4), the evolution of a patient (during one of his/her three-month periods) is a priori i) favorable if the corresponding trajectory is located on the left of the plane or evolves from the right to the left of the plane, and ii) unfavorable if the corresponding trajectory is located on the right part of the plane or evolves from the left to the right of the plane. The study of each cluster is necessary to confirm this division of the factorial plane into several areas with different meanings in terms of the patient’s health.
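The a priori rule above can be written down as a small function, purely for illustration. Several elements are assumptions of this sketch and not part of the study: the boundary between left and right is placed at 0 on the first factorial axis, a net displacement larger than the hypothetical parameter `min_shift` counts as an evolution across the plane, and the direction of evolution takes precedence over the location.

```python
def a_priori_evolution(traj, min_shift=0.5):
    """Label a trajectory 'favorable' or 'unfavorable' from its first-axis coordinates.

    traj is a list of (x, y) points on the factorial plane, in time order.
    The boundary at x = 0 and the min_shift threshold are assumptions.
    """
    xs = [x for x, _ in traj]
    shift = xs[-1] - xs[0]  # net displacement along the first factorial axis
    if shift <= -min_shift:
        return "favorable"    # evolves from right to left
    if shift >= min_shift:
        return "unfavorable"  # evolves from left to right
    # Small evolution: decide from the mean location.
    mean_x = sum(xs) / len(xs)
    return "favorable" if mean_x < 0 else "unfavorable"


# Hypothetical trajectories.
stable_left = [(-1.0, 0.0), (-1.1, 0.2)]
right_to_left = [(1.0, 0.0), (-1.0, 0.0)]
left_to_right = [(-1.0, 0.0), (1.0, 0.0)]
stable_right = [(1.0, 0.0), (1.1, 0.0)]
```

Such a rule only encodes the a priori reading of the plane; as the cluster-by-cluster discussion shows, trajectories with changes of direction need a finer description.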

On the left of the plane, clusters 2, 4 and 5 contain 13 trajectories in total. The medical reports indicate that all the corresponding patients were, during the given period, in a favorable state. On the right of the plane, clusters 8 and 9 contain 5 trajectories in total, associated with patients all undergoing a health deterioration during the given period. These five clusters enable the definition of two distinct areas in the factorial plane, the left one associated with a favorable state and the right one with a health deterioration.

The other five clusters are interesting, as their trajectories show transitions from one area to the other. Cluster 1 comprises 27 trajectories with small evolutions on the upper half-plane, overlapping both the right and left half-planes. During the given period, the patients whose trajectories belong to cluster 1 were all in good health, except one. According to this patient’s medical records, he/she was tired during the period of interest (3 months post-op) and died 1 month after the end of this period; the date of the adverse event is not reported. Figure 9 shows that the corresponding trajectory (solid line) evolves from the upper half-plane to the right of the plane, with a loop around the 7th post-op week. The trajectory in cluster 3 is very specific, evolving first from bottom to top and then from right to left. This patient underwent an adverse event (heart failure) 28 days after the beginning of the period of interest, which corresponds to the change of direction in his/her trajectory, and recovered rapidly, which explains why the trajectory then evolves from right to left. Cluster 6 groups 6 trajectories evolving in the same direction (bottom-right to top), associated with patients who all, except one, presented a favorable state during the given period. This patient (bold line in Figure 9) underwent a cardiac adverse event around 2 months after the beginning of the period of interest. He/she was hospitalized for 1 week and recovered rapidly, as can be seen from his/her trajectory finally evolving from right to left. In Figure 9, the arrow corresponds to the reported date of the adverse event, and a change in the direction of the trajectory can be seen around 5 weeks before the adverse event. Cluster 7 comprises 2 trajectories evolving from the top-right to the top-left of the plane, corresponding to one patient with a favorable evolution and one patient with an unfavorable evolution. The latter patient underwent a cardiac adverse event around 2 weeks before the end of the period of interest, as can be seen from the abrupt U-turn in his/her trajectory (dashed line in Figure 9). Cluster 10 groups 4 trajectories with two different evolutions: two trajectories have very small dynamics and are associated with patients in good health, and the two others have large evolutions from the top to the right of the plane and are associated with patients undergoing adverse events during the given period.

Figure 9. Three specific trajectories of patients undergoing an adverse event during the period of interest. Solid line: trajectory in cluster 1; bold line: trajectory in cluster 6; dashed line: trajectory in cluster 7. Reported dates of adverse events are indicated by an arrow. The first point in time of each trajectory is marked by a circle.

The clustering performed on the real dataset provides 10 clusters that group trajectories with similar evolutions and close locations, as expected. The obtained clusters seem consistent with the medical records of the patients, but no quantitative evaluation is available. Two areas in the factorial plane have been identified: the bottom-right quarter-plane is related to health degradation and the bottom-left one to a stable clinical state. The other two areas, although distinguished by the clustering algorithm, are not so easily identifiable.

VII. Discussion and conclusion

This study was designed to show the informative potential of acceleration and impedance data recorded in implantable devices and to evaluate the appropriateness of a multiple correspondence analysis in this particular context. A clinical protocol has been designed to provide data on patients who suffer from heart failure.

MCA has been chosen because it is a multivariate method that reveals both linear and non-linear relationships between variables. However, MCA requires continuous variables to be transformed into nominal ones, i.e. the amplitude and time domains of the variables must be coded by means of crisp or fuzzy modalities. MCA does not explicitly exploit time, but the temporal dimension is introduced in this study with a basic solution that consists in representing each time sample (or time window) of a time-series by one statistical individual (a row of the table of analysis) and then performing MCA. Statistical individuals are represented by trajectories on the factorial plane, and their temporal evolution can then be exploited. Consequently, the proposed method can be useful for graphically following the evolution of a given patient’s state by means of a synthetic representation that takes into account the most pertinent information in the data.
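The coding step mentioned above can be illustrated with a minimal sketch. The actual coding used in the study (number of modalities, membership functions, fuzzy time windows) is defined in earlier sections and is not reproduced here; the three piecewise-linear modalities, the crisp non-overlapping time windows and the breakpoints below are simplifying assumptions.

```python
def fuzzy_code(value, low, mid, high):
    """Code a continuous value into three fuzzy modalities (low, medium, high).

    Memberships are piecewise linear and always sum to 1.
    """
    if value <= low:
        return (1.0, 0.0, 0.0)
    if value >= high:
        return (0.0, 0.0, 1.0)
    if value <= mid:
        m = (value - low) / (mid - low)
        return (1.0 - m, m, 0.0)
    h = (value - mid) / (high - mid)
    return (0.0, 1.0 - h, h)


def code_series(series, window, low, mid, high):
    """One row of the table of analysis per time window (window mean, then fuzzy coding)."""
    rows = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        rows.append(fuzzy_code(sum(chunk) / window, low, mid, high))
    return rows


# Hypothetical daily values of one variable, coded with 3-sample windows.
series = [10, 12, 14, 20, 22, 24, 30, 30, 30]
table = code_series(series, window=3, low=10, mid=20, high=30)
```

Each row of `table` is one statistical individual; in the full method the codings of all variables would be concatenated column-wise before MCA is applied, and successive rows of one record form a trajectory on the factorial plane.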

In the present study, it has been possible to evaluate and discuss the synthetic indices provided by the first two factorial axes of the MCA and to explain them according to functional and physiological points of view. The 58 records provided by the clinical protocol have been represented by trajectories on the first factorial plane of the MCA. The definition of an appropriate similarity measure has been discussed and has enabled the clustering of the trajectories into groups of trajectories with similar locations and evolutions on the factorial plane. The proposed distance has been validated on simulated datasets and enables the clustering of trajectories in the factorial plane according to their location and/or their evolution depending on the problem to be solved. Experiments on simulated datasets regarding the noise robustness also show the relevance of the association of fuzzy spatio-temporal coding and smoothed MCA. The proposed method is robust to white noise with SNR as low as 3 dB.

A database consisting of clinical observations from 41 patients has also been analyzed with a data mining approach in order to characterize the data. Two areas in the factorial plane, corresponding to two large groups of patients, have been identified: the bottom-right quarter-plane is related to health degradation and the bottom-left one to a stable clinical state. Most of the trajectories projected on this first factorial plane were correctly clustered according to their location and shape. Discussed individually, these clusters were efficient in grouping trajectories corresponding to similar clinical states. The present study has shown that patients undergoing an adverse event often present trajectories with abrupt variations. Detecting such phenomena would make it possible to identify patients with a critical evolution.
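One possible way to flag abrupt variations such as the U-turns described earlier is to measure the angle between successive displacement vectors of a trajectory. This is only a sketch of the idea: the study does not define a detection procedure, and the 135-degree threshold and the function names are hypothetical.

```python
import math


def turning_angles(traj):
    """Angle in radians, within [0, pi], between successive displacement vectors."""
    angles = []
    for p0, p1, p2 in zip(traj, traj[1:], traj[2:]):
        u = (p1[0] - p0[0], p1[1] - p0[1])
        v = (p2[0] - p1[0], p2[1] - p1[1])
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0 or nv == 0:
            angles.append(0.0)  # no movement: no measurable turn
            continue
        cos_t = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding errors
    return angles


def abrupt_turns(traj, threshold=math.radians(135)):
    """Indices of the points at which the direction changes by at least threshold."""
    return [i + 1 for i, a in enumerate(turning_angles(traj)) if a >= threshold]


# Hypothetical trajectory: moves to the right, then makes a U-turn at its third point.
traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.1), (0.0, 0.1)]
turn_points = abrupt_turns(traj)
```

On real trajectories the smoothing applied upstream and the choice of threshold would strongly influence which turns are reported, so any such rule would need to be validated against the medical records.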

In the future, additional data would enable the identification and detection of typical evolutions related to health deterioration. Rules may be extracted from the location of modalities on the factorial plane, and enable the definition of thresholds on time-series to generate alarms associated with adverse events or health deterioration. These alarms could be sent from the pacemaker, via a data communication device (such as a mobile phone, PDA, etc.), to a telemonitoring center. These results are encouraging and may be useful for the definition of new selection criteria of candidate patients for CRT. Finally, the proposed methodology can be generalized to other monitoring problems.

Biographies

• Marie Guéguin received the telecommunications and signal processing engineering degree from the École Supérieure d’Ingénieurs en Électronique et Électrotechnique (ESIEE Paris), France, in 2003, the Master’s degree in Biomedical Engineering from the University of Paris 12, France, in 2003, and the PhD degree in signal processing and telecommunications from the University of Rennes 1, France, in 2006. She is currently working as a postdoctoral fellow in the Laboratoire Traitement du Signal et de l’Image (LTSI), INSERM U642, at the University of Rennes 1, France. Her research interests are signal processing, data analysis, and modeling methods. Her work has been applied to auditory evoked potentials, objective assessment of speech quality, and cardiology.

• Emmanuel Roux received the PhD degree in industrial and human automation and computer science from the University of Valenciennes, France, in 2002. His PhD research work was performed at the CNRS, LAMIH (UMR CNRS/Université de Valenciennes et du Hainaut-Cambrésis). He worked as a Postdoctoral Fellow at the INSERM, Epidemiology and Biostatistics unit (U780-IFR 69), Villejuif, France, at the Université de Rennes 1, LTSI (INSERM U642), Rennes, France, and at the IRD, LMTG (UMR CNRS/Université de Toulouse III/IRD), Toulouse, France. He is currently a researcher at the IRD, ESPACE unit (S140), Montpellier, France. His research interests are multivariate and heterogeneous data analysis and mining, with descriptive multivariate and machine learning approaches. His work has been applied to biomechanics, pharmacovigilance, and cardiology, and he is currently interested in the relationships between the environment, ecology, and health.

• Alfredo I. Hernández received the M.S. degree in electronic engineering (biomedical option) from the Simón Bolívar University, Caracas, Venezuela, in 1996 and the Ph.D. degree in signal processing and telecommunications from the University of Rennes 1, France, in 2000. He has been working since 2001 as a full-time researcher at the French National Institute of Health and Medical Research (INSERM) with the Signal and Image Processing Laboratory (LTSI) of the University of Rennes 1. His research interests are in biomedical digital signal processing and model-based biosignal interpretation.

• Fabienne Porée received the PhD degree in signal processing and telecommunications from the University of Rennes 1, France, in 2001. She is an Associate Professor at the Institut Universitaire de Technologie de Rennes. She works in the Laboratoire Traitement du Signal et de l’Image (LTSI), INSERM U642, at the University of Rennes 1. Her research interests include biomedical signal processing and analysis and statistical processing methods.

• Philippe Mabo has been Professor of Cardiology since 1992 and Chief of the Department of Cardiology of the University Hospital of Rennes, France, since 2004. He received his M.D. degree from the University of Rennes in 1987 and worked in the Department of Cardiology of the University Hospital of Rennes successively as assistant professor and full-time physician from 1987 to 1992. His predominant clinical activities and research interests concern electrophysiology, the implantation and follow-up of implantable cardiac devices, and cardiac pacing, especially in the field of “new indications” and “new techniques”. Pr. Mabo is a member of the French Society of Cardiology, the European Society of Cardiology, the European Heart Rhythm Association and the Heart Rhythm Society.

• Laurence Graindorge has been a clinical engineer at ELA Medical (now Sorin Group) since 1994. She works on the design of new algorithms and new sensors for pacemakers and defibrillators. Her work interests also include phase I and phase II clinical evaluation of implantable devices.

• Guy Carrault received his PhD in signal processing and telecommunications from the Université de Rennes 1 in 1987. He has been working in the Signal and Image Processing laboratory of the Université de Rennes 1 since 1984. He is currently a professor at the Institut Universitaire de Technologie de Rennes. His research interests include detection and analysis of electrophysical signals by means of nonstationary and statistical processing methods, as well as intelligent instrumentation design.

References

1. Cazeau S, Alonso C, Jauvert G, Lazarus A, Ritter P. Cardiac resynchronization therapy. Europace. 2004;5(Suppl 1):S42–S48. [Abstract] [Google Scholar]
2. Cazeau S, Leclercq C, et al. Effects of multisite biventricular pacing in patients with heart failure and intraventricular conduction delay. N Engl J Med. 2001;344:873–880. [Abstract] [Google Scholar]
3. Leclercq C, Kass DA. Retiming the failing heart: principles and current clinical status of cardiac resynchronization. J Am Coll Cardiol. 2002;39:194–201. [Abstract] [Google Scholar]
4. Germany R, Murray C. Use of device diagnostics in the outpatient management of heart failure. Am J Cardiol. 2007;99:11G–16G. [Abstract] [Google Scholar]
5. Roux E, Hernandez A, Graindorge L, Carrault G, Mabo P. Multivariate analysis of follow-up physiological data recorded by cardiac implantable devices. Computers in Cardiology. 2006;33:765–768. [Google Scholar]
6. Guéguin M, Roux E, et al. Clustering follow-up time-series recorded by cardiac implantable devices. 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2007. pp. 3848–3851. [Europe PMC free article] [Abstract] [Google Scholar]
7. Antunes C, Oliveira A. Temporal data mining: an overview. Workshop on Temporal Data Mining at the 7th International Conference on Knowledge Discovery and Data Mining (KDD); 2001. [Google Scholar]
8. Srivatsan L, Sastry PS. A survey of temporal data mining. Sadhana. 2006;31(2):173–198. [Google Scholar]
9. Simon R, Ni Q, et al. Comparison of impedance minute ventilation and direct measured minute ventilation in a rate adaptative pacemaker. PACE. 2003;26:2127–2133. [Abstract] [Google Scholar]
10. Bonnet J, Geroux L. Active implantable medical device having a control function responsive to at least one physiological parameter. Patent US 5 722 996 Al. 1998.
11. Bonnet J. Implantable active medical device enslaved to at least one physiological parameter. Patent US 6 336 048 Bl. 2002.
12. Loslever P, Bouilland S. Marriage of fuzzy sets and multiple correspondence analysis: Examples with subjective interval data and biomedical signals. Fuzzy Sets and Systems. 1999;107:255–275. [Google Scholar]
13. Abdi H, Valentin D. Multiple correspondence analysis. In: NS, editor. Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage; 2007. pp. 651–657. [Google Scholar]
14. Benali H, Escofier B. Smooth factorial analysis and factorial analysis of local differences. In: Coppi R, Bolasco S, editors. Multiway Data Analysis. North-Holland: 1989. pp. 327–339. [Google Scholar]
15. Liao TW. Clustering of time series data - a survey. Pattern Recognition. 2005;38:1857–1874. [Google Scholar]
16. Myers C, Rabiner L. A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System TechnicalJournal. 1981;60(7):1389–1409. [Google Scholar]
17. Vlachos M, Kollios G, Gunopulos D. Discovering similar multidimensional trajectories. 18th International Conference on Data Engineering (ICDE); San Jose, CA, USA; 2002. pp. 673–684. [Google Scholar]
18. Brun M, Sima C, et al. Model-based evaluation of clustering validation measures. Pattern Recognition. 2007;40:807–824. [Google Scholar]
19. Rousseeuw P. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65. [Google Scholar]
