1 Introduction

Atrial fibrillation (AF) is the most common arrhythmia encountered at advanced age. The prevalence of AF remains lower than 1% among the general population, but increases considerably from the age of 60 years [15]. Paroxysmal AF (PAF) is often the precursor to sustained AF [20]. Given that sustained AF increases the likelihood of suffering myocardial infarctions and strokes [15], its accurate recognition by means of non-invasive techniques is of great interest to regular clinical practice. The prediction of PAF maintenance can help to choose the appropriate intervention that may terminate the arrythmia and prevent its chronification, given that approximately 18% of patients who have intermittent AF degenerate into permanent AF 4 years later [1]. Otherwise, the prediction of the spontaneous termination of PAF episodes could avoid unnecessary therapies and their associated clinical costs [5].

The PhysioNet/Computers in Cardiology Challenge 2004 [25] marked the start of continuous attempts to predict the early termination of PAF episodes by using the electrocardiogram (ECG). The challenge database will be extensively explained later in Sect. 2. Several groups based their study on the atrial activity (AA) overall peak frequency plus additional spectral characteristics as the main peak power [29] or time–frequency pattern [26]. Other groups tried to predict the evolution of PAF episodes by means of linear classifiers based on the main peak frequency and the mean RR interval [9, 16]. In [28], the fibrillation frequency, fibrillation amplitude and exponential decay are extracted from a frequency-shifted and amplitude-scaled version of a log-spectral profile. The challenge is approached in [24] from a clinician’s point of view by using parameters such as the f-wave polarity, the f-wave peak interval or the amplitude modulation of AA, and a support vector machine is used as classifier. A recent publication extends the work carried out in [9] by including stepwise discriminant analysis and a greater number of spectral parameters [12].

In this work, an analysis of AA spectral parameters organization is carried out with the aim of predicting PAF episode termination. We centered our work on disclosing the differences in spectral parameters organization between terminating and non-terminating PAF episodes. The used database was suitable for our objective, because it allowed us to study the signals 1 s before the eventual transition from PAF to normal sinus rhythm. Nevertheless, this study must be seen just as a starting point for predicting the PAF evolution within several hours or days, which would have greater clinical significance. The analysis of spectral parameters is made in terms of mathematical regularity of their series. The organization is measured by using the entropy estimator sample entropy (SampEn) [30, 32]. Entropy estimators have already been used in the characterization of biomedical signals different from ECG [17]. The promising innovation of the present study is the application of SampEn to a group of direct and derived spectral parameters with the aim of predicting the termination of AF episodes with notable accuracy.

The paper is organized as follows. In Sect. 2, we describe the database used for the analysis. Sect. 3 presents the global process and defines the spectral parameters. The results obtained by the analysis are reported in Sect. 4. We make a discussion on the results in Sect. 5. The limitations of the study are explained in Sect. 6. Finally, the conclusions of the study are drawn in Sect. 7.

2 Materials

For the present work, we used the database of PhysioNet/Computers in Cardiology Challenge 2004. Each recording in the database is a 1 min segment of AF that has been extracted from a long-term ECG recording. Lead V1 was chosen from each recording since previous works have shown that AF is dominant in this lead [29]. The database is divided into a learning set and a test set. The learning set contains ten segments of non-terminating AF (group N) and ten segments of AF that terminate within 1 s after the end of the record (group T). The test set contains 30 records, approximately half of them belonging to group N and the rest to group T.

Butterworth filtering of eight order and pass-band from 1 to 50 Hz is applied to each recording. The original sampling rate (f s ) of the Holter system was 128 samples per second, but the ECG recordings were interpolated by a factor of 8 using cubic spline fitting, so that a f s equal to 1,024 resulted. The resultant time-domain higher resolution allowed us to obtain a better cancellation of the QRS complex and a higher length of parameter sequences.

3 Methods

3.1 Overall process

The ECG recording analysis is completed in five main steps: extraction of the AA, computation of the spectrogram, curve fitting, construction of spectral parameter series and SampEn computation. The SampEn values are assessed by using univariate and discriminant analyses.

In order to use the ECG as a suitable tool for the analysis of AF, we need to separate the AA from other cardioelectric signals. The extraction of the AA during AF requires nonlinear signal processing since spectra of atrial and ventricular activities (VA) overlap and, in consequence, they cannot be separated by simple linear filtering [8]. One common approach to obtain the AA is average beat substraction (ABS) [34], which relies on the fact that AA and VA are uncoupled during AF episodes [8]. Average beat substraction (ABS) is the most extended and accepted technique for AA extraction when, as in this work, only one lead is used. Therefore, ABS was chosen among other extraction techniques like independent component analysis (ICA) [19, 33] or principal component analysis (PCA) [11, 31].

After the extraction of the AA signal, its spectrogram [3] is computed using Hamming windows of 1,024 samples in length and 75% overlap. An example of AA spectrogram is depicted in Fig. 1a. Then cubic spline fitting is applied to each frequency slice that constitute the spectrogram. In order to facilitate the spectral parameters extraction, the cubic spline fitting curve is interpolated so that the resulting frequency increment is 0.01 Hz. In this way, the spectral parameters of the spectrogram are obtained more accurately. Next, the local maxima and minima of the interpolated fitting curve are detected. Only singular points from the main peak to the end of the spectrum are taken into account. This is exemplified in Fig. 1b for a spectrogram slice at t = 50 s. Several mathematical series of parameters, described in Sect. 3.2, are constructed from these points. Finally, the SampEn of all aforementioned series is obtained as an estimation of their mathematical regularity. The size of series is around 600 elements, which is large enough since the SampEn is meaningfully applied to more than 100 data points [30]. All SampEn values were computed with four significant digits. Given that the length of the spectral parameter series is the same as the length of database recording, i.e. 1 min, the resultant series time resolution is 0.1 s.

Fig. 1
figure 1

Time–frequency plot of a typical AA signal. (a) AA spectrogram computed using Hamming windows of 1,024 samples in length and 75% overlap. (b) Spectrogram slice at t = 50 s, interpolated fitting curve, local maxima and minima, and spectral parameters f p1, f p2, A 1 and A 2

The spectrogram was selected among a collection of time–frequency distributions. We will elucidate this in Sect. 3.3. On the other hand, the selection of the cubic spline model is justified because in several preliminary trials it provided the best fitting in comparison with Gaussian, polynomial, rational, Weibull, power and exponential models. A cubic spline is a piecewise function of third-order polynomials [4, 27].

The SampEn values are evaluated by the t test and by discriminant analysis, the results of which are shown in Sect. 4. The objective persued by the discriminant analysis is to know if the reliability of the decision can be improved by any combination of the aforementioned parameters. The discriminant analysis is carried out in two stages. First, the discriminant function is obtained by considering the learning data set. Then, this discriminant line is applied to the test set in order to validate the results.

Given that the electrical remodeling present in the heart when AF occurs is a far-from linear process [6], the non-linear index SampEn was chosen to estimate the regularity of the series. Other non-linear indices, such as the approximate entropy (ApEn) [30], which has been previously used in several biomedical applications [32], were also tested. Nevertheless, the SampEn was the index that achieved the higher percentage of correctly classified AF episodes. The SampEn [32] measures the regularity level of the series. That is, they quantify how predictable series are depending on the number of times that repetitive patterns are present in them. The SampEn appears as a natural evolution of the ApEn with the aim of reducing the bias of this estimator [32]. The SampEn is defined as follows.

Let x[n] be a time series of length N. The distance between any two patterns of the series, X m (i), X m (j), of length m is defined as:

$$ d[X_m(i),X_m(j)]=\max_{k=0,\ldots, m-1}(|x(i+k)-x(j+k)|) $$
(1)

Given a pattern X m (i), we calculate B m i (r) as:

$$ B_i^m(r)={{1}\over {N-m-1}}B_i $$
(2)

where B i is the number of patterns of length m that fulfill d[X m (i), X m (j)] < r with 1 ≤ i, j ≤  Nm, j ≠ i, and r is the parameter that defines the criterion of similarity between patterns [30]. Next B m(r) is computed as:

$$ B^m(r)={{1}\over {N-m}}\sum_{i=1}^{N-m}B_i^m(r) $$
(3)

Defining A m i (r) and A m(r) in the same way as B m i (r) but considering now patterns of length m + 1, the SampEn is computed by the following expression:

$$ \text{SampEn}(m,r)=-\ln{\left\{{{A^m(r)}\over {B^m(r)}}\right\}} $$
(4)

3.2 Spectral parameters

In this subsection, we define the parameters used in the study. One relevant difference from other works, where similar parameters are used, is the application of SampEn to their series as an estimation of mathematical regularity. The first spectral parameter we consider is the main peak frequency (f p1), which is known to be highly relevant in the characterization of AF [7] (see Fig. 1). The second parameter is the main peak magnitude, A 1. The third and fourth parameters are the second largest frequency peak (f p2) and its related peak magnitude, A 2. The extraction of these four parameters can be visualized in Fig. 1. The fifth parameter is the spectral concentration (SC) around f p1. The SC is defined as [10]:

$$ \text{SC}=\frac{\sum_{{f=0.82}f_{p1}}^{1.17f_{p1}} P_{\text{AA}}(f)}{\sum_{f=0}^{0.5f_s} P_{AA}(f)} $$
(5)

where P AA is the power spectral density of the AA signal, f is the frequencies vector, f s is the sampling rate (1,024 Hz), and f p1 is the main peak frequency of the AA. The bandwidth considered for the SC computation is of 2 Hz for a typical f p1 of 6 Hz, which justifies the selection of the lower and upper limits of the sum [10]. Other two parameters related to the width of the spectrum main lobe have been used: the 3-dB width of the peak, w3 dB, and the power in the 3-dB band, pb3dB. These two last parameters have been used in [12] to characterize AF. Two derived parameters, Δf p and \(\bar{A}_2\) are referred to the spectral shape of AA. Similar parameters are used in [35]. The first derived parameter is the normalized distance between f p1 and f p2, which is expressed as:

$$ \Updelta f_{p}=\frac{{(f_{p1}-f_{p2})}}{{f_{p1}}} $$
(6)

The second derived parameter is the normalized amplitude of the second largest peak, which is defined as:

$$ \bar{A}_2=\frac{{A_2}}{{A_1}} $$
(7)

The deviation of the main and second peak frequencies from their respective mean values are also computed as a dispersion measurement:

$$ d_1=f_{p1}-{{\mathbf{E}}}(f_{p1}) $$
(8)
$$ d_2=f_{p2}-{{\mathbf{E}}}(f_{p2}) $$
(9)

where E(·) represents the average value over the periodogram. Finally, the median frequency (MF) is obtained as the center of mass of the spectrum:

$$ MF=\frac{\sum_{f=0}^{0.5f_s} |{\text{FT}}_{\text{AA}}(f)|\cdot f}{{\sum_{f=0}^{0.5f_s} f}} $$
(10)

where FTAA is the Fourier transform of AA. This parameter was previously used in other works to characterize the ventricular fibrillation [13].

3.3 Selection of the time–frequency distribution

Although time–frequency distributions had been previously used in forecasting the evolution of AF episodes [12, 26], the SampEn was for the first time applied to series of spectral parameters in the present work. Given that the optimal time–frequency distribution for the proposed method was a priori unknown, a preliminary study with the aim of choosing it was advisable. This preliminary study was made by considering several time–frequency distributions and a reduced number of spectral parameters. The parameters chosen to do this prior survey were f p1 and SC. The f p1 was chosen because it is known from previous works that f p1 contains information on the AF termination [26, 29]. More detailed results of this preliminary study can be found in [37]. The SC was selected because it is a significant feature of real AF recordings [10]. Once the optimal time–frequency distribution was identified, the number of spectral parameters was extended to 12.

The computed time–frequency distributions, all of them belonging to the Cohen’s class [3, 14], were the spectrogram (SP), Wigner–Ville (WV), pseudo-Wigner–Ville (PWV), Margeneau–Hill (MH), pseudo-Margeneau–Hill (PMH), Page (PG), pseudo-Page (PPG), Zhao–Atlas–Marks (ZAM) and Choi–Williams (CW). All these distributions were obtained with the same resolution of 1 s in time and 1 Hz in frequency. In Table 1, we present the t test SampEn bilateral significance for f p1 and SC computed for these nine time–frequency distributions. The spectrogram was the one that reached the least bilateral significance in both f p1 and SC cases, remaining lower than 0.001. In view of the significance values obtained, the spectrogram was chosen among the studied time-frequency distributions.

Table 1 SampEn bilateral significance for f p1 and SC between groups N and T computed for nine time–frequency distributions: spectrogram (SP), Wigner–Ville (WV), pseudo–Wigner–Ville (PWV), Margeneau–Hill (MH), pseudo-Margeneau–Hill (PMH), Page (PG), pseudo-Page (PPG), Zhao–Atlas–Marks (ZAM) and Choi–Williams (CW)

4 Results

The results of the t test applied to the SampEn of the numerical series for the learning set are summarized in Fig. 2. These results reveal that it is possible to distinguish between terminating and non-terminating AF in 6 of the 12 parameters, considering a parameter to be relevant when its bilateral significance is lower than 0.05. These six relevant parameters are f p1, f p2, Δf p , A 1, d 1 and SC, the bilateral significances of which are, respectively, 0.001, 0.005, 0.003, 0.004, 0.015 and 0.001. These results were computed for m = 2 and r = 0.25 because the optimal classification was reached with these values. In order to choose m and r, we compared the bilateral significance of every spectral parameter for m = 1 or m = 2 and r between 0.1 and 0.25 times the standard deviation of the time series, as suggested by Pincus [30]. As an example, the SampEn of f p1 bilateral significance for the tested values of m and r is presented in Table 2, which justifies the aforementioned choice of m and r. For the rest of the spectral parameters, the best outcomes were also achieved for m = 2 and r = 0.25. A more generic strategy for optimal selection of m and r is developed in [23].

Fig. 2
figure 2

Results of the t test for the SampEn of all the spectral parameters. (a) Mean and standard deviation of SampEn for groups N and T, (b) SampEn bilateral significance between groups. A parameter is considered relevant when its bilateral significance is lower than 0.05

Table 2 SampEn of f p1 bilateral significance for different tested values of m and r

The area under the receiver operative characteristic (ROC) curve for the relevant parameters are, expressed in the same order, 89.7, 76.3, 77.7, 79.9, 76.3 and 72.8%, where SampEn of f p1 stands out because of its highest value. In the rest of the parameters the bilateral significance obtained is higher than 0.05, and thus they are considered mathematically irrelevant. Subsequently, only the relevant parameters are considered for the discriminant analysis. The mean SampEn in type N recordings is higher than in type T recordings for all these relevant parameters. This means that spectral parameters of N recordings are less regular than those of T recordings. An example of this can be seen in Fig. 3 for parameter f p1.

Fig. 3
figure 3

(a) Type T and (b) type N f p1 series example with respective SampEn values 0.0153 and 0.1556. Higher SampEn indicates lower regularity of the signal

The learning set ROC curve for the SampEn of f p1 is depicted in Fig. 4, which has been fitted using the one-term exponential model. A decision threshold of 0.1173 was chosen to optimize the result for the learning test. By considering this value of threshold, 19 of 20 learning recordings were classified correctly. This decision threshold results from the smoothed ROC curve in a sensitivity of 91% and 1-specificity of 14%. Taking the same threshold for the test set, 26 of 30 recordings were classified correctly. This resulted in a percentage of recordings properly classified equal to 95% for the learning set and equal to 86.67% for the test set. The results obtained by this classification are presented in Fig. 5 for every recording. A higher dispersion of results is observed for the test set than for the learning set. Therefore, despite the difference in dispersion between learning and test sets, the decision threshold of 0.1173 was correctly chosen.

Fig. 4
figure 4

Learning set ROC curve fitted using the one-term exponential model for the SampEn of f p1. Decision point chosen for sensitivity 91% and 1−specificity 14%

Fig. 5
figure 5

Classification of type N and T episodes using a threshold value for the SampEn of f p1 equal to 0.1173. 95% of the learning set and 86.67% of the test set recordings were classified correctly

The previous analysis revealed that the SampEn of the spectral parameters f p1, f p2, Δf p , A 1, d 1 and SC have a bilateral significance lower than 0.05 and, in consequence, all of them are suitable for use in discriminant analysis, which was the next step taken. Furthermore, the SampEn of the AA was also computed and a mean difference of 0.2508, with the greatest mean value for the N group, and a bilateral significance equal to 0.004 were figured out by the t test. This fact suggested the inclusion of this parameter in the discriminant analysis along with the spectral parameters. This made it possible, in addition, to combine the information obtained from both time and frequency domains.

The cross-correlations for the SampEn of spectral significant parameters and AA are depicted in Fig. 6. The highest cross-correlation, 0.902, occurs between the SampEn of f p2 and Δf p . Also the pairs of variables f p1 and SC, A 1 and d 1 present high values of cross-correlation; thus, it is reasonable to expect in advance that these pairs of variables do not appear in the resultant discriminant model.

Fig. 6
figure 6

Cross-correlations for the SampEn of the spectral significant parameters and AA. Pairs of variables with high cross-correlation do not contribute to the resultant discriminant model

The homogeneity of covariance matrices was checked by the Box’s M test [18]. Since the significance of this test was low, 0.013, separate groups of covariance matrices were considered for the analysis. The variable selection was performed by forward stepwise analysis and minimization of the Mahalanobis’ distance [22]. The statistic F indicates which variable must be added to the model in each step. This variable must have the largest statistic F [18], with a minimum value that was set to 3.84 corresponding to a significance level of 0.054. The process was completed in three steps. The SampEn of f p1, Δf p and AA were added to the model by the discriminant analysis, the rest of the parameters being discarded because of their low F. As expected, none of the aforementioned pairs of variables with high cross-correlation appears in the final discriminant model.

The discriminant function is a plane given by the equation x 3 = −0.0355 · x 1 − 0.338 · x 2 + 0.4653, where x 1, x 2 and x 3 represent the SampEn of f p1, Δf p and the AA, respectively. The standardized canonical coefficients of the discriminant function are 1.880 for x 1, 0.816 for x 2, and 1.662 for x 3. This function is depicted in Fig. 7. Only two of the recordings were misclassified by this function. A small value of Wilk’s lambda test [18] significance (p < 0.001) was obtained, which indicates the great discriminatory ability of the function. All of the cases used to create the model, i.e. the learning set, were correctly classified. Regarding the test set, 15 of 16 type N cases and 13 of 14 type T cases were correctly classified (see Table 3). Expressing this results in percentages, 100% of the learning set recordings were classified correctly. In the test test, 93.75 % of N recordings and 92.85% of T recordings were classified correctly. The global percentage of test recordings properly classified was 93.33%.

Fig. 7
figure 7

Two-dimensional representations of the results taken by SampEn pairs of (a) AA and f p1, (b) AA and Δf p , (c) f p1 and Δf p , and 3D plot showing the hyperplane defined by the discriminant function. As much as 100% of the learning set recordings were classified correctly. In the test set, 93.75% of N recordings and 92.85% of T recordings were classified correctly

Table 3 Type N and T correctly classified recordings for both learning and test sets by using the discriminant analysis

The method was also tested when the SampEn of AA was excluded from the discriminant analysis. Under these conditions, the discriminant analysis did not improve the t test results, which suggested the combination of the time and frequency domains.

5 Discussion

In this paper, we introduced a new method to predict the spontaneous termination of AF episodes. This method is based on the mathematical regularity of spectral parameters. The t test has revealed the existence of significant differences in group means for six of the studied spectral parameters. Consequently, the selection of spectral parameters for the study, based on previous works on AF, can be considered appropriate. This test discloses that the mean SampEn of the relevant parameters is higher in the non-terminating than in the terminating AF episodes. Consequently, the non-terminating recordings seem to present more complex dynamics than the terminating ones. This corroborates the physiological organization increase of AA prior to AF termination, which was previously reported through invasive atrial electrograms [21, 36]. Furthermore, results indicate that clinically relevant information on AF organization can be obtained from the spectral parameters of the surface ECG. A further interesting challenge could be in finding a suitable relation between the SampEn of spectral parameters and the pathophysiological mechanisms of AF.

Additionally, t test revealed that the SampEn of f p1 has the highest predictive power among all the studied spectral parameters. In previous studies of the AF termination [9, 16, 29], f p1 has been also revealed as a good predictor of AF termination. The main difference between our work and those studies is that we consider the mathematical regularity of spectral parameters in opposition to direct mean values.

The major relevancy of SampEn for parameter f p1 is confirmed by the discriminant analysis. The canonical coefficient for SampEn of f p1 (1.880) is the highest; thus, it is the variable that contributes the most to the discriminant function. A slightly lower canonical coefficient is associated with the SampEn of AA (1.662), which indicates that both variables, and consequently both time and frequency domains, have a similar weight in the prediction of AF termination. A smaller value of the canonical coefficients is related to SampEn of Δf p (0.816). Nonetheless, we must not underestimate any of the variables. Since Δf p is a description of the spectral shape, we can confirm that the variability of this shape along time contributes to the resultant discriminant function. On the other hand, all these three canonical coefficients have the same positive sign, which points out that the loss of mathematical regularity is an indicator of more likely maintenance of AF.

The discriminant analysis has provided an improvement in the results with respect to the classification by threshold (5% for the learning set and 6.66% for the test set). For that reason, it is worth considering the discriminant analysis in predicting the evolution of AF, because this improvement in the classification of AF could be of great importance in routine clinical practice. Finally, Fig. 6 shows that variables of the resultant discriminant function have cross-correlations between them that are not negligible. This means that these variables share information, that is, they are not totally independent. This would explain, to some extent, the high percentage of correctly classified episodes by the SampEn of f p1 and the additional percentage due to the discriminant analysis.

In comparison with previous classifiers of the challenge, only the winner team [29] classified correctly more recordings than our method, one in the learning set. The results of the rest of the classifiers were enhanced by our method. Furthermore, this new strategy provides an assessment of spectral parameters regularity that is not present in previous works.

6 Study limitations

The study presents several limitations. First, the database contains a reduced number of short recordings. A wider database would be advisable in order to validate the method’s performance over a larger population. Longer recordings, which comprise AF onset and offset, would allow to study the AA organization evolution during the complete AF episode. Second, no information on the medication of patients is provided in the database. Different drugs could have different effects on the registered wave form and consequently the results could be altered. Third, although Holter recordings were oversampled to 1,024 Hz, we must take this with caution, since this oversampling cannot be considered to be strictly the same as an original sampling rate of 1,024 Hz. Finally, the limited number of leads from Holter ECG compelled us to use ABS to extract the AA. A 12-lead ECG database of AF episodes would make the application of ICA suitable. Since ICA provides a unified AA that does not depend on the lead, this might help to improve the results.

7 Conclusions

A new method based on the mathematical regularity of spectral parameters has been introduced as an original and improved way to predict the evolution of paroxysmal AF episodes. From the results we can deduce, first, that the future evolution of AF affects not only the values of spectral parameters, but also their variability in time. Second, the SampEn of the spectral parameters is higher for the non-terminating than for the terminating episodes. Furthermore, the spectral parameters mathematical regularity might be used to predict spontaneous paroxysmal AF termination and could provide information on the organization of atrial activation in AF. The good results obtained make this new method a useful tool that can help clinicians in the management of AF. Future research will center on assessing results by using a wider database and studying the organization evolution of AA during long-term AF recordings. This would help to make a decision on whether to cardiovert the patient or not depending on the expected evolution of the arrhythmia. In addition, a minor adjustment of the method for its real-time application could be interesting too. This would certainly require a deeper study of its robustness to noise, given that the degrading influence of noise has been proved in other applications where SampEn was used as a regularity estimator [2].

From the good results of the method, we can expect that the application of this method would not merely be limited to the prediction of PAF termination in the future. It could also be applied to any biomedical signal in which regularity of spectral parameters can help to differentiate among patient groups. Nonetheless, its performance in other environments can be checked only by using the method in the specific application.