
1 Introduction

The QRS complex is the main pattern of the electrocardiogram (ECG) waveform [5] and is composed of three time-sequenced fiducial points: the Q, R, and S points. This structure reflects the electrical activity of the heart during ventricular contraction; its time of occurrence and its shape provide much information about the health of an individual. Due to its distinctive rhythm and shape, the QRS complex is used as the basis for automatic determination of the heart rate and as an entry point for classification schemes of cardiac cycles. In that sense, QRS detection provides the foundation for almost all automated ECG analysis algorithms [10].

Detecting the QRS complex is useful in various scenarios, such as diagnosing arrhythmia, identifying tachycardia, and biometrics [1, 9]. However, QRS complex detection (or simply QRS detection) is a difficult task, not only due to the physiological variability of the ECG wave but also due to several types of noise that can be present in the signal [12]. Noise sources include muscle noise, artifacts due to electrode displacement, power-line interference, baseline wander, and harmonic interference (T waves with similar frequency content) [12].

Deep learning is currently a major topic in the machine learning field, and since 2012 several outstanding results have been reported on different tasks [7,8,9]. Along these lines, in [2] arrhythmia classification is performed using a CNN, showing it to be a promising tool for recognizing patterns in ECG signals. In [9], the authors report state-of-the-art results using a convolutional network for a biometric task based on the cardiac signal. In this work, a CNN-based approach is proposed to detect the QRS complex.

Preliminary experiments on the MIT-BIH Arrhythmia Database have shown that the CNN is a promising tool for QRS detection, yielding figures comparable to state-of-the-art methods. This approach could be used together with traditional QRS detection algorithms to decrease the number of false positives and make detection more reliable. Together with the new deep learning hardware in development today, such as the Tensor Processing Unit (TPU) proposed by Google, FPGA-based deep learning accelerators [18], or the NVIDIA TX1, approaches based on deep learning will be the natural choice for embedding in future mobile and wearable medical equipment.

The remainder of this work is organized as follows. We present and describe the related works in Sect. 2. The proposed approach and benchmark dataset is described in Sect. 3. Experimental results are described and discussed in Sect. 4. Finally, conclusions and future work are outlined in Sect. 5.

2 Related Works

In this section, some related works on the QRS detection problem are described. All works described here report state-of-the-art results, and their experiments are performed on the MIT-BIH database, which has been widely used in the literature. A variety of techniques have been employed.

The work proposed by Poli et al. [14] combines linear-nonlinear polynomial filters and a genetic algorithm to detect the QRS complex in the MIT-BIH database. The former is used to improve the quality of the signal, and the latter to minimize detection errors.

Zhang and Lian [19] also used the MIT-BIH Arrhythmia Database, reaching expressive results: 99.81% sensitivity and 99.80% positive prediction. Their goal was to create a method suitable for wearable ECG devices. The authors applied two strategies to improve signal quality: softening the impulsive noise and removing the baseline drift.

The focus of the work proposed by Chen et al. [4] is real-time QRS detection. To enhance accuracy and attenuate noise, the authors combined a wavelet-based procedure with a moving-average-based computing method. This approach achieved a detection rate of about 99.5%. According to the authors, their approach is trustworthy even when the signal quality is poor.

Nallanthambi et al. [11] proposed a different approach. First, they converted the ECG analog input data into pulses. Then they encoded the signal with the time-based Integrate-and-Fire approach, which allowed them to implement a logical decision rule for QRS detection. They highlight how fast the process is and how it can be combined with logical hardware to reduce power consumption.

3 Approach

This work presents an approach to detect QRS complexes in ECG signals with the well-known CNN. To allow comparison with the literature, the MIT-BIH Arrhythmia Database [10] is chosen, since it is the most popular benchmark for ECG QRS detection and is also the one recommended by the Association for the Advancement of Medical Instrumentation (AAMI), as described in ANSI/AAMI EC57:1998/(R)2008 [3].

3.1 MIT-BIH Arrhythmia Database

The MIT-BIH database provides annotations for all recorded heartbeats, including patient conditions and fiducial points (Q, R, S, and T points and amplitude values) used as labels. This information is essential to train and evaluate machine learning models.

The database was acquired using Holter monitors, which recorded the signals on tape. The tapes were subsequently digitized, and the annotations were added by trained physicians. The sampling rate (360 samples per second per channel) was chosen to accommodate the use of simple digital notch filters to remove 60 Hz (power-line frequency) interference.

A total of 48 half-hour ECG recordings, obtained from 47 individuals studied by the BIH Arrhythmia Laboratory between 1975 and 1979, were selected to compose the database. Of these 48 records, 23 (the “100 series”) were chosen at random from a collection of over 4000 Holter tapes, and the other 25 (the “200 series”) were selected to include examples of uncommon but clinically important arrhythmias that would not be well represented in a small random sample. The individuals included 25 men, aged 32 to 89 years, and 22 women, aged 23 to 89 years [10].

3.2 Method

In this section, we present our approach based on the convolutional network. The proposed model is considered a deep learning approach since it contains several layers and can learn representations automatically by means of supervised learning. Convolutional networks rely on four basic operations: convolution, activation (ReLU), pooling, and normalization, all of which are well described in the literature [7].

In our specific case, the input ECG signal has a length of 833 ms (300 samples), while the kernels are orders of magnitude smaller, e.g., four samples, as shown in Fig. 1. The data is fed into the network in its raw state, without any filtering or pre-processing.

Fig. 1.

One-dimensional convolution process for ECG signals. The figure illustrates a convolution between a raw ECG signal and a kernel of size \(1\times 4\).
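To make the operation illustrated in Fig. 1 concrete, the short sketch below applies a \(1\times 4\) kernel to a 1-D signal. The signal and kernel values are arbitrary placeholders, and, as in most CNN frameworks, the “convolution” is implemented as cross-correlation.

```python
import numpy as np

# Toy ECG-like signal and a 1x4 kernel (values are illustrative only).
signal = np.array([0.0, 0.1, 0.5, 1.0, 0.4, -0.2, 0.0, 0.1])
kernel = np.array([0.25, 0.5, 0.5, 0.25])  # hypothetical learned weights

# "Valid" mode: each output sample is the dot product between the
# kernel and one length-4 window of the signal.
out = np.correlate(signal, kernel, mode="valid")
print(out.shape)  # (5,) -> 8 - 4 + 1 output samples
```

Each of the five outputs summarizes a local neighborhood of the signal; stacking many such kernels per layer is what lets the network learn QRS-specific local patterns.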

Two different network architectures are evaluated. The first is very deep and has small receptive fields, i.e., small filter kernels in the first layer, while the second uses larger filter kernels. The architectures are inspired by [16] and [6], respectively. An example of an architecture with large receptive fields is illustrated in Fig. 2.

Fig. 2.

CNN architecture with large receptive fields: three convolutional layers with pooling, followed by fully connected layers.

The QRS detection problem as designed here is a binary classification problem: a QRS is detected (positive class) or not (negative class). Thus, positive and negative samples must be defined. To that end, for each heartbeat of the MIT-BIH database, three positive samples are extracted: the centralized heartbeat signal, the heartbeat signal shifted by \(+5\) samples, and the heartbeat signal shifted by \(-5\) samples. Eight negative samples are also extracted from each MIT-BIH heartbeat: the heartbeat shifted by \([\pm 30, \pm 50, \pm 80, \pm 120]\) samples. Examples of generated samples are shown in Fig. 3.

Positive samples are defined as signals where the QRS complex is centered in the 833 ms (300 samples) window, while negative samples are based on the same signal shifted so as to intentionally misalign the QRS complex. This way, the CNN can learn to detect a centralized QRS complex as a positive sample.
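The sample-generation scheme above can be sketched as follows. The function name `extract_windows`, the synthetic signal, and the R-peak index are illustrative stand-ins; in practice the R-peak indices come from the MIT-BIH annotations.

```python
import numpy as np

WIN = 300                                            # window length (samples)
POS_SHIFTS = (0, +5, -5)                             # positive samples
NEG_SHIFTS = (30, -30, 50, -50, 80, -80, 120, -120)  # negative samples

def extract_windows(ecg, r_peak):
    """Build the 3 positive / 8 negative windows around one R peak.

    `ecg` is a 1-D signal array and `r_peak` the annotated R-peak index
    (hypothetical inputs standing in for MIT-BIH records/annotations).
    """
    half = WIN // 2
    def window(center):
        return ecg[center - half : center - half + WIN]
    pos = [window(r_peak + s) for s in POS_SHIFTS]
    neg = [window(r_peak + s) for s in NEG_SHIFTS]
    return pos, neg

# Example with a synthetic signal.
ecg = np.sin(np.linspace(0, 20 * np.pi, 4000))
pos, neg = extract_windows(ecg, r_peak=2000)
print(len(pos), len(neg), pos[0].shape)  # 3 8 (300,)
```

Note that the positive windows keep the QRS within \(\pm 5\) samples of the center, while every negative window misaligns it by at least 30 samples, exactly the margin the classifier must learn.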

Fig. 3.

Positive and negative samples from MIT-BIH records.

For training, the network weights are optimized using gradient descent, and the cost function is the softmax loss. Dropout is used to avoid over-fitting.

4 Experimental Results and Discussion

The computational resources used here include an Intel(R) Core i7-5820K CPU @ 3.30 GHz (12 threads), 64 GB of DDR4 RAM, and a GeForce GTX TITAN X GPU. In this work, we use the MatConvNet toolbox [17] linked with NVIDIA cuDNN.

For the experiments, a total of 1280 heartbeats per record (44 records of MIT-BIH, excluding records from patients with pacemakers) are extracted. Among those, 480 heartbeats per record are selected for the training phase (400 for training and 80 for validation) and 800 heartbeats per record are reserved for the final evaluation. In total, 63360 samples are used as positive samples and 168960 as negative. This class imbalance is intentional, in order to keep the detection window narrow (\([+5, -5]\) samples from the center).

For CNN weight optimization, stochastic gradient descent is used with batches of size 100 and a momentum coefficient of 0.9, trained for 10 epochs. We use 10% dropout after two fully connected layers and weight decay with a coefficient of \(5\times 10^{-4}\). The initial learning rate is \(10^{-2}\), decreased by a factor of 10 when accuracy stagnates on the validation set.
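The update rule behind this configuration can be sketched in plain NumPy. The weight vector and gradient below are illustrative placeholders (the experiments themselves use MatConvNet's SGD implementation).

```python
import numpy as np

# Hyper-parameters from the text: learning rate 1e-2, momentum 0.9,
# weight decay 5e-4.
lr, momentum, weight_decay = 1e-2, 0.9, 5e-4

w = np.zeros(4)                           # illustrative weight vector
v = np.zeros_like(w)                      # momentum buffer
grad = np.array([1.0, -1.0, 0.5, 0.0])    # gradient from one mini-batch

v = momentum * v + grad + weight_decay * w  # accumulate velocity
w = w - lr * v                              # weight update
print(w)  # first step reduces to w = -lr * grad
```

On the first step the momentum buffer is empty, so the update is simply \(-\eta \nabla\); over subsequent batches the buffer smooths the gradient direction, which is what allows the relatively large initial learning rate.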

The heartbeats used come from two different groups in the database: normal and arrhythmic. Among the normal ones, there are heartbeats of three types: Normal, Left bundle branch block, and Right bundle branch block. Among the arrhythmic ones, heartbeats of four types are considered: Atrial premature beat, Aberrated atrial premature beat, Nodal (junctional) premature beat, and Fusion of ventricular and normal beat. The architecture that obtained the best accuracy on the validation data is composed of four convolutional layers \([96(1\times 49)\,|\,128(1\times 25)\,|\,256(1\times 9)\,|\,512(1\times 9)]\), four pooling layers \([(1\times 2) \text{ with stride } 2]\), three fully connected layers \([4096(1\times 29)\,|\,4096(1\times 1)\,|\,2(1\times 1)]\), dropout of 50%, and the softmax cost function.
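Tracing the feature-map length through the four convolution + pooling stages shows how the 300-sample window is progressively downsampled. Padding and convolution strides are not stated in the text, so the sketch below assumes "valid" convolutions; the resulting sizes are only illustrative (the \(1\times 29\) kernel of the first fully connected layer implies the actual padding/stride choices differ somewhat).

```python
def conv_out(n, k):
    """Output length of a 'valid' 1-D convolution with kernel width k."""
    return n - k + 1

def pool_out(n):
    """Output length of 1x2 pooling with stride 2."""
    return n // 2

n = 300                        # input window length in samples
sizes = []
for k in (49, 25, 9, 9):       # kernel widths of the four conv layers
    n = pool_out(conv_out(n, k))
    sizes.append(n)
print(sizes)  # [126, 51, 21, 6]
```

The same bookkeeping is useful when adapting the architecture to a different window length or sampling rate.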

Fig. 4.

Algorithm applied every 10 ms in a sliding-window fashion. Record 114 from MIT-BIH.

When the two architectures are compared, we find that the deeper network does not achieve the best result, the opposite of what might be expected. The same situation is found in a similar work [15], in which a deeper architecture does not yield the best response. Very deep networks bring instability to training when using gradient descent to optimize the weights [13], which may explain the performance of the deeper architecture. To overcome this problem, some authors segment the networks and perform the training in parts; that is, the weights are initialized from scratch only in the first stage of training [13].

Table 1. Comparison of the QRS detection with other published algorithms on the MIT-BIH arrhythmia database.

The evaluation data is created from a continuous set of 800 heartbeats for each record. A window of 300 samples is shifted by 20 samples (56 ms) and the generated signal is presented to the CNN model. With this approach, 96000 true events (QRS) and 160000 negative events (non-QRS) were generated. The best CNN model yielded 95376 True Positives (TP), 624 False Negatives (FN), 2661 False Positives (FP), and 157339 True Negatives (TN). The total accuracy was 98.72%. Table 1 compares the proposed method with other works in the literature. To evaluate the performance, the sensitivity (Se) and positive prediction (+P) were used, given respectively by \(Se(\%)={TP}/{(TP+FN)}\) and \(+P(\%)={TP}/{(TP+FP)}\).
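The reported figures can be recomputed directly from the confusion-matrix counts above:

```python
# Confusion-matrix counts reported for the best CNN model.
TP, FN, FP, TN = 95376, 624, 2661, 157339

se = 100 * TP / (TP + FN)                   # sensitivity, Se(%)
pp = 100 * TP / (TP + FP)                   # positive prediction, +P(%)
acc = 100 * (TP + TN) / (TP + TN + FP + FN) # overall accuracy

print(f"Se = {se:.2f}%, +P = {pp:.2f}%, Acc = {acc:.2f}%")
# Se = 99.35%, +P = 97.29%, Acc = 98.72%
```

The accuracy of 98.72% matches the value stated in the text, and Se/+P are the two entries used for the comparison in Table 1.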

These metrics (\(Se(\%)\) and \(+P(\%)\)) are widely used in the literature and are also the ones recommended by AAMI [3], which means they are officially recommended for the certification of algorithms/techniques embedded in medical equipment.

A practical application of the method can be seen in Fig. 4, where a 300-sample window of one MIT-BIH recording is shifted every 10 ms in a sliding-window fashion. The red pulse indicates when a QRS complex is detected.
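The sliding-window procedure can be sketched as below. Here `classify` stands in for the trained CNN, and the step of 4 samples (about 11 ms at 360 Hz) is an illustrative choice approximating the 10 ms shift.

```python
import numpy as np

WIN = 300    # classifier input window (samples)
STEP = 4     # ~11 ms at 360 Hz (illustrative step size)

def detect_qrs(ecg, classify):
    """Slide a 300-sample window over `ecg` and mark QRS positions.

    `classify` is any callable returning True when the window contains
    a centered QRS complex (a stand-in for the trained CNN here).
    """
    hits = []
    for start in range(0, len(ecg) - WIN + 1, STEP):
        if classify(ecg[start:start + WIN]):
            hits.append(start + WIN // 2)  # window center = detection point
    return hits

# Toy usage: a "classifier" that fires on windows with a large center value.
ecg = np.zeros(2000)
ecg[1002] = 1.0                            # pretend R peak
hits = detect_qrs(ecg, lambda w: w[WIN // 2] > 0.5)
print(hits)  # [1002]
```

Because the CNN only accepts a *centered* QRS as positive, consecutive firing windows cluster tightly around each true R peak, which is what produces the narrow red pulses in Fig. 4.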

In this type of application, the occurrence of false positives is a major problem, since a false positive often leads to heart rate miscalculation and thus causes unnecessary alarms for the medical staff. In real scenarios, physicians tend to lose confidence when they perceive excessive false positive events. The method proposed here, based on a CNN, could be used in medical equipment alongside the traditional methods already embedded in those devices, to help reduce false positives.

5 Conclusions

In this paper, a QRS detection method for the electrocardiogram based on a convolutional network is presented. To the best of our knowledge, this is the first time a CNN has been used for this purpose. Experiments have shown that the proposed method is comparable to state-of-the-art methods in the literature (see Table 1) and could be a promising research path. It is worth noting that, although the methods presented in Table 1 use the same database, the evaluation protocols differ with regard to the number of instances used for training and testing.

Although no pre-processing technique was considered in the present methodology, a filtering stage could impact the final results. This is a promising research direction to increase the accuracy of CNN-based QRS detectors.

It is also worth noting that improvements could be achieved by investigating other CNN architectures and other techniques for training deeper networks, such as the ones employed in [13]. In future work, CNN-based QRS detectors could be merged with traditional QRS detection methods, aiming to reduce the false positive rate and increase the final accuracy.