Heart sound classification based on log Mel-frequency spectral coefficients features and convolutional neural networks

doi:10.1016/j.bspc.2021.102893

Biomedical Signal Processing and Control

Volume 69, August 2021, 102893

https://doi.org/10.1016/j.bspc.2021.102893

Highlights

• The MFSC feature based on dynamic frame length was proposed.
• The modeling of the heart sound state by the Gaussian mixture model improves the segmentation accuracy.
• The sample size was expanded by dividing the heart sound into the cardiac cycle.
• The performance of the classifier was improved by the majority voting algorithm.
• The MFSC features have achieved excellent performance in two-classification and multi-classification problems.

Abstract

Given the important role of heart sound signals in diagnosing and preventing congenital heart disease, this study proposes a novel method for feature extraction and classification of heart sound signals. First, the heart sound signals are denoised with a wavelet algorithm. Next, an improved duration-dependent hidden Markov model (DHMM) segments the heart sound signal into cardiac cycles. A dynamic frame length method then extracts log Mel-frequency spectral coefficients (MFSC) features from each cardiac cycle. A convolutional neural network (CNN) classifies the MFSC features, and a majority voting algorithm is applied to obtain the optimal classification result. Both two-class and multi-class models were built. The method achieved an accuracy of 93.89% for two-class classification and 86.25% for multi-class classification.

Introduction

According to data from the World Health Organization, cardiovascular disease is the main reason for the rise in global mortality, and it seriously affects life expectancy [1]. The heart sound signal supports a comprehensive assessment of the cardiovascular system and carries a large amount of physiological and pathological information about the heart. The heart sound signal is therefore the primary tool for screening and diagnosing various pathological conditions of the human heart [2].

With the advent of the electronic stethoscope, collecting heart sound signals is no longer a problem. In the last decade, intelligent computer-aided diagnosis (CAD) systems for heart disease have developed rapidly [3]. The main idea of a CAD system is to assist doctors in diagnosing heart disease by using a dedicated computer system to provide a second opinion. However, the heart sound signal is a weak, low-frequency signal. It is highly vulnerable to environmental noise during acquisition, which significantly hinders its processing and analysis. Effective denoising, analysis, and classification of heart sound signals are therefore the key steps in CAD systems.

A complete CAD system usually consists of three sections. In the first section, the heart sound signal is automatically segmented into cardiac cycles, which can be further subdivided into their principal components (such as S₁ and S₂). Segmentation also expands the sample size of heart sound signals, which is very helpful for CAD systems based on deep learning algorithms. Many types of heart sound segmentation algorithms have been proposed, for example multi-scale Hilbert envelope segmentation [4], feature amplitude threshold segmentation [5], neural network segmentation [6], and hidden Markov model (HMM) segmentation [7]. The HMM method infers a plausible state sequence from the relationship between the observation sequence and the hidden state sequence. When a heart sound signal is collected, the state of the heart is unknown but the signal itself is observable, so the HMM meets the requirements of the heart sound segmentation task. At present, most HMM-based segmentation algorithms learn from the data without supervision, but the HMM they use has a weakness: the duration of each state is not well modeled. The probability of transferring to the next state is not affected by how long the current state has lasted, which makes HMM-based heart sound state segmentation inaccurate.
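The duration problem described above can be made concrete. In a standard first-order HMM, a state i with self-transition probability a_ii (notation assumed here for illustration, not taken from the paper) is occupied for exactly d consecutive steps with a geometric probability:

```latex
p_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots, \qquad \mathbb{E}[d] = \frac{1}{1 - a_{ii}}
```

This distribution always decreases with d, so the shortest duration is the most likely one. That is a poor match for heart sound states whose durations cluster around a typical value. A duration-dependent HMM replaces this implicit geometric term with an explicit duration distribution for each state.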

Feature extraction, the second section, usually operates on the segmented heart sound signals. Because noise such as breathing sounds is always present in collected heart sound signals, it is difficult to extract useful features from them, and classification accuracy suffers if the segmented signals are used directly. Extracting useful features is therefore key to improving the accuracy of heart sound classification, and many feature extraction methods have been proposed to describe heart sound signals. Deng et al. [8] used discrete wavelet decomposition to extract autocorrelation features from the sub-band envelopes calculated from the signal's sub-band coefficients, and used a support vector machine to classify heart sounds. Zhang et al. [9] proposed a method to extract heart sound features based on the scaled spectrogram and tensor decomposition. Kay et al. [10] combined the time-domain characteristics of the cardiac cycle with the continuous wavelet transform, Mel-frequency cepstral coefficients (MFCC), and complex features (such as spectral entropy, standard deviation, skewness, and kurtosis) to describe the heart sound signal. Malik et al. [11] used a variable-length window and a peak amplitude threshold to extract heart sound features for classification. Current mainstream feature extraction methods are mainly based on basic features, time–frequency domain features, and fusion features. Although the methods differ, most studies keep the final feature dimension uniform by first cutting the heart sound signal into fixed-length segments and then extracting features. However, heart rate differs between individuals, so fixed-length segments from different individuals carry significantly different amounts of information. In fact, the heart sound signal is quasi-periodic, and each cardiac cycle contains enough information for classification. Classification based on the cardiac cycle raises a difficulty of its own, though: because cardiac cycles differ in length, the extracted features differ in dimension.
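One way to read the dimension problem above is that a fixed frame length yields more frames for a long cardiac cycle than for a short one. The sketch below shows a possible fix under stated assumptions: the frame length is scaled with the cycle length so that every cycle produces the same number of frames. The function names, the 64-frame target, and the 50% overlap are hypothetical choices for illustration; the paper's exact formula is not given in this excerpt.

```python
def dynamic_frame_params(cycle_len, n_frames=64, overlap=0.5):
    """Return (frame_len, hop) in samples so that n_frames frames span one cycle.

    n_frames and overlap are illustrative assumptions, not the paper's values.
    """
    if cycle_len <= 0 or n_frames <= 0 or not 0 <= overlap < 1:
        raise ValueError("invalid cycle length, frame count, or overlap")
    # n_frames frames with hop = (1 - overlap) * frame_len cover
    # frame_len + (n_frames - 1) * hop samples; solve that for frame_len.
    frame_len = cycle_len / (1 + (n_frames - 1) * (1 - overlap))
    hop = frame_len * (1 - overlap)
    return frame_len, hop


def frame_bounds(cycle_len, n_frames=64, overlap=0.5):
    """Return integer (start, end) sample indices for each frame of one cycle."""
    frame_len, hop = dynamic_frame_params(cycle_len, n_frames, overlap)
    bounds = []
    for i in range(n_frames):
        start = int(round(i * hop))
        end = min(cycle_len, int(round(i * hop + frame_len)))
        bounds.append((start, end))
    return bounds


# Two cycles of different length give the same number of frames.
short_cycle = frame_bounds(1600)
long_cycle = frame_bounds(2400)
print(len(short_cycle), len(long_cycle))
```

With a fixed number of frames per cycle and a fixed number of filters per frame, every cardiac cycle maps to a feature matrix of the same shape, which is what a CNN input layer needs.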

In the final section, the features extracted from the segmented heart sounds are processed by a classifier. Heart sound classification algorithms can be roughly divided into two categories: those based on classical pattern recognition and those based on deep learning. Classical pattern recognition classifiers include the HMM [12], the Support Vector Machine (SVM) [13], and K-Nearest Neighbor (KNN) [14]. Classifiers such as the Back Propagation Neural Network [15], the Convolutional Neural Network (CNN) [16], and the Artificial Neural Network (ANN) [17] belong to the deep learning category. The advantage of classical pattern recognition algorithms is that they classify directly from the features extracted from the signal without being limited by the amount of data. They can often achieve good results on small data samples with features that capture pathological information. Their disadvantages are over-reliance on the feature extraction step, strong sensitivity to signal noise, and low robustness, which make them difficult to apply in a CAD system. The advantage of deep learning algorithms is that they are data-driven: they minimize a loss function by learning the mapping between input objects and their prediction results. When the amount of data is sufficient and the network structure is reasonable, accuracy and generality can reach a high level. One disadvantage is poor interpretability, since it is difficult to explain scientifically what key information the network has extracted. Another is that deep learning algorithms require a large amount of data, and over-fitting occurs easily when a neural network is trained on small data samples.

This paper proposes a new log Mel-frequency spectral coefficients (MFSC) feature based on dynamic frame length. The proposed features reflect the characteristics of the heart sound signal in the time–frequency domain well. Moreover, they solve the problem of the uneven distribution of feature samples caused by the unequal length of heart sound segments after segmentation. The overall framework of heart sound classification is shown in Fig. 1. First, the heart sound signals are denoised with the wavelet algorithm. Second, the improved DHMM is used to find the optimal state sequence. Third, the indices of the optimal state sequence are used to segment the heart sound into cardiac cycles. Fourth, MFSC features based on dynamic frame length are extracted for each cardiac cycle. Fifth, the features are fed into the convolutional neural network to classify the heart sound signals. Finally, the majority voting algorithm is used to optimize the classification results. The method was tested on both two-class and multi-class problems, and the results show that the proposed features raise classification accuracy for both tasks.
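As the term is commonly used, MFSC features are log Mel filterbank energies, that is, MFCC without the final discrete cosine transform. The sketch below computes them for a single frame from its power spectrum using only the standard library. The filter count, the 2000 Hz sample rate, the triangular filter shape, and all function names are assumptions made for illustration; the paper's actual settings are not given in this excerpt.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(n_filters, f_min, f_max):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def log_mel_energies(power_spectrum, sample_rate, n_filters=8,
                     f_min=0.0, f_max=None, eps=1e-10):
    """Log energy of one frame in each triangular mel filter.

    power_spectrum holds |X[k]|^2 for bins 0..N/2 of one frame.
    """
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    f_max = nyquist if f_max is None else f_max
    edges = mel_edges(n_filters, f_min, f_max)
    bin_freqs = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    feats = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        energy = 0.0
        for f, p in zip(bin_freqs, power_spectrum):
            if left < f <= center:        # rising slope of the triangle
                energy += p * (f - left) / (center - left)
            elif center < f < right:      # falling slope of the triangle
                energy += p * (right - f) / (right - center)
        feats.append(math.log(energy + eps))  # eps avoids log(0)
    return feats


# A flat power spectrum with 129 bins, assumed sample rate 2000 Hz.
frame_features = log_mel_energies([1.0] * 129, sample_rate=2000, n_filters=8)
print(len(frame_features))
```

Stacking one such vector per frame gives a two-dimensional time–frequency map for each cardiac cycle, which can be treated like an image by a CNN.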

Before feature extraction, the wavelet denoising algorithm is used to remove noise from the heart sound signals, and the improved heart sound segmentation algorithm is used to expand the data set. In the classification stage, the majority voting algorithm reclassifies the results of individual cardiac cycle classification, which improves the accuracy of the model. These steps make the new algorithm robust, and it is expected to be applicable to CAD systems in clinical environments.
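The majority voting step described above can be sketched as follows: each cardiac cycle of a recording receives its own predicted label, and the recording takes the label that occurs most often. The function names, the label strings, and the tie-breaking rule (the earliest label among those tied) are assumptions for illustration; the excerpt does not say how the paper resolves ties.

```python
from collections import Counter


def majority_vote(cycle_labels):
    """Return the most frequent label among per-cycle predictions.

    Ties go to the tied label that appears first (an assumed rule).
    """
    if not cycle_labels:
        raise ValueError("at least one cycle prediction is required")
    counts = Counter(cycle_labels)
    top = max(counts.values())
    for label in cycle_labels:
        if counts[label] == top:
            return label


def vote_per_recording(predictions):
    """Group (recording_id, label) pairs and vote within each recording."""
    grouped = {}
    for rec_id, label in predictions:
        grouped.setdefault(rec_id, []).append(label)
    return {rec_id: majority_vote(labels) for rec_id, labels in grouped.items()}


# Hypothetical per-cycle predictions for two recordings.
cycle_predictions = [
    ("rec1", "normal"), ("rec1", "abnormal"), ("rec1", "abnormal"),
    ("rec2", "normal"), ("rec2", "normal"),
]
print(vote_per_recording(cycle_predictions))
```

Voting across cycles means a single misclassified cycle does not change the recording-level result as long as most cycles are classified correctly.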

Section snippets

Data description

The heart sound data in this paper were collected at Fuwai Yunnan Cardiovascular Hospital and the First Affiliated Hospital of Kunming Medical University, and were also drawn from the heart sound sample database collected during congenital heart disease screening in various areas of Yunnan Province. The volunteers in the heart sound sample database ranged in age from 6 months to 18 years. All volunteers signed the informed consent form. All the classified diseases of heart sound signals in

Result

The heart sound signals used in this paper are from the database described in section 2.1. Because the class distribution needs to be balanced before the classification task, 300 heart sound signals are randomly selected from the normal samples for the multi-classification task. Each heart sound sample set is divided into three mutually exclusive groups: 65% of the set is used to train the network, 15% for validation, and 20% for testing. Due to the previous heart sound segmentation steps,

Discussion

As can be seen from Table 8, the algorithm proposed in this paper is superior to the other algorithms in sensitivity, specificity, and accuracy. Among all the algorithms, the one proposed in reference [36] performs worst. That algorithm uses a wavelet-based deep convolutional neural network to extract the features of each state after heart sound segmentation. The complex feature vector is composed of basic state statistical features (time, frequency, etc.) and power density spectrum

Conclusion

Based on the above results and analysis, the following conclusions can be drawn.

1) Complex heart sound feature extraction steps, especially in heart sound classification based on deep learning, do not improve classification performance. Specifically, performance is good on the training set but poor on the test set, so generalization is weak.
2) The heart sound sample data based on the heart cycle plays a great role in

CRediT authorship contribution statement

Haoran Kui: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Jiahua Pan: Validation, Resources, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition. Rong Zong: Methodology, Data curation, Writing - review & editing. Hongbo Yang: Validation, Resources, Data curation. Weilian Wang: Methodology, Writing - review & editing,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was funded by the Major Science and Technology Projects of Yunnan Province under Grant 2018ZF017, the National Natural Science Foundation of China under Grant 81960067, and the Yunnan Applied Basic Research Project under Grant 2018FE001.

Thanks to Fuwai Yunnan Cardiovascular Hospital for providing a clinical research environment and medical guidance for this study.

References (41)

S.W. Deng et al., Adaptive overlapping-group sparse denoising for heart sound signals, Biomed. Signal Process. Control (2018)
A. Moukadem et al., A robust heart sounds segmentation module based on S-transform, Biomed. Signal Process. Control (2013)
S.W. Deng et al., Towards heart sound classification without segmentation via autocorrelation feature and diffusion maps, Future Generation Computer Systems (2016)
W. Zhang et al., Heart sound classification based on scaled spectrogram and tensor decomposition, Expert Syst. Appl. (2017)
S.I. Malik et al., Localization and classification of heartbeats using robust adaptive algorithm, Biomed. Signal Process. Control (2019)
F. Safara et al., Multi-level basis selection of wavelet packet decomposition tree for heart sound classification, Comput. Biol. Med. (2013)
S. Babaei et al., Heart sound reproduction based on neural network classification of cardiac valve disorders using wavelet transforms of PCG signals, Comput. Biol. Med. (2009)
P. Chen et al., Classification of heart sounds using discrete time-frequency energy feature based on S transform and the wavelet threshold denoising, Biomed. Signal Process. Control (2020)
P. Laguna et al., Automatic detection of wave boundaries in multilead ECG signals: validation with the CSE database, Comput. Biomed. Res. (1994)
A. Poernomo et al., Biased dropout and crossmap dropout: learning towards effective dropout regularization in convolutional neural network, Neural Networks (2018)
M. Nabih-Ali et al., A review of intelligent systems for heart sound signal analysis, J. Med. Eng. Technol. (2017)
B. Xiao et al., Follow the sound of children’s heart: a deep-learning-based computer-aided pediatric CHDs diagnosis system, IEEE Internet Things J. (2019)
L.N. Sharma, Multiscale analysis of heart sound for segmentation using multiscale Hilbert envelope[C]//2015
E. Messner et al., Heart sound segmentation—An event detection approach using deep recurrent neural networks, IEEE Trans. Biomed. Eng. (2018)
S.E. Schmidt et al., Segmentation of heart sound recordings by a duration-dependent hidden Markov model, Physiol. Meas. (2010)
E. Kay et al., DropConnected neural networks trained on time-frequency and inter-beat features for classifying heart sounds, Physiol. Meas. (2017)
P. Mayorga Ortiz et al., Modelos acústicos HMM multimodales para sonidos cardiacos y pulmonares, Revista mexicana de ingeniería biomédica (2014)
S.A. Singh et al., Classification of unsegmented heart sound recording using KNN classifier, J. Mech. Med. Biol. (2019)
L. Li et al., Classification of heart sound signals with BP neural network and logistic regression[C]//2017 Chinese Automation Congress (CAC), IEEE (2017)
Z. Tan et al., Classification of heart sound signals in congenital heart disease based on convolutional neural network, J. Biomed. Eng. (2019)

Cited by (35)

Hilbert domain characterizations of wavelet packets for automated heart sound abnormality detection
2024, Biomedical Signal Processing and Control
Heart valve disease (HVD) is a common disease that affects millions of people worldwide. Early detection and treatment are essential for improving the prognosis of patients with HVD. Phonocardiogram (PCG) signals are a non-invasive and inexpensive way to assess the mechanical activity of the heart. In this study, a novel method for HVD detection using Hilbert domain mapping of wavelet packet of PCG signals is proposed. Two standard PCG databases are used to evaluate the proposed method. Packet instantaneous frequency deviation (PIFD) and packet instantaneous energy deviation (PIED) features are extracted from the PCG signals and used for classification. A support vector machine (SVM) and K-nearest neighbour (KNN) based error-correcting output code (ECOC) approach is used to handle multiclass classification and minimize classification error. The proposed method achieves an unweighted average recall (UAR) of 99.8% on database 1 and 99.32% on database 2, which outperforms other baseline methods. The results suggest that the proposed method is a promising approach for HVD detection using PCG signals.
A new approach based on a 1D + 2D convolutional neural network and evolving fuzzy system for the diagnosis of cardiovascular disease from heart sound signals
2024, Applied Acoustics
Diagnostic methods for cardiovascular disease diagnosis based on heart sound classification have been widely investigated for their noninvasiveness, low-cost, and high efficiency. Most current researches either use manually designed functions or deep learning-based methods to extract features from heart sound signals, but the heart sound signals have highly nonstationary and complex data patterns due to environmental noise and the differences between different stethoscopes. Therefore, using a single feature extraction method does not result in a good feature representation. Moreover, deep learning-based feature extraction methods for heart sound signals usually only use 1D convolution or 2D convolution, which limits the capability of neural networks to extract discriminative features. In addition, many studies do not consider the redundancy of features and the interpretability of decisions, which affects the performance and efficiency of the models. To solve the above problems, this paper first proposes a new convolutional neural network named the 1D + 2D convolutional neural network (1D + 2D-CNN) as a deep learning feature extractor, which combines 1D convolution and 2D convolution. The 1D + 2D-CNN contains two branches, and the feature maps obtained from the two branches are concatenated according to the channel. Then, a 10-layer convolutional network with an attention mechanism is introduced to enhance the feature extraction capability of the network. Second, the advantages and disadvantages when combining deep learning features with manual features in different scenarios are explored. In addition, the mean and variance of each dimensional feature of the dataset are calculated by class and feature selection is achieved by evaluating the importance of each dimension of the feature through a simple statistical formula. Finally, an evolving fuzzy system is used to classify the heart sound signals, as it can provide interpretability for decision-making. For the experimental part, the 2016 PhysioNet/CinC Challenge dataset (PCCD) and our collected publicly available pediatric heart sound dataset (PHSD) are used to evaluate the performance of the model by 10-fold cross-validation. The model in this paper achieves accuracies of 96.3 % and 99.1 % on these two datasets respectively, demonstrating its capability to reach the state-of-the-art level. We also open the code of our algorithms.
QDRJL: Quaternion dynamic representation with joint learning neural network for heart sound signal abnormal detection
2023, Neurocomputing
At present, deep learning based heart sound diagnosis algorithms are mostly complex and large models for high accuracy, which are difficult to deploy on mobile devices due to the high number of parameters and large computational cost. The current mainstream approach for processing heart sound signals involves utilizing their Mel-frequency cepstral coefficients (MFCC) features. However, most existing methods have overlooked the multi-channel characteristics of MFCC. To address this issue, we propose a Quaternion Dynamic Representation with Joint Learning (QDRJL) neural network for learning MFCC multi-channel features. Our proposed approach combines quaternion dynamic convolution with dynamic weighting and the Quaternion Interior Learning Block (QILB). Finally, we present a global and energy joint learning branch for jointly learning MFCC features. The success of the proposed quaternion network depends on its ability to utilize the internal relations between quaternion-valued input features and the definition of the dynamic weight variables in the augmented quaternion domain. We assessed various state-of-the-art classification algorithms for detecting heart sounds and found that our proposed classifier achieved an accuracy of up to 97.2%, outperforming existing models. Our experimental evaluation, using the 2016 PhysioNet/CinC Challenge dataset, revealed that our model could reduce the number of network parameters to 25% due to quaternion properties.
Feature selection algorithms highlight the importance of the systolic segment for normal/murmur PCG beat classification
2023, Biomedical Signal Processing and Control
This paper proposes a method using statistical local and global features for classifying healthy and murmur heart sound recordings from phonocardiogram signals. Classification requires features extraction step that converts each signal into a sequence of feature vectors composed of static and dynamic energy coefficients computed from overlapped analysis windows. Firstly, we propose, for each heartbeat, to extract local features from the local consecutive regions (1st Sound, Systole, 2nd Sound, Diastole) and global ones from the global region. For each region, the features are the statistical features (mean and standard deviation) computed on the feature vector sequence plus the duration. Secondly, we propose to select the relevant features using filter approach based on mutual information criteria. The extraction and selection methods are validated using K nearest neighbor and Gaussian Mixture Models as classifiers. The classification system were evaluated on a sub-dataset of the public PASCAL heart sounds classifying challenge. Results showed that 12 features selected using the Max-Relevance Min-Redundancy selection strategy were sufficient to explain the two classes with 94.97% classification rate higher than 92.74% state-of-the-art rate. We also showed this selection strategy helped the system to be robust to the testing phase when using automatic segmentation rather than manual segmentation. This work demonstrates that local systolic segment features are the most relevant for murmur/normal classification, regardless of segmentation methods. It also shows that feature selection algorithms have potential to highlight certain relevant regions in signals, which is useful for aided diagnostic systems and basic research.
Detection of pulmonary arterial hypertension associated with congenital heart disease based on time–frequency domain and deep learning features
2023, Biomedical Signal Processing and Control
The heart sounds reflect the health of the heart. Its recording is the phonocardiogram (PCG). Pulmonary arterial hypertension associated with congenital heart disease (CHD-PAH) is a serious heart disease and is often associated with severe disability and death. The disease is not well characterized onset. The most patients are severe when they have been diagnosed and miss the best time to treat them. The objective of this study was to develop a computer aided diagnosis, which based on single cycle with multiple features, for detecting pulmonary arterial hypertension associated with congenital heart disease. It is a non-invasive and simple method which may be hopeful at early diagnosis of CHD-PAH. The original heart sounds were pre- processed first, in which a double-threshold adaptive segmentation method was used to segment the signal into each cardiac cycle first. Then the time–frequency domain features and wavelet packet energy features of cardiac cycle and S2 component are extracted. And convolutional neural network (CNN) is used to extract the depth features of cardiac cycle. The above features were combined into a fused feature vector. Normal, CHD and CHD-PAH were classified using XGBoost as the classifier. Finally, the majority voting algorithm is used to obtain the best classification result for multiple results corresponding to multiple cardiac cycles of the same person. Using this new method, a classification accuracy of 88.61% was achieved.
Detection of pulmonary hypertension associated with congenital heart disease based on time-frequency domain and deep learning features
2023, Biomedical Signal Processing and Control
The heart sounds reflect the health of the heart. Its recording is the phonocardiogram (PCG). Pulmonary hypertension associated with congenital heart disease (CHD-PAH) is a serious heart disease and is often associated with severe disability and death. The disease is not well characterized onset. The most patients are severe when they have been diagnosed and miss the best time to treat them. The objective of this study was to develop a computer aided diagnosis, which based on single cycle with multiple features, for detecting pulmonary hypertension associated with congenital heart disease. It is a non-invasive and simple method which may be hopeful at early diagnosis of CHD-PAH. The original heart sounds were pre-processed first, in which a double-threshold adaptive segmentation method was used to segment the signal into each cardiac cycle first. Then the time–frequency domain features and wavelet packet energy features of cardiac cycle and S2 component are extracted. And convolutional neural network (CNN) is used to extract the depth features of cardiac cycle. The above features were combined into a fused feature vector. Normal, CHD and CHD-PAH were classified using XGBoost as the classifier. Finally, the majority voting algorithm is used to obtain the best classification result for multiple results corresponding to multiple cardiac cycles of the same person. Using this new method, a classification accuracy of 88.61% was achieved.
