Heart sound classification based on log Mel-frequency spectral coefficients features and convolutional neural networks

https://doi.org/10.1016/j.bspc.2021.102893

Highlights

  • The MFSC feature based on dynamic frame length was proposed.

  • The modeling of the heart sound state by the Gaussian mixture model improves the segmentation accuracy.

  • The sample size was expanded by dividing the heart sound into the cardiac cycle.

  • The performance of the classifier was improved by the majority voting algorithm.

  • The MFSC features have achieved excellent performance in two-classification and multi-classification problems.

Abstract

Given the important role of heart sound signals in diagnosing and preventing congenital heart disease, this study puts forward a novel method for feature extraction and classification of heart sound signals. First, the heart sound signals were de-noised using the wavelet algorithm. Next, the improved duration-dependent hidden Markov model (DHMM) was used to segment the heart sound signal by cardiac cycle. Then, the dynamic frame length method was used to extract log Mel-frequency spectral coefficients (MFSC) features from each cardiac cycle. Afterward, a convolutional neural network (CNN) was used to classify the MFSC features. Finally, the majority voting algorithm was used to obtain the optimal classification results. Both two-classification and multi-classification models were built. The novel method achieved an accuracy of 93.89% for two-classification and 86.25% for multi-classification.

Introduction

According to data from the World Health Organization, cardiovascular disease is the main reason for the rise in global mortality, and it seriously affects life expectancy [1]. The heart sound signal provides a comprehensive assessment of the cardiovascular system and carries a large amount of physiological or pathological information about the heart. The heart sound signal is therefore the primary tool for screening and diagnosing various pathological conditions of the human heart [2].

With the advent of the electronic stethoscope, collecting heart sound signals is no longer a problem. In the last decade, intelligent computer-aided diagnosis (CAD) systems for heart disease have developed rapidly [3]. The main idea of a CAD system is to assist doctors in diagnosing heart disease by using a dedicated computer system to provide a second opinion. However, the heart sound signal is weak and low in frequency. It is highly vulnerable to environmental noise during acquisition, which significantly hinders its processing and analysis. Therefore, effective denoising, analysis, and classification of heart sound signals are the key steps in CAD systems.

A complete CAD system usually consists of three sections. In the first section, the heart sound signal is automatically segmented into cardiac cycles, and these cycles are further subdivided into their principal components (such as S1 and S2). Segmentation also expands the sample size of heart sound signals, which is very helpful for CAD systems based on deep learning algorithms. Many different types of heart sound segmentation algorithms have been proposed, for example, multiscale Hilbert envelope segmentation [4], feature amplitude threshold segmentation [5], neural network segmentation [6], and hidden Markov model (HMM) segmentation [7]. Among these, the HMM method infers a plausible state sequence from the relationship between the observation sequence and the hidden state sequence. The heart's state is unknown when the heart sound signal is collected, but the collected signal itself is observable, so the HMM meets the requirements of the heart sound segmentation task. At present, most HMM-based segmentation algorithms learn from data without supervision, but the HMM they use has a problem: the duration of each state is not well modeled. The probability of transferring to the next state is not affected by how long the signal has already spent in the current state, which makes HMM-based heart sound state segmentation inaccurate.
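The duration problem can be made concrete. In a standard HMM, the time spent in a state follows a geometric distribution, P(d) = a^(d-1) (1 - a), where a is the self-transition probability. The sketch below is only an illustration: the value 0.9 and the function name are assumptions, not taken from the paper.

```python
def hmm_duration_pmf(self_transition_prob, max_duration):
    """P(a state lasts exactly d steps), d = 1..max_duration, in a standard HMM.

    Staying in a state means repeating the self-transition, so the duration
    is geometric: P(d) = a**(d - 1) * (1 - a).
    """
    a = self_transition_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


# Illustrative value only (an assumption, not from the paper).
a = 0.9
pmf = hmm_duration_pmf(a, 200)

# The probability always peaks at the shortest duration, whatever a is.
most_likely_duration = pmf.index(max(pmf)) + 1

# The mean duration of a geometric distribution is 1 / (1 - a).
mean_duration = 1.0 / (1.0 - a)
```

Whatever value a takes, the shortest duration is always the most probable one. That is hard to reconcile with heart sound states that last a characteristic length of time, and it is the gap a duration-dependent model is meant to close.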

Feature extraction, the second section, is usually based on the segmented heart sound signals. Since noise, such as breathing sounds, is always present in collected heart sound signals, it is difficult to extract useful features from them. Classification accuracy suffers if the segmented cardiac signals are used directly. Therefore, extracting useful features is key to improving the accuracy of heart sound classification. For this reason, many feature extraction methods have been proposed to describe heart sound signals. Deng et al. [8] used discrete wavelet decomposition to extract autocorrelation features from the sub-band envelopes calculated from the heart signal's sub-band coefficients and used a support vector machine to classify heart sounds. Zhang et al. [9] proposed a method to extract heart sound features based on scaled spectrum and tensor decomposition. Kay et al. [10] combined the time-domain characteristics of the cardiac cycle with the continuous wavelet transform, Mel-frequency cepstral coefficients (MFCC), and complex features (such as spectral entropy, standard deviation, skewness, and kurtosis) to describe the heart sound signal. Malik et al. [11] used a variable-length window and a peak amplitude threshold to extract heart sound features for classification. Current mainstream feature extraction methods are mainly based on basic features, time–frequency domain features, and fusion features. Although the methods differ, most studies keep the extracted features uniform in dimension by first cutting the heart sound signal into fixed-length segments and then extracting the features. However, heart rate differs between individuals, so fixed-length segments from different individuals carry significantly different amounts of information. In fact, the heart sound signal is quasi-periodic, and each cardiac cycle contains enough information for classification. Classification based on the cardiac cycle, however, faces a difficulty: because cardiac cycles differ in length, the extracted features differ in dimension.
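One way to obtain equal-dimension features from cycles of unequal length is to adapt the frame length to each cycle so that every cycle yields the same number of frames. The sketch below illustrates that idea only. The exact rule used in the paper is not given in this excerpt, and the function name, the frame count of 10, and the 50% overlap are all assumptions.

```python
import math


def dynamic_frames(cycle_len, n_frames=10, overlap=0.5):
    """Return (frame_len, starts) so one cycle yields exactly n_frames frames.

    cycle_len : number of samples in one cardiac cycle
    n_frames  : fixed number of frames wanted per cycle (assumed value)
    overlap   : target fractional overlap between neighbouring frames (assumed)
    """
    if cycle_len < 1 or n_frames < 1 or not 0.0 <= overlap < 1.0:
        raise ValueError("invalid framing parameters")
    if n_frames == 1:
        return cycle_len, [0]
    # Solve cycle_len = frame_len * (1 + (n_frames - 1) * (1 - overlap)).
    frame_len = math.ceil(cycle_len / (1 + (n_frames - 1) * (1 - overlap)))
    span = cycle_len - frame_len          # start position of the last frame
    steps = n_frames - 1
    # Evenly spaced start indices, rounded with integer arithmetic.
    starts = [(i * span + steps // 2) // steps for i in range(n_frames)]
    return frame_len, starts


# A short cycle and a long cycle both produce 10 frames.
short_len, short_starts = dynamic_frames(800)
long_len, long_starts = dynamic_frames(1200)
```

With a fixed frame count, applying the same log Mel filter bank to every frame would give a feature matrix of the same shape for every cycle, regardless of heart rate.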

In the final section, the features extracted from the segmented heart sounds are processed by the classifier. Heart sound classification algorithms can be roughly divided into two categories: those based on classical pattern recognition and those based on deep learning. Classical pattern recognition classifiers include the HMM [12], Support Vector Machine (SVM) [13], and K-Nearest Neighbor (KNN) [14]. Classifiers such as the Back Propagation Neural Network [15], Convolutional Neural Network (CNN) [16], and Artificial Neural Network (ANN) [17] belong to the deep learning category. In heart sound classification, the advantage of classical pattern recognition algorithms is that they classify directly from the features extracted from the signal without being limited by the amount of data. They can often achieve good results with small data samples and with features that reflect pathological information. Their disadvantages are over-reliance on the feature extraction steps, strong sensitivity to signal noise, and low robustness, which make them difficult to apply in a CAD system. The advantage of deep learning algorithms is that they are data-driven: they learn the mapping between inputs and their predictions by minimizing a loss function. Provided that the amount of data is sufficient and the network structure is reasonable, their accuracy and generality can reach a high level. One disadvantage is poor interpretability: it is difficult to explain scientifically what key information has been extracted. Another is that deep learning algorithms require a large amount of data, and over-fitting occurs easily when a neural network is trained on small data samples.

In this paper, a new log Mel-frequency spectral coefficients (MFSC) feature based on dynamic frame length is proposed. The proposed features reflect the characteristics of the heart sound signal in the time–frequency domain well. Moreover, they solve the problem of the uneven distribution of feature samples caused by the unequal length of the heart sound signal after segmentation. The overall framework of heart sound classification is shown in Fig. 1. First, the heart sound signals were de-noised using the wavelet algorithm. Second, the improved DHMM was used to find the optimal state sequence. Third, the optimal state sequence index was used to segment the heart sound by cardiac cycle. Fourth, MFSC features based on dynamic frame length were extracted for each cardiac cycle. Fifth, the features were input into the convolutional neural network to classify the heart sound signals. Finally, the majority voting algorithm was used to optimize the classification results. Both two-class and multi-class problems were tested with the novel method. The results show that the proposed features yield higher classification accuracy for both two-class and multi-class tasks.

Before feature extraction, the wavelet denoising algorithm was used to eliminate the noise of heart sound signals, and the improved heart sound segmentation algorithm was used to expand the data set. In the classification stage, the majority voting algorithm was also used to reclassify the results of individual cardiac cycle classification, which improves the accuracy of the model. These methods make the novel algorithm robust. It is expected to be applied to the CAD systems in clinical environments.
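The majority voting step can be sketched as follows: every cardiac cycle from a recording receives its own predicted label, and the recording takes the label that occurs most often. This is a minimal illustration. The function names, the example labels, and the tie-breaking behaviour are assumptions, since the excerpt does not specify them.

```python
from collections import Counter


def majority_vote(cycle_predictions):
    """Return the most frequent label among per-cycle predictions.

    Ties go to the label that appears first in the list (an assumption;
    the paper's tie-breaking rule is not stated in this excerpt).
    """
    if not cycle_predictions:
        raise ValueError("need at least one cycle prediction")
    return Counter(cycle_predictions).most_common(1)[0][0]


def vote_per_recording(predictions_by_recording):
    """Map each recording id to the majority label of its cardiac cycles."""
    return {
        rec_id: majority_vote(preds)
        for rec_id, preds in predictions_by_recording.items()
    }


# Hypothetical per-cycle outputs of a classifier for two recordings.
example = {
    "rec_a": ["abnormal", "normal", "abnormal"],
    "rec_b": ["normal", "normal", "abnormal", "normal"],
}
final_labels = vote_per_recording(example)
```

A single misclassified cycle is outvoted by the other cycles of the same recording, which is how voting can raise recording-level accuracy above cycle-level accuracy.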

Section snippets

Data description

The heart sound data in this paper were collected in Fuwai Yunnan Cardiovascular Hospital and the First Affiliated Hospital of Kunming Medical University, as well as in the heart sound sample database collected during the screening of congenital heart disease in various areas of Yunnan Province. The volunteers' age in the heart sound sample database ranged from 6 months to 18 years old. All the volunteers signed the informed consent form. All the classified diseases of heart sound signals in

Result

The heart sound signals used in this paper are from the database in section 2.1. Because the data distribution needs to be balanced before the classification task, 300 heart sound signals are randomly selected from the normal samples for the multi-classification task. Each heart sound sample set is divided into three mutually exclusive groups, of which 65% is used to train the network, 15% for validation, and 20% for testing the network. Due to the previous heart sound segmentation steps,
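The 65%/15%/20% partition into mutually exclusive groups described in this section can be sketched as below. The function name and the fixed seed are assumptions made for reproducibility. The excerpt does not say whether the split is made per recording or per cardiac cycle, so the sketch simply splits whatever items it is given.

```python
import random


def split_dataset(items, train_frac=0.65, val_frac=0.15, seed=0):
    """Shuffle items and split them into disjoint train/validation/test lists.

    The test set receives whatever remains after the first two groups
    (20% with the default fractions).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 300 placeholder sample ids, matching the size of the selected normal set.
train, val, test = split_dataset(range(300))
```

For 300 samples this gives 195 training, 45 validation, and 60 test items.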

Discussion

As can be seen from Table 8, the algorithm proposed in this paper is superior to other algorithms in sensitivity, specificity, and accuracy. Among all the algorithms, the one proposed in reference [36] performs worst. This algorithm uses a wavelet-based deep convolutional neural network to extract the features of each state after heart sound segmentation. The complex feature vector is composed of basic state statistical features (time, frequency, etc.) and power density spectrum

Conclusion

Based on the above results and analysis, the following conclusions can be drawn.

  • 1) Complex heart sound feature extraction steps, especially in heart sound classification based on deep learning technology, do not improve the performance of heart sound classification. Specifically, performance on the training set is good but performance on the test set is poor, so generalization is weak.

  • 2) The heart sound sample data based on the heart cycle plays a great role in

CRediT authorship contribution statement

Haoran Kui: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Jiahua Pan: Validation, Resources, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition. Rong Zong: Methodology, Data curation, Writing - review & editing. Hongbo Yang: Validation, Resources, Data curation. Weilian Wang: Methodology, Writing - review & editing,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was funded by the Major Science and Technology Projects of Yunnan Province under Grant 2018ZF017, the National Natural Science Foundation of China under Grant 81960067, and the Yunnan Applied Basic Research Project under Grant 2018FE001.

Thanks to Fuwai Yunnan Cardiovascular Hospital for providing a clinical research environment and medical guidance for this study.

References (41)

  • M. Nabih-Ali et al. A review of intelligent systems for heart sound signal analysis. J. Med. Eng. Technol. (2017)

  • B. Xiao et al. Follow the sound of children’s heart: a deep-learning-based computer-aided pediatric CHDs diagnosis system. IEEE Internet Things J. (2019)

  • L.N. Sharma. Multiscale analysis of heart sound for segmentation using multiscale Hilbert envelope[C]//2015

  • E. Messner et al. Heart sound segmentation—An event detection approach using deep recurrent neural networks. IEEE Trans. Biomed. Eng. (2018)

  • S.E. Schmidt et al. Segmentation of heart sound recordings by a duration-dependent hidden Markov model. Physiol. Meas. (2010)

  • E. Kay et al. DropConnected neural networks trained on time-frequency and inter-beat features for classifying heart sounds. Physiol. Meas. (2017)

  • P. Mayorga Ortiz et al. Modelos acústicos HMM multimodales para sonidos cardiacos y pulmonares. Revista mexicana de ingeniería biomédica (2014)

  • S.A. Singh et al. Classification of unsegmented heart sound recording using KNN classifier. J. Mech. Med. Biol. (2019)

  • L. Li et al. Classification of heart sound signals with BP neural network and logistic regression[C]//2017 Chinese Automation Congress (CAC). IEEE (2017)

  • Z. Tan et al. Classification of heart sound signals in congenital heart disease based on convolutional neural network. J. Biomed. Eng. (2019)