Analysis of unlabeled lung sound samples using semi-supervised convolutional neural networks

https://doi.org/10.1016/j.amc.2021.126511

Abstract

Lung sounds convey valuable information about human respiratory health, so classifying them is important for the early diagnosis of respiratory disorders. In recent years, computerized lung sound analysis with machine learning algorithms, especially the state-of-the-art convolutional neural network (CNN), has attracted researchers. However, most of these algorithms require a large number of labeled respiratory sound samples, which are time-consuming and costly to obtain. Based on a four-layer CNN, this study proposes graph semi-supervised CNNs (GS-CNNs), which classify respiratory sounds as normal, crackle or wheeze using only a small labeled sample set and a large unlabeled one. First, a graph of respiratory sounds (Graph-RS) is constructed with the labeled and unlabeled samples as vertices; it captures both a reasonable metric over the samples and the relationships among all of them. Then, GS-CNNs are developed by adding information extracted from Graph-RS to the loss function of the original CNN. This added information regularizes the original CNN, thereby improving classification accuracy. The GS-CNNs are evaluated experimentally on samples collected with an electronic stethoscope. The results show that the proposed GS-CNNs outperform the original CNN, and that the more information from Graph-RS is used, the better the recognition performance.

Introduction

Lung disease ranks third among causes of death worldwide, so it is important to monitor and assess the pathological condition of the lungs. Since lung sounds contain information relevant to pulmonary pathology, physicians study their characteristics to diagnose respiratory disorders. Auscultation with a stethoscope is perhaps the oldest and most common method, and it remains a manual task. It has many advantages: it is non-invasive, safe, and so on. However, its results depend on the physician's skill, and auscultation by a physician without adequate training may lead to misdiagnosis. Therefore, much attention has been paid to developing computerized lung sound analysis systems. Such a system records lung sounds with an electronic stethoscope and classifies the recordings with machine learning algorithms, which helps to overcome the limitations of traditional auscultation. The 10th International Lung Sounds Association (ILSA) Conference [1] defined classification criteria for respiratory sounds. Generally speaking, respiratory sounds are divided into normal, wheeze, crackle, snore, and other classes.

Machine learning methods such as the support vector machine (SVM), k-nearest neighbors (KNN), the Gaussian mixture model (GMM) and the artificial neural network (ANN) have been applied to respiratory sound classification [2]. At present, the ANN is one of the most practical supervised machine learning methods. Kandaswamy et al. combined the wavelet transform with an ANN and achieved a test accuracy of 94.02% in classifying respiratory sounds [3]. In [4], Mel-frequency cepstral coefficient (MFCC) features were used with an ANN, achieving accuracies of 75% for crackles, 100% for wheezes and 80% for normal sounds.

The convolutional neural network (CNN), a variant of the ANN, has become the state-of-the-art solution for high-dimensional structured inputs. CNNs have also been applied successfully to respiratory sound classification [5], [6], [7], [8]. In [5], a CNN that classified lung sounds from spectrograms outperformed an SVM using MFCC features. Unlike other methods, a CNN can take the original spectrogram directly as input and perform feature extraction and lung sound classification jointly. This not only avoids the one-sidedness of hand-crafted features but also exploits the full spectrogram, so that no detailed information is lost. Therefore, the CNN is the method adopted in this paper for classifying respiratory sounds.
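To make "feature extraction directly on the spectrogram" concrete, the sketch below shows the three operations of a typical convolutional stage (convolution, ReLU, max pooling) applied to a small matrix. It is a minimal illustration under stated assumptions, not the paper's four-layer architecture: the 5×5 toy input, the 2×2 summing kernel and the helper names (`conv2d_valid`, `relu`, `max_pool_2x2`) are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, which CNN libraries call convolution."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool_2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


# Hypothetical toy 5x5 "spectrogram" and a hypothetical 2x2 summing kernel.
spec = [[float(5 * i + j) for j in range(5)] for i in range(5)]
kernel = [[1.0, 1.0], [1.0, 1.0]]
feature_map = max_pool_2x2(relu(conv2d_valid(spec, kernel)))
```

In a trained CNN the kernel weights are learned rather than fixed, which is what lets the network design its own features instead of relying on hand-crafted ones.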

A CNN is a supervised learning method that requires sufficient labeled data. However, labeled respiratory sound samples are difficult to collect, for three main reasons. (1) Lung sounds must be labeled by skilled physicians, which is expensive in both time and money. (2) Abnormal lung sounds are intermittent: adventitious sounds appear only briefly within a long recording, which leads to a shortage of abnormal samples [9]. (3) For rare lung diseases, very few labeled recordings of the corresponding respiratory sounds are available. On the other hand, collecting unlabeled data with an electronic stethoscope is simple. If the information contained in unlabeled samples is exploited, the shortcoming of a CNN trained on limited labeled respiratory sound samples can be overcome. Therefore, this paper studies semi-supervised CNNs for classifying respiratory sounds.

One way to implement a semi-supervised CNN is to perform unsupervised and supervised learning in succession. First, unsupervised learning is used to extract features from unlabeled data, yielding a feature set $\Phi = \{f_1, f_2, \ldots, f_n\}$. For example, denoising autoencoders, which learn robust high-level features from unlabeled image data, have proved successful in image classification [10]. Second, the feature values $f_1, f_2, \ldots, f_n$ are computed for the $m$ labeled samples, giving the set of feature vectors $\Omega = \{\alpha_1, \alpha_2, \ldots, \alpha_m\}$, where each $\alpha_j$ ($j = 1, 2, \ldots, m$) is an $n$-dimensional column vector. Finally, supervised learning is carried out with $\Omega$ as the training set. Because the features are extracted by unsupervised learning, this approach counters the curse of dimensionality and reduces the sample size the CNN requires. It has already been applied to respiratory sound classification [11]: a denoising autoencoder was used to extract features from lung sounds, requiring only a small proportion (5%) of the dataset to be labeled and achieving ROC AUCs of 0.86 for wheezes and 0.74 for crackles.
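The two-stage idea can be sketched as follows, assuming masking noise as the corruption (one of the corruptions studied for denoising autoencoders) and treating the trained encoder and decoder as opaque callables. The helper names `mask_noise`, `dae_loss` and `feature_vectors` are hypothetical, and the network that would actually be trained to minimize `dae_loss` is not shown. Stage 1 minimizes the reconstruction error on unlabeled data; stage 2 applies the learned encoder (the feature set $\Phi$) to the $m$ labeled samples to build $\Omega$, on which a supervised classifier is then trained.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Sequence[float]]
Decoder = Callable[[Sequence[float]], Sequence[float]]


def mask_noise(x: Sequence[float], drop_prob: float, rng: random.Random) -> Vector:
    """Corrupt x by setting each component to zero with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else float(v) for v in x]


def dae_loss(x: Sequence[float], encode: Encoder, decode: Decoder,
             drop_prob: float, rng: random.Random) -> float:
    """Stage 1 objective: squared error between the CLEAN input and the
    reconstruction of its corrupted copy (no labels needed)."""
    x_hat = decode(encode(mask_noise(x, drop_prob, rng)))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def feature_vectors(labeled: Sequence[Sequence[float]], encode: Encoder) -> List[Vector]:
    """Stage 2: alpha_j = (f_1(x_j), ..., f_n(x_j)) for each of the m labeled samples."""
    return [list(encode(x)) for x in labeled]
```

Because the reconstruction target is the clean input, the encoder cannot simply copy its input and must learn features that are robust to the corruption.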

Pseudo-labeling is another way to exploit unlabeled data [12]. Each unlabeled sample is assigned the class with the maximum predicted probability, and this pseudo-label is treated as if it were a true label, so the network can be trained in a supervised fashion. If the pseudo-labels are accurate, this method offers a simple solution to the shortage of labeled lung sound samples.
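A minimal sketch of the pseudo-labeling step and the resulting combined loss, assuming the network outputs per-class probabilities and the loss is cross-entropy. The function names and the weight `alpha` are hypothetical; in practice such a weight is usually ramped up during training, a schedule this sketch leaves out.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]


def pseudo_labels(probs: Sequence[Probs]) -> List[int]:
    """For each unlabeled sample, pick the class with the maximum predicted probability."""
    return [max(range(len(p)), key=lambda c: p[c]) for p in probs]


def cross_entropy(p: Probs, label: int) -> float:
    """-log p[label], clamped to avoid log(0)."""
    return -math.log(max(p[label], 1e-12))


def pseudo_label_loss(probs_lab: Sequence[Probs], labels: Sequence[int],
                      probs_unl: Sequence[Probs], alpha: float) -> float:
    """Supervised cross-entropy plus alpha times cross-entropy against pseudo-labels."""
    sup = sum(cross_entropy(p, y) for p, y in zip(probs_lab, labels)) / len(labels)
    if not probs_unl:
        return sup
    targets = pseudo_labels(probs_unl)  # treated as if they were true labels
    unsup = sum(cross_entropy(p, t) for p, t in zip(probs_unl, targets)) / len(targets)
    return sup + alpha * unsup
```

The unsupervised term pushes the network to be more confident in its own current predictions, which is why wrong pseudo-labels can reinforce errors.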

The most promising approach to semi-supervised neural networks so far is to regularize the network by adding information from unlabeled data to the loss function. Inspired by autoencoders, the ladder network in [13] used consistency regularization. For a sample $x$, two transformations $T_1, T_2$ that do not change its class are applied, giving $\tilde{x}_1 = T_1(x)$ and $\tilde{x}_2 = T_2(x)$. For a network $g$, consistency regularization requires the outputs $g(\tilde{x}_1)$ and $g(\tilde{x}_2)$ to agree; that is, their distance $\|g(\tilde{x}_1) - g(\tilde{x}_2)\|$ should be minimized. Compared with pseudo-labeling, this method does not require accurate labels for the unlabeled data, but the transformations $T_1, T_2$ constrain the features used for classification. Combined with other techniques such as data augmentation, models using such regularizers have achieved excellent performance [14], [15].
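The consistency term can be sketched directly from the definition above. This minimal version assumes the squared Euclidean distance and uses small additive Gaussian noise as a stand-in for the label-preserving transformations $T_1, T_2$; both choices, and the names `consistency_loss` and `make_jitter`, are hypothetical rather than taken from [13].

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Transform = Callable[[Sequence[float]], Vector]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def consistency_loss(batch: Sequence[Sequence[float]],
                     g: Callable[[Sequence[float]], Sequence[float]],
                     t1: Transform, t2: Transform) -> float:
    """Mean ||g(T1(x)) - g(T2(x))||^2 over a batch; no labels are needed."""
    return sum(squared_distance(g(t1(x)), g(t2(x))) for x in batch) / len(batch)


def make_jitter(sigma: float, seed: int) -> Transform:
    """Hypothetical label-preserving transformation: small additive Gaussian noise."""
    rng = random.Random(seed)
    return lambda x: [v + rng.gauss(0.0, sigma) for v in x]
```

Since the loss never reads a label, every unlabeled recording can contribute to training; the price is that the network is only encouraged to be invariant to whatever $T_1, T_2$ happen to model.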

However, these semi-supervised networks focus on classification performance on the test set and do little to improve the feature metric, which is important in lung sound analysis. A proper metric on the feature space should satisfy the clustering property: samples from the same class should fall in the same cluster, whether labeled or unlabeled. This suggests that the metric can be improved with cluster regularization. In this paper, graph-based semi-supervised learning is applied to a CNN for cluster regularization. A graph consists of vertices (also called nodes or points) connected by edges (also called arcs or lines). Graphs can model many types of relations and processes in physics, biology [16], computer science [17], [18], lung sound analysis [19], and other fields. A graph of respiratory sound samples can represent the relationships between labeled and unlabeled samples, and the information it encodes can then be used to apply cluster regularization to a CNN so that it learns a better feature metric. To our knowledge, graph-based semi-supervised CNNs for classifying respiratory sounds have not been studied before.
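The construction of Graph-RS itself is detailed in Section 3, which this excerpt truncates. As a generic illustration of how a graph can relate labeled and unlabeled samples, the sketch below builds a symmetric k-nearest-neighbor graph with Gaussian (heat-kernel) edge weights over feature vectors. This is a common textbook construction swapped in for illustration; `knn_gaussian_graph`, `k` and `sigma` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def knn_gaussian_graph(points: Sequence[Sequence[float]], k: int, sigma: float) -> Matrix:
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights.

    points holds labeled and unlabeled samples alike: building W needs no labels.
    """
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
          for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
        for j in nearest:
            # Symmetrize: keep edge (i, j) if either endpoint lists the other as a neighbor.
            w[i][j] = w[j][i] = math.exp(-d2[i][j] / (2.0 * sigma ** 2))
    return w
```

With well-separated clusters the graph splits into disconnected components, which is exactly the clustering property described above: a label known for one vertex says something about its unlabeled neighbors.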

In this paper, a novel graph-based semi-supervised learning method is proposed to improve both classification performance and the feature metric. The graph is dynamic and differentiable, so it can be integrated into the CNN for end-to-end training. First, a graph of respiratory sound samples is constructed that encodes the adjacency and distribution of the samples. Then, adding a loss term derived from the graph to the original CNN loss function regularizes the original CNN. The graph-based semi-supervised CNN makes full use of the information conveyed in respiratory sounds without imposing any restrictions on the unlabeled samples, which is crucial for respiratory sound analysis.
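This excerpt does not give the exact form of the loss term derived from the graph. As a hedged stand-in, the sketch below uses the standard graph Laplacian smoothness penalty $\sum_{i,j} W_{ij}\,\|h_i - h_j\|^2$, where $h_i$ is the network's output (or embedding) for sample $i$, added to the cross-entropy on labeled samples with a hypothetical weight `lam`. Unlike the paper's dynamic graph, $W$ is held fixed here for simplicity, and `graph_regularized_loss` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def graph_smoothness(w: Matrix, h: Sequence[Sequence[float]]) -> float:
    """sum_ij W_ij ||h_i - h_j||^2: small when strongly connected samples get similar outputs."""
    n = len(h)
    return sum(
        w[i][j] * sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
        for i in range(n) for j in range(n)
    )


def graph_regularized_loss(probs: Sequence[Sequence[float]],
                           labels: Sequence[Optional[int]],
                           w: Matrix, h: Sequence[Sequence[float]],
                           lam: float) -> float:
    """Cross-entropy on labeled vertices plus lam times the graph term over all vertices.

    labels[i] is None for an unlabeled sample; such samples enter only via the graph term.
    """
    labeled = [(p, y) for p, y in zip(probs, labels) if y is not None]
    ce = sum(-math.log(max(p[y], 1e-12)) for p, y in labeled) / max(len(labeled), 1)
    return ce + lam * graph_smoothness(w, h)
```

The graph term is where unlabeled samples exert their influence: it penalizes the network whenever two strongly connected samples, labeled or not, are mapped far apart.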

The remainder of the paper proceeds as follows. Section 2 describes the MFCC features of respiratory sounds and justifies choosing them as network input. Section 3 presents the framework of the graph semi-supervised CNN and the details of the graph of respiratory sounds, and then proposes a graph-based semi-supervised CNN for classifying respiratory sounds together with a method for training it. Section 4 reports the experimental results and analyzes the effects of different graph components and feature selections. Finally, Section 5 concludes the paper.

MFCC spectrogram of respiratory sound

Extracting features that distinguish the different classes of lung sounds is a major part of lung sound classification. Features can be extracted from three domains: the time domain, the frequency domain and the time-frequency domain. Methods including spectral analysis [20], time-frequency analysis [21], [22], [23], [24], [25], cepstral analysis [26], [27], [28] and the Hilbert-Huang transform [29], [30] are widely used for lung sound feature extraction.
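As a concrete instance of cepstral analysis, the sketch below computes MFCCs for a single frame: Hamming window, power spectrum, triangular mel filterbank, logarithm, then a DCT-II. The paper's exact MFCC settings are not given in this excerpt, so the filter count (26), the number of coefficients (13), the window, and all function names are hypothetical defaults; the naive DFT keeps the example dependency-free but is far slower than an FFT.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Hamming-windowed |DFT|^2 / N for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    win = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1))) for t, v in enumerate(frame)]
    return [
        abs(sum(win[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters equally spaced on the mel scale from 0 Hz to the Nyquist frequency."""
    top = hz_to_mel(sample_rate / 2.0)
    # Fractional FFT-bin positions of the n_filters + 2 filter edges.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([
            (k - lo) / (mid - lo) if lo < k <= mid
            else (hi - k) / (hi - mid) if mid < k < hi
            else 0.0
            for k in range(n_fft // 2 + 1)
        ])
    return bank


def mfcc(frame: Sequence[float], sample_rate: float,
         n_filters: int = 26, n_coeffs: int = 13) -> List[float]:
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(max(sum(w * s for w, s in zip(row, spec)), 1e-10)) for row in bank]
    # The DCT-II decorrelates the log filter-bank energies; only the first n_coeffs are kept.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
        for c in range(n_coeffs)
    ]
```

Computing one such vector per short frame and stacking the vectors over time yields a time-by-coefficient matrix, the kind of two-dimensional input a CNN can consume.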

In the following study, respiratory

Graph semi-supervised CNN

In the following text, the graph semi-supervised CNN trained on the graph of respiratory sounds is abbreviated as GS-CNN, and the graph of respiratory sounds is abbreviated as Graph-RS. The input of the GS-CNN is the MFCC representation of a respiratory sound, expressed as a matrix $X \in \mathbb{R}^{31 \times 40}$. The output $Y \in \mathbb{R}^{3}$ gives the probabilities that the sample belongs to the normal, crackle and wheeze classes.
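One common way to obtain a probability vector $Y \in \mathbb{R}^{3}$ is a softmax over three final-layer scores, trained with cross-entropy on labeled samples. This excerpt does not confirm the output layer or the loss, so the sketch below is an assumption; the function names and the example scores are hypothetical, and the class order follows the text (normal, crackle, wheeze).

```python
import math
from typing import List, Sequence

CLASSES = ("normal", "crackle", "wheeze")


def softmax(z: Sequence[float]) -> List[float]:
    """Map three final-layer scores to a probability vector Y that sums to 1."""
    m = max(z)  # subtracting the max avoids overflow in exp
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(y: Sequence[float], label: int) -> float:
    """Supervised loss for one labeled sample: -log Y[label]."""
    return -math.log(max(y[label], 1e-12))


def predict(z: Sequence[float]) -> str:
    """Class name with the highest probability."""
    y = softmax(z)
    return CLASSES[max(range(len(y)), key=lambda c: y[c])]
```

For example, hypothetical scores `[2.0, 0.5, -1.0]` are classified as `"normal"`.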

Experiments

The respiratory sound data set used in the experiments was gathered in hospitals in Shijiazhuang, Hebei Province, China. The samples were collected with a Luntech digital intelligent stethoscope produced by Shandong Langlang Technology Development Co., Ltd., and were labeled as normal, wheeze or crackle by specialists. Crackles and wheezes in respiratory sounds are short in duration and are often concentrated in the inspiratory or expiratory phase. Therefore, we picked 500 ms frames of

Conclusion

In this paper, we propose the GS-CNN, which integrates a graph-based semi-supervised method with a CNN for classifying lung sounds. The model includes a four-layer CNN for feature extraction and classification. The cluster-oriented graph integrated into the CNN captures a more reasonable metric over the respiratory sound samples as well as additional information from the unlabeled ones, which regularizes the CNN. The experiments demonstrate that the graph can extract information from both labeled and

Acknowledgment

This work was partly supported by the Open Foundation of Shaanxi Key Laboratory of Integrated and Intelligent Navigation (SKLIIN-20190202).

References

  • F. Demir et al. Classification of lung sounds with CNN model using parallel pooling structure. IEEE Access (2020).
  • P. Piirila et al. Crackles: recording, analysis and clinical significance. European Respiratory Journal (1995).
  • P. Vincent et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research (2010).
  • D. Chamberlain et al. Application of semi-supervised deep learning to lung sound analysis (2016).
  • D.-H. Lee. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. ICML 2013 Workshop: Challenges in Representation Learning (WREPL) (2013).
  • A. Rasmus et al. Semi-supervised learning with ladder networks. arXiv: Neural and Evolutionary Computing (2015).
  • D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, C. Raffel, MixMatch: A Holistic Approach to...
  • Q. Xie et al. Unsupervised data augmentation. CoRR (2019).