Detecting pertussis in the pediatric population using respiratory sound events and CNN
Introduction
Pertussis, commonly known as whooping cough, is a respiratory tract infection caused by the coccobacillus Bordetella pertussis. It spreads via airborne droplets and is highly contagious [18]. The number of pertussis cases has decreased since the development of a vaccine. However, neither immunization nor previous infection provides lifelong immunity to the disease [2]. There is a resurgence of pertussis infections, which is attributed to waning immunity and bacterial mutation [23,34]. While pertussis affects all age groups, it is a significant cause of morbidity and mortality in young children [35], especially in developing countries, where access to timely diagnoses may not be available.
Following an incubation period, pertussis typically progresses through three distinct stages: the catarrhal, paroxysmal, and convalescent phases [18]. The catarrhal phase resembles other upper respiratory tract infections and is followed by the paroxysmal phase. Cough, one of the symptoms of pertussis, increases in severity at this stage, developing into a paroxysmal or hacking cough followed by a high-pitched intake of air that sounds like a whoop, hence the name whooping cough [35]. The residual cough can persist for weeks to months in the convalescent phase. In severe cases in infants, it can lead to respiratory failure and death [20].
People with pertussis are infectious for weeks, but appropriate antibiotic treatment shortens the infectious period, limits spread, and may prevent complications [4]. Early treatment of pertussis is, therefore, crucial for managing this disease. We posit that the paroxysmal coughing and whooping sounds can be useful for screening pertussis, especially in the pediatric population, which remains the most vulnerable age group. However, recognizing these respiratory sounds can be infeasible for parents/carers of the child and, in clinical practice, depends on the skills and training of the clinician.
In this work, we aim to develop an objective computational method for detecting respiratory sound events associated with pertussis, that is, the hacking cough and whooping, for the pediatric population. If disseminated widely, for example, as a smartphone application, such an objective assessment tool could prove useful as a screening tool for parents/carers. It could also be useful in developing countries and remote communities which lack access to health facilities and clinicians.
Detecting respiratory diseases using digital respiratory sounds, cough sounds in particular, has generated interest recently, such as in detecting childhood pneumonia [16], monitoring chronic obstructive pulmonary disease [9], and detecting croup, which is common in children between the ages of 6 months and 6 years and produces a distinctive barking cough [30]. Various signal processing and machine learning techniques have been proposed for the analysis and detection of cough sounds. Being a relatively new area of research, a number of techniques are inspired by other audio classification tasks such as speech recognition. One such measure is the mel-frequency cepstral coefficients (MFCCs) [8]. MFCCs utilize mel-filters, which are effective in revealing the perceptually significant characteristics of the speech spectrum in small time windows. Speech and cough share some similarities in the generation process and the underlying physiology, which could explain the widespread use and effectiveness of MFCCs in cough sound analysis tasks [10,16,27,29,30,37].
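To make the MFCC pipeline concrete, the following is a minimal sketch using only NumPy and SciPy: a triangular mel filterbank is applied to the power spectrogram and the log filterbank energies are decorrelated with a DCT. The window size, filter count, and number of coefficients below are illustrative choices, not the settings used in the cited works.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import stft

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters with centers equally spaced on the mel scale."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)   # rising slope
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)   # falling slope
    return fb

def mfcc(x, sr, n_fft=512, n_filters=26, n_ceps=13):
    _, _, Zxx = stft(x, fs=sr, nperseg=n_fft)
    power = np.abs(Zxx) ** 2                           # power spectrogram
    mel_energy = mel_filterbank(n_filters, n_fft, sr) @ power
    log_mel = np.log(mel_energy + 1e-10)
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_ceps]

sr = 16000
x = np.random.randn(sr)        # 1 s of noise as a stand-in for a cough sound
coeffs = mfcc(x, sr)
print(coeffs.shape)            # (13, n_frames)
```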
It is common practice to complement MFCCs with other techniques. In [10,16,29], various temporal and spectral analysis techniques are employed for this purpose. In addition, wavelet transformation is applied in [16] in the analysis of cough sounds for detecting pneumonia. Wavelets are effective at decomposing non-stationary signals in both the time and frequency domains and, in [16], the focus is particularly on picking up the crackle sounds in pneumonia coughs.
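As a rough sketch of the idea behind a wavelet feature, the following performs a multilevel discrete wavelet decomposition with the simple Haar wavelet (the cited work may use a different mother wavelet) and summarizes each level by its log detail-coefficient energy, from which a slope can be taken as a compact descriptor.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform."""
    x = x[: len(x) // 2 * 2]                 # truncate to even length
    s = (x[0::2] + x[1::2]) / np.sqrt(2)     # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # detail (high-pass)
    return s, d

def wavelet_energies(x, levels=4):
    """Log energy of the detail coefficients at each decomposition level."""
    energies = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        energies.append(np.log(np.sum(d ** 2) + 1e-10))
    return np.array(energies)

x = np.random.randn(4096)
e = wavelet_energies(x)
# the slope over per-level energies is one way to form a single wavelet feature
slope = np.polyfit(np.arange(len(e)), e, 1)[0]
```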
Furthermore, the spectral information contained in cough sounds is more dominant at low frequencies than at high frequencies. The human auditory system likewise offers higher resolution at low frequencies than at high frequencies. In [30], this frequency selectivity property of the human cochlea is modeled using gammatone filters to differentiate the barking cough sound of croup subjects from the cough sounds of other respiratory diseases. A similar approach is also taken in [37].
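The frequency selectivity described above is commonly captured by spacing gammatone filter center frequencies uniformly on the ERB-rate scale of Glasberg and Moore; the filter count and band edges below are illustrative, not the settings of the cited works.

```python
import numpy as np

def erb_space(fmin, fmax, n):
    """Center frequencies equally spaced on the ERB-rate scale
    (Glasberg & Moore, 1990): ERBS(f) = 21.4 * log10(1 + 0.00437 * f)."""
    e_lo = 21.4 * np.log10(1 + 0.00437 * fmin)
    e_hi = 21.4 * np.log10(1 + 0.00437 * fmax)
    e = np.linspace(e_lo, e_hi, n)
    return (10 ** (e / 21.4) - 1) / 0.00437   # invert back to Hz

cf = erb_space(100.0, 8000.0, 64)
# spacing widens with frequency, mirroring the cochlea's finer
# low-frequency resolution
```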
Audio sound analysis, including cough sound analysis, is typically carried out in small time windows at different frequency localizations. This results in high-dimensional data which conventional classification methods may be unable to handle. A common approach is to reduce this data to a smaller feature set using statistical methods. With MFCCs, for example, the mean and standard deviation of the coefficients have been used [30]. Similarly, the slope of the wavelet coefficients is used as the wavelet feature (WF) in [16]. In [30], the time-frequency representation formed using gammatone filters, referred to as a gammatone spectrogram or cochleagram, is divided into blocks, and the second and third central moments of each block are used as the cochleagram image features (CIF). In [16,29,30], feature extraction is followed by feature selection to further reduce the feature dimension and retain the most dominant features for classification.
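The statistical reductions mentioned above can be sketched as follows; the array sizes and block grid are illustrative placeholders, not the configurations used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
mfccs = rng.standard_normal((13, 120))       # 13 coefficients x 120 frames

# mean and standard deviation over time, as with the MFCC features in [30]
feat_mfcc = np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])  # 26 values

cochleagram = rng.standard_normal((64, 120)) ** 2   # stand-in cochleagram

def block_moments(img, rows=8, cols=8):
    """Split an image into a grid and keep the 2nd and 3rd central
    moments of each block, in the spirit of the CIF."""
    feats = []
    h, w = img.shape[0] // rows, img.shape[1] // cols
    for i in range(rows):
        for j in range(cols):
            b = img[i * h:(i + 1) * h, j * w:(j + 1) * w].ravel()
            m = b.mean()
            feats += [np.mean((b - m) ** 2), np.mean((b - m) ** 3)]
    return np.array(feats)

feat_cif = block_moments(cochleagram)        # 8 * 8 * 2 = 128 values
```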
The use of conventional feature engineering techniques inevitably leads to the loss of some information, which can cause poor classification performance and misdetection of respiratory diseases. More recently, these methods have been superseded by deep learning techniques owing to their superior classification results. One such technique is the convolutional neural network (CNN) [17]. The CNN was originally developed for image classification and can learn distinguishing image characteristics directly from the raw image through various mathematical operations. In audio signal classification tasks, this arrangement is typically realized by transforming the signal into an image-like representation [21,32]. A time-frequency representation of the audio signal is the most common choice for this purpose, such as the conventional spectrogram formed using the short-time Fourier transform (STFT).
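A minimal sketch of turning a sound into an image-like CNN input via the STFT spectrogram; the sampling rate and window parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft

sr = 16000
x = np.random.randn(sr)                      # stand-in for a 1 s cough sound

f, t, Zxx = stft(x, fs=sr, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(np.abs(Zxx) ** 2 + 1e-10)   # dB power spectrogram

# normalize to [0, 1] so it can be fed to a CNN like a grayscale image
img = (log_spec - log_spec.min()) / (log_spec.max() - log_spec.min())
print(img.shape)                             # (257, n_frames)
```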
An overview of the proposed approach is given in Fig. 1. We take inspiration from conventional feature extraction techniques and the state-of-the-art CNN for detecting pertussis using respiratory sounds. In particular, we represent the one-dimensional respiratory sound signals as two-dimensional time-frequency representations for classification using CNN. Our approach in forming the time-frequency representations is based on the feature extraction techniques from [16,29,30]. In particular, we use mel-filters, as used in computing MFCCs, to form the mel-spectrogram; the wavelet transform, as used in computing the WF, to form the wavelet scalogram; and gammatone filters, as used in computing the CIF, to form the cochleagram.
Furthermore, different time-frequency representations reveal spectral characteristics at different frequencies. In conventional machine learning, this information is combined, for example, by feature vector concatenation, to improve classification performance. With CNNs, this can be achieved using late fusion, whereby the outputs of CNN models trained on different representations are combined, either by averaging the output scores [39] or by using the output scores to train a secondary classifier [41]. In this work, we use late fusion to combine the CNN learning from the different time-frequency representations, aiming to make more accurate predictions.
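Late fusion by score averaging reduces to a per-class mean over the model outputs; the softmax scores and class ordering below are hypothetical.

```python
import numpy as np

# hypothetical per-class softmax scores from three CNNs trained on
# mel-spectrogram, scalogram, and cochleagram inputs, respectively
scores = np.array([
    [0.70, 0.30],    # mel-spectrogram model
    [0.55, 0.45],    # scalogram model
    [0.80, 0.20],    # cochleagram model
])

fused = scores.mean(axis=0)          # late fusion by score averaging
pred = int(np.argmax(fused))         # class labels assumed: 0 = pertussis
```

Training a secondary classifier on the stacked scores, as in [41], would replace the `mean` with a learned combination.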
The proposed approach is evaluated on a dataset of respiratory sounds from children with suspected or confirmed pertussis and other respiratory diseases. Collecting physiological data is time-consuming, expensive, and may require patient cooperation, which can be difficult with children. However, the rapid rise in the use of digital technology has prompted researchers to collect self-reported data from the public. In a similar study [29], researchers composed a dataset of respiratory diseases using online sources, while researchers at Microsoft used web search queries of users with self-identified conditions [36]. More recently, researchers at the University of Cambridge collected COVID-19-related sounds from users with self-reported disease status through a website and a smartphone application. In this work, we use a dataset of respiratory sounds collated from the YouTube online video sharing platform and reviewed by a clinician.
In total, the dataset contains 42 recordings, each with multiple respiratory sounds. This makes it a relatively small dataset, and CNN models trained on small datasets can be prone to overfitting. One method to reduce overfitting is mixup [40], which augments the dataset by mixing training samples, and their labels, across classes. It is a simple yet effective method with very low computational cost. In this work, we extend the mixup data augmentation technique to time-frequency representations of respiratory sounds.
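A minimal sketch of mixup [40] applied to time-frequency representations: each augmented sample is a convex combination of two inputs and their one-hot labels, with the mixing coefficient drawn from a Beta distribution. The spectrogram sizes and the Beta parameter are illustrative.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng()):
    """Mix two (spectrogram, one-hot label) pairs with a Beta-drawn weight."""
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

spec_a = np.random.rand(64, 100)     # e.g. a pertussis mel-spectrogram
spec_b = np.random.rand(64, 100)     # e.g. a non-pertussis mel-spectrogram
y_a = np.array([1.0, 0.0])
y_b = np.array([0.0, 1.0])

x_mix, y_mix = mixup(spec_a, y_a, spec_b, y_b)
```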
The rest of the paper is organized as follows. An overview of the dataset and the proposed method is given in Section 2. The experimental setup and results are provided in Section 3 and discussion of the results and conclusions are in Section 4.
Dataset
The dataset used in this work was collated from YouTube. Various search terms were used to identify respiratory sound recordings from children with the following respiratory conditions: pertussis, asthma, bronchiolitis, croup, and pneumonia. The diagnosis of pertussis and other respiratory conditions in the videos was attributed by the information provided in the title and/or description of the videos and later checked by a clinician to assess the plausibility of the sounds and the reported
Experimental setup
In this work, we use a stratified 7-fold cross-validation which we found to give a good compromise between the number of training and validation samples in each fold. As such, 3 pertussis and 3 non-pertussis recordings are used for validating the model and the remaining recordings are used for training the model, in each fold. The respiratory sounds from a recording/subject are present either in the training or validation dataset, but not in both.
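The split described above can be sketched as follows, assuming 21 pertussis and 21 non-pertussis recordings (7 folds with 3 validation recordings per class imply this balance). Fold assignment is done at the recording level, so all sounds from a subject stay in the same fold, keeping the validation subject-independent.

```python
import numpy as np

def stratified_subject_folds(labels, n_folds=7, seed=0):
    """Assign recordings (subjects) to folds, keeping the class ratio
    similar in each fold; all sounds from a recording share its fold."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)                      # randomize within the class
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

# 42 recordings: 1 = pertussis, 0 = non-pertussis (21/21 balance assumed)
labels = [1] * 21 + [0] * 21
folds = stratified_subject_folds(labels)
# each fold now holds 3 pertussis and 3 non-pertussis recordings
```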
The number of respiratory sounds per recording
Discussion and conclusions
The dataset used in this work has been recorded in natural environments with SNR as low as 16 dB. The recordings are believed to be made using smartphones of different manufacturers and models and the training and validation procedure followed in this work is subject independent. All these increase the difficulty and complexity of the task. Despite these constraints, our method is empirically shown to achieve strong classification performance at the cough and particularly subject levels. In
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
CRediT authorship contribution statement
Roneel V. Sharan: Conceptualization, Data curation, Methodology, Software, Investigation, Visualization, Writing - original draft, Writing - review & editing. Shlomo Berkovsky: Supervision, Writing - review & editing. David Fraile Navarro: Data curation, Writing - review & editing. Hao Xiong: Data curation, Writing - review & editing. Adam Jaffe: Writing - review & editing.
Declaration of Competing Interest
The authors report no declarations of interest.
References (41)
- Cough throughout life: children, adults and the senile, Pulm. Pharmacol. Ther. (2007)
- Alert system design based on experimental findings from long-term unobtrusive monitoring in COPD, Biomed. Signal Process. Control (2021)
- Using mutual information in supervised temporal event detection: application to cough detection, Biomed. Signal Process. Control (2014)
- Complex sounds and auditory images
- Acoustic event recognition using cochleagram image and convolutional neural networks, Appl. Acoust. (2019)
- Cough detection by ensembling multiple frequency subband features, Biomed. Signal Process. Control (2017)
- Cough sound analysis can rapidly diagnose childhood pneumonia, Ann. Biomed. Eng. (2013)
- Pertussis (whooping cough)
- Pattern Recognition and Machine Learning (2006)
- Pertussis (Whooping Cough) (2019)
- Support-vector networks, Mach. Learn.
- The Origins of Logistic Regression, Discussion paper 2002-119/4
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Trans. Acoust. Speech Signal Process.
- A cochlear frequency-position function for several species - 29 years later, J. Acoust. Soc. Am.
- Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167
- Fundamentals of Digital Image Processing
- What is the best multi-stage architecture for object recognition?
- Adam: a method for stochastic optimization, arXiv preprint arXiv:1412.6980
- Wavelet augmented cough analysis for rapid childhood pneumonia diagnosis, IEEE Trans. Biomed. Eng.
- Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS)