Fusion of acoustic and deep features for pig cough sound recognition
Introduction
Coughing is an early sign of respiratory disease in pig houses (Racewicz et al., 2021). Typically, it is monitored by resident specialists at each site, but this relies heavily on manual experience and continuous operation to ensure accuracy and timeliness. It would therefore be highly beneficial to build a monitoring system for continuous, automatic pig cough detection. The task of identifying pig coughs is particularly challenging, however: the acoustic environment in a pig house is complex, with interfering sounds such as screams and sneezes that share acoustic properties with coughs (Benjamin and Yik, 2019). To overcome these challenges, many researchers have focused on investigating acoustic features that provide good discrimination between coughs and non-coughs.
Traditionally, acoustic features have been manually extracted from audio waveforms to distinguish pig coughs among the various sound categories in pig houses. In particular, the Mel-frequency cepstral coefficient (MFCC) has frequently been used in pig cough classification and early disease detection (Chung et al., 2013). Frequency- and time-domain features, such as power spectral density (PSD) and root mean square (RMS) energy, have also been considered (Exadaktylos et al., 2008, Ferrari et al., 2008). However, the performance of any single feature was not satisfactory, especially under field conditions. Although overall classification accuracy was low, this inspired researchers to make models more robust by enhancing representative features (Guarino et al., 2008). Because sound signals are non-stationary, features extracted from them degrade when the signal-to-noise ratio is low. Moreover, each acoustic feature captures only part of the sound content, which restricts what any single kind of feature can represent. Consequently, it is desirable to build an ensemble of various features based on the sound properties of pig houses.
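To make the handcrafted features concrete, the following is a minimal numpy sketch of two of them, frame-level RMS energy and a periodogram-style PSD. The frame length, hop size, and the synthetic 3 kHz test tone are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    # Split a 1-D signal into overlapping frames (assumed frame/hop sizes).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def rms(frames):
    # Root-mean-square energy per frame.
    return np.sqrt(np.mean(frames ** 2, axis=1))

def psd(frames, sr=22050):
    # Periodogram-style power spectral density per frame (Hann-windowed).
    spec = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)) ** 2
    return spec / (sr * frames.shape[1])

# Demo on a synthetic 1-second tone at 3 kHz, roughly inside the cough band.
sr = 22050
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 3000 * t)
frames = frame_signal(x)
print(rms(frames).shape, psd(frames, sr).shape)
```

For a sinusoid of amplitude 0.5, the frame-level RMS hovers around 0.5/√2 ≈ 0.354, which is a quick sanity check on the implementation.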
Recently, convolutional neural networks (CNNs) have been successfully applied to sound classification in two ways. One is to complete the classification task in an end-to-end manner. For instance, Yin et al. (2021) employed a fine-tuned AlexNet to recognize pig coughs by transforming sound signals into time-frequency representation (TFR) spectrograms. Nevertheless, this approach incurs high training time and hardware costs. The other trend is to extract deep features from a CNN and feed the feature vectors to a lightweight classifier; in other words, the CNN is treated as a feature extractor, reducing computational cost and accelerating classification. Short-time Fourier transform (STFT) spectrograms, as typical TFRs, have frequently been used in animal sound recognition (Ko et al., 2018). More recently, the constant-Q transform (CQT) has been widely exploited in speech analysis and environmental sound classification (Pham et al., 2019), suggesting that investigating various TFRs is a promising direction for obtaining more valuable features for pig cough recognition.
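The "CNN as feature extractor" idea can be sketched schematically in numpy: a convolutional layer followed by ReLU and global average pooling turns a 2-D spectrogram into a fixed-length vector, which is what a truncated CNN hands to a light classifier. The kernel bank, input size, and pooling choice below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernels):
    # Valid-mode 2-D convolution of a single-channel image with a kernel bank.
    kh, kw = kernels.shape[1:]
    H, W = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], H, W))
    for k, ker in enumerate(kernels):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

def deep_features(spectrogram, kernels):
    # Conv -> ReLU -> global average pooling: one scalar per kernel,
    # yielding a fixed-length "deep feature" vector.
    maps = np.maximum(conv2d(spectrogram, kernels), 0.0)
    return maps.mean(axis=(1, 2))

spec = rng.random((64, 64))               # stand-in for an STFT/CQT spectrogram
kernels = rng.standard_normal((8, 3, 3))  # 8 hypothetical learned filters
vec = deep_features(spec, kernels)
print(vec.shape)                          # one feature per kernel
```

In practice the filters are learned by training the CNN first and then discarding its classification head; here random filters stand in only to show the data flow.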
In this context, this work aims to provide a robust and effective feature representation of sounds in pig houses to improve pig cough recognition performance. First, acoustic features from different domains were extracted from sound segments. To reduce feature redundancy, a feature selection strategy was adopted to construct a representative feature set. Then, we built a shallow CNN to extract deep features; an early-fusion analysis of layers and TFRs was conducted to select the optimal deep feature representations. Finally, acoustic and deep features were combined and fed into a support vector machine (SVM) for classification. Overall, the contributions of this work are summarized as follows:
(1) A novel acoustic and deep feature fusion framework for pig cough recognition is proposed.
(2) Deep features extracted from shallow CNN architecture are proven to be a feasible approach to enrich the acoustic features.
(3) The proposed fused feature is shown to be representative for pig cough recognition, outperforming existing CNN models.
The remainder of this paper is organized as follows. Section 2 describes the relevant works related to our dataset. The methods involved in the experiment are illustrated in Section 3. The experimental results are shown in Section 4. A discussion of the results is provided in Section 5. Finally, the conclusions are presented in Section 6.
Animals and housing
The data used in this study were collected in a large commercial pig house in Harbin, Heilongjiang Province, China. One hundred and twenty-eight pigs from the crossbred fattening stage (120d, ∼60 kg) of the Northeast Folk and the Great White breed were reared in the barn. Fig. 1 shows the layout of the pig house in our experiment. The barn had a size of 27.5 m × 12.8 m × 3.2 m (length × width × height), and it was subdivided into 21 pens, 12 of which were 4.15 m × 3.6 m (length × width) in two
Methods
In this work, we aimed to propose distinguishable features to complete pig cough recognition tasks effectively. The flowchart of the proposed method is illustrated in Fig. 2. First, acoustic features were obtained from the pre-processed sound segments. Second, one-dimensional sound signals were transformed into two-dimensional TFRs, and deep features were then extracted from the new CNN architecture based on the various TFRs. Subsequently, acoustic and deep features were concatenated by early fusion.
Deep feature extraction
To reduce processing time, we resampled the datasets to 22050 Hz. The LibROSA toolbox was used to extract the manually selected features and to generate both STFT and CQT spectrograms as input representations for the CNN. Experiments were conducted on an Intel(R) Core(TM) i7-10750H CPU at 2.60 GHz with 16 GB of memory and an NVIDIA GeForce GTX 1650 Ti GPU with 4 GB of memory, using Python 3.7.
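The work used LibROSA's built-in CQT; to show what makes the transform suitable here, the following is a naive educational CQT in plain numpy, where the analysis window shrinks as frequency rises so the ratio Q of center frequency to bandwidth stays constant. The minimum frequency, bin count, and 440 Hz test tone are illustrative assumptions:

```python
import numpy as np

def naive_cqt(x, sr, fmin=32.7, bins=48, bins_per_octave=12):
    # One complex coefficient per geometrically spaced bin; window length
    # N = Q * sr / f_k shrinks with frequency, keeping Q constant.
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    mags = np.zeros(bins)
    for k in range(bins):
        fk = fmin * 2 ** (k / bins_per_octave)
        N = int(round(Q * sr / fk))
        n = np.arange(min(N, len(x)))
        win = np.hanning(len(n))
        coeff = np.sum(x[:len(n)] * win * np.exp(-2j * np.pi * fk * n / sr)) / len(n)
        mags[k] = np.abs(coeff)
    return mags

# Demo: a 440 Hz tone should peak near bin 12 * log2(440 / 32.7) ≈ 45.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
peak = int(np.argmax(naive_cqt(tone, sr)))
print(peak)
```

In practice one would call `librosa.cqt` rather than this loop; the sketch only illustrates the constant-Q geometry that gives finer low-frequency resolution than the linearly spaced STFT.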
Table 1 shows the performance
Discussion
From Table 1, we confirm that CQT spectrograms show considerable potential for identifying pig cough sounds under field conditions. Pig screams, a typical non-cough sound, commonly occur in pig houses; they dominate at high frequencies (5–10 kHz), while pig coughs generally range from 2.5 kHz to 8 kHz. Other sounds, such as water flow, lie at lower frequencies, below 4 kHz. Given these characteristics of the various sound components in pig houses, CQT exhibits its distinct
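The band-based reasoning above can be checked numerically: the fraction of spectral energy inside the cough band (2.5–8 kHz) separates a cough-like component from a scream-like one. The pure test tones at 3.5 kHz and 9 kHz are illustrative stand-ins for the real sound classes:

```python
import numpy as np

def band_energy_fraction(x, sr, lo, hi):
    # Fraction of total spectral energy falling in [lo, hi] Hz.
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return spec[(freqs >= lo) & (freqs <= hi)].sum() / spec.sum()

sr = 22050
t = np.arange(sr) / sr
cough_like = np.sin(2 * np.pi * 3500 * t)   # energy near 3.5 kHz (cough band)
scream_like = np.sin(2 * np.pi * 9000 * t)  # energy near 9 kHz (scream band)

print(band_energy_fraction(cough_like, sr, 2500, 8000))   # close to 1
print(band_energy_fraction(scream_like, sr, 2500, 8000))  # close to 0
```

Real pig-house sounds are of course broadband and overlapping, which is why the paper fuses many features rather than thresholding a single band.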
Conclusions
In this work, we extracted deep features from a shallow CNN to enrich acoustic features and thereby improve pig cough recognition performance, based on the complementary nature of the various features. We conclude that CQT is more suitable than the traditional linear STFT for sound recognition in a pig housing environment. A possible extension of this work is applying the approach to other bioacoustic samples for sound classification in field environments, which is of great significance
CRediT authorship contribution statement
Weizheng Shen: Conceptualization, Methodology, Funding acquisition, Investigation. Nan Ji: Software, Methodology, Writing – original draft, Visualization, Formal analysis. Yanling Yin: Writing – review & editing, Funding acquisition, Resources. Baisheng Dai: Funding acquisition. Ding Tu: Data curation. Baihui Sun: Supervision. Handan Hou: Supervision. Shengli Kou: Supervision. Yize Zhao: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Funding: This work was supported by the project of the National Natural Science Foundation of China [grant numbers 32172784, 31902210]; the National Key Research and Development Program of China [grant number 2019YFE0125600]; the University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province [grant number UNPYSCT-2020092]; and the China Agriculture Research System of MOF and MARA (CARS-36, CARS-35).
References (37)
- Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing (2020)
- Livestock vocalisation classification in farm soundscapes. Comput. Electron. Agric. (2019)
- Real-time recognition of sick pig cough sounds. Comput. Electron. Agric. (2008)
- Cough sound analysis to identify respiratory infection in pigs. Comput. Electron. Agric. (2008)
- Combined application of power spectrum centroid and support vector machines for measurement improvement in optical scanning systems. Signal Process. (2014)
- Field test of algorithm for automatic cough detection in pig houses. Comput. Electron. Agric. (2008)
- Ensemble of handcrafted and deep features for urban sound classification. Appl. Acoust. (2021)
- The role of sensors, big data and machine learning in modern animal farming. Sens. Bio-Sens. Res. (2020)
- Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework. Digital Signal Process. (2021)
- Trends in audio signal feature extraction methods. Appl. Acoust. (2020)
- Variable importance analysis: a comprehensive review. Reliab. Eng. Syst. Saf.
- Investigation of acoustic and visual features for acoustic scene classification. Expert Syst. Appl.
- Recognition of sick pig cough sounds based on convolutional neural network in field situations. Inform. Process. Agric.
- Precision livestock farming in swine welfare: a review for swine practitioners. Animals
- Fusing MFCC and LPC features using 1D triplet CNN for speaker recognition in severely degraded audio signals. IEEE Trans. Inform. Forensic Secur.
- Automatic detection and recognition of pig wasting diseases using sound data in audio surveillance systems. Sensors