Fusion of acoustic and deep features for pig cough sound recognition

https://doi.org/10.1016/j.compag.2022.106994

Highlights

  • Both acoustic and deep features were investigated for pig cough.

  • A novel acoustic and deep feature fusion framework for pig cough recognition was proposed.

  • The proposed method reached an accuracy of 97.35%.

Abstract

The recognition of pig cough sounds is a prerequisite for early warning of respiratory diseases in pig houses and is essential for monitoring animal welfare and predicting productivity. For pig cough recognition, constructing representative sound characteristics is a crucial step. To this end, this paper proposes a feature fusion method that combines acoustic and deep features from audio segments. First, a set of acoustic features from different domains was extracted from the sound signals, and recursive feature elimination based on random forest (RF-RFE) was adopted for feature selection. Second, time-frequency representations (TFRs), namely the constant-Q transform (CQT) and the short-time Fourier transform (STFT), were employed to extract visual features from a fine-tuned convolutional neural network (CNN) model. Finally, the ensemble of the two kinds of features was fed into a support vector machine (SVM) by early fusion to identify pig cough sounds. The proposed acoustic and deep feature fusion achieved 97.35% accuracy for pig cough recognition. The results provide further evidence of the effectiveness of combining acoustic and deep spectrum features as a robust feature representation for pig cough recognition.

Introduction

Cough is an early sign of respiratory disease in pig houses (Racewicz et al., 2021). Typically, coughing is monitored by resident specialists at each site. However, this approach relies heavily on manual experience and continuity of operations to ensure accuracy and timeliness. It would therefore be highly beneficial to build a monitoring system for continuous and automatic pig cough detection. The task of identifying pig coughs, however, is particularly challenging. The acoustic environment in a pig house is quite complicated, with interfering sounds such as screams and sneezes that have acoustic properties similar to coughs (Benjamin and Yik, 2019). To overcome these challenges, many researchers have focused on investigating acoustic features that provide good discrimination between coughs and non-coughs.

Traditionally, acoustic features have been manually extracted from audio waveforms to distinguish pig coughs among the various sound categories in pig houses. In particular, the Mel-frequency cepstral coefficient (MFCC) was frequently used in pig cough classification and early disease detection (Chung et al., 2013). Frequency- and time-domain features, such as power spectral density (PSD) and root mean square (RMS), were also considered in the classification (Exadaktylos et al., 2008, Ferrari et al., 2008). However, the performance of any single feature was not satisfactory, especially under field conditions. Although overall classification accuracy was low, these results still inspired researchers to make models more robust by enhancing representative features (Guarino et al., 2008). Owing to its non-stationary nature, sound data have poor robustness when the signal-to-noise ratio is low. In addition, each acoustic feature captures only certain aspects of the sound content, which restricts what any single kind of feature can represent. Consequently, it is desirable to build an ensemble of various features based on the sound properties of pig houses.

Recently, convolutional neural networks (CNNs) have been successfully applied to sound classification in two ways. One is to complete the classification task in an end-to-end manner. For instance, Yin et al. (2021) employed a fine-tuned AlexNet to recognize pig coughs by transforming sound signals into time-frequency representation spectrograms. Nevertheless, this approach has a high cost in terms of time and hardware resources. The other trend is to extract deep features from a CNN and feed the feature vectors to a lightweight classifier. In other words, the CNN is used as a feature extractor to reduce the computational cost and accelerate classification. Additionally, STFT spectrograms, as typical TFRs, have been frequently used in existing work on animal sound recognition (Ko et al., 2018). Recently, the constant-Q transform (CQT) has been widely exploited in speech analysis and environmental sound classification (Pham et al., 2019), suggesting a promising direction of investigating various TFRs to obtain more valuable features for pig cough recognition.

In this context, this work aimed to provide a robust and effective feature representation of sounds in pig houses to improve pig cough recognition performance. First, acoustic features from different domains were extracted from sound segments. To reduce feature redundancy, a feature selection strategy was adopted to construct a representative feature set. Then, we built a shallow CNN to extract deep features; early fusion analyses across CNN layers and TFRs were conducted to select the optimal deep feature representations. Finally, the acoustic and deep features were combined and fed into an SVM for classification. Overall, the contributions of this work are summarized as follows:

(1) A novel acoustic and deep feature fusion framework for pig cough recognition is proposed.

(2) Deep features extracted from shallow CNN architecture are proven to be a feasible approach to enrich the acoustic features.

(3) The proposed method is evidenced to be a representative feature for pig cough recognition, and it outperforms the results of the existing CNN models.
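The overall framework can be sketched end to end on synthetic data: RF-based recursive feature elimination on the acoustic features, early fusion by concatenation with the deep features, and SVM classification. The feature dimensions, number of selected features, and classifier settings below are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
acoustic = rng.normal(size=(n, 40))   # hand-crafted acoustic features
deep = rng.normal(size=(n, 128))      # CNN-derived deep features
labels = rng.integers(0, 2, size=n)   # 1 = cough, 0 = non-cough

# RF-RFE: recursively drop the least important acoustic features,
# keeping the 20 ranked highest by random-forest importance.
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=20)
acoustic_sel = rfe.fit_transform(acoustic, labels)

# Early fusion: concatenate the two feature sets, then train the SVM.
fused = np.concatenate([acoustic_sel, deep], axis=1)
clf = SVC(kernel="rbf").fit(fused, labels)
print(fused.shape)  # (200, 148)
```

Early fusion here simply means the concatenation happens before the classifier sees the data, so the SVM learns a single decision boundary over the joint feature space.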

The remainder of this paper is organized as follows. Section 2 describes the relevant works related to our dataset. The methods involved in the experiment are illustrated in Section 3. The experimental results are shown in Section 4. A discussion of the results is provided in Section 5. Finally, the conclusions are presented in Section 6.

Section snippets

Animals and housing

The data used in this study were collected in a large commercial pig house in Harbin, Heilongjiang Province, China. One hundred and twenty-eight pigs from the crossbred fattening stage (120d, ∼60 kg) of the Northeast Folk and the Great White breed were reared in the barn. Fig. 1 shows the layout of the pig house in our experiment. The barn had a size of 27.5 m × 12.8 m × 3.2 m (length × width × height), and it was subdivided into 21 pens, 12 of which were 4.15 m × 3.6 m (length × width) in two

Methods

In this work, we aimed to propose distinguishable features to complete pig cough recognition tasks effectively. The flowchart of the proposed method is illustrated in Fig. 2. First, acoustic features were obtained from the pre-processed sound segments. Second, one-dimensional sound signals were transformed into two-dimensional TFRs, and then deep features were extracted from the new CNN architecture based on various TFRs. Subsequently, acoustic and deep features were concatenated by early

Deep feature extraction

To reduce processing time, we resampled the datasets to 22050 Hz. The LibROSA toolbox was used to extract the manually selected features and to generate both STFT and CQT spectrograms as input representations for the CNN. The experiments were conducted on an Intel(R) Core(TM) i7-10750H CPU at 2.60 GHz with 16 GB of memory and an NVIDIA GeForce GTX 1650 Ti GPU with 4 GB of memory. The software used in this work was Python 3.7.

Table 1 shows the performance

Discussion

From Table 1, we confirm that CQT spectrograms show considerable potential as a tool for identifying pig cough sounds under field conditions. Pig screams, a typical non-cough sound, commonly occur in pig houses. They are dominated by high frequencies (5–10 kHz), while pig coughs generally range from 2.5 kHz to 8 kHz. Other sounds, such as water flow, have lower frequencies, below 4 kHz. Based on the characteristics of the various sound components in pig houses, CQT exhibits its distinct

Conclusions

In this work, we extracted deep features from a shallow CNN to enrich the acoustic features, in order to improve the recognition performance of pig cough sounds based on the complementary nature of the two feature types. We conclude that CQT is more suitable for sound recognition in a pig-housing environment than the traditional linear STFT. A possible extension of our work is the application of the approach to other bioacoustic samples for sound classification under field conditions, which is of great significance

CRediT authorship contribution statement

Weizheng Shen: Conceptualization, Methodology, Funding acquisition, Investigation. Nan Ji: Software, Methodology, Writing – original draft, Visualization, Formal analysis. Yanling Yin: Writing – review & editing, Funding acquisition, Resources. Baisheng Dai: Funding acquisition. Ding Tu: Data curation. Baihui Sun: Supervision. Handan Hou: Supervision. Shengli Kou: Supervision. Yize Zhao: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Funding: This work was supported by the project of the National Natural Science Foundation of China [grant numbers 32172784, 31902210]; the National Key Research and Development Program of China [grant number 2019YFE0125600]; the University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province [grant number UNPYSCT-2020092]; and the China Agriculture Research System of MOF and MARA (CARS-36,CARS-35).

References (37)

  • Wei, P., et al., 2015. Variable importance analysis: a comprehensive review. Reliab. Eng. Syst. Saf.
  • Xie, J., et al., 2019. Investigation of acoustic and visual features for acoustic scene classification. Expert Syst. Appl.
  • Yin, Y., et al., 2021. Recognition of sick pig cough sounds based on convolutional neural network in field situations. Inform. Process. Agric.
  • Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Freitag, M., Pugachevskiy, S., Baird, A., Schuller, B., 2017. ...
  • Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Pugachevskiy, S., Schuller, B., 2018. Bag-of-deep-features: ...
  • Benjamin, M., et al., 2019. Precision livestock farming in swine welfare: a review for swine practitioners. Animals.
  • Chowdhury, A., et al., 2020. Fusing MFCC and LPC features using 1D triplet CNN for speaker recognition in severely degraded audio signals. IEEE Trans. Inform. Forensic Secur.
  • Chung, Y., et al., 2013. Automatic detection and recognition of pig wasting diseases using sound data in audio surveillance systems. Sensors.