Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis
Introduction
Many animals generate sounds, either for communication or as a by-product of living activities such as eating, moving, or flying. Automatic recognition of bioacoustic sounds is valuable for applications such as biological research and environmental monitoring, particularly for detecting and locating animals. In daily life we often hear animal vocalizations without seeing the animals themselves. Because animals generally vocalize to communicate with members of their own species, their vocalizations have evolved to be species-specific. Identifying animal species from their vocalizations is therefore valuable for ecological censusing.
In general, an acoustic signal representing animal vocalizations can be regarded as a sequence of syllables, so a natural way to identify animals from their vocalizations is to use the syllable as the basic acoustic unit. The syllables must therefore be segmented before the recognition process. Segmentation of speech or audio signals is often based on energy (Lamel et al., 1981, Li et al., 2001, Lu, 2001, Wold et al., 1996, Zhang and Kuo, 2001) and/or zero-crossing rate (Li et al., 2001, Lu, 2001, Tian et al., 2002, Wold et al., 1996, Zhang and Kuo, 2001). A disadvantage of these methods for animal vocalizations is that they often fail to extract the full extent of a syllable. To overcome this problem, we exploit frequency-domain information to segment the syllables of animal vocalizations (Harma, 2003).
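As a rough sketch of the frequency-based idea in Harma (2003): repeatedly find the frame with the strongest spectral peak, grow the syllable outward until the peak level drops a fixed number of dB below that maximum, then remove the syllable and repeat. The numpy code below is a minimal illustration, not the paper's exact algorithm; the `drop_db` and `stop_db` thresholds are assumed values chosen for demonstration.

```python
import numpy as np

def segment_syllables(spec, drop_db=20.0, stop_db=35.0):
    """Greedy spectral-peak syllable segmentation (sketch after Harma, 2003).

    spec: magnitude spectrogram, shape (n_freq_bins, n_frames).
    Returns (start_frame, end_frame) pairs, loudest syllable first.
    """
    db = 20.0 * np.log10(spec.max(axis=0) + 1e-12)   # per-frame peak level, dB
    global_max = db.max()
    bounds = []
    while True:
        t0 = int(np.argmax(db))                      # loudest remaining frame
        if db[t0] < global_max - stop_db:            # nothing loud enough left
            break
        thr = db[t0] - drop_db                       # syllable edge threshold
        s = t0
        while s > 0 and db[s - 1] > thr:             # grow leftwards
            s -= 1
        e = t0
        while e < len(db) - 1 and db[e + 1] > thr:   # grow rightwards
            e += 1
        bounds.append((s, e))
        db[s:e + 1] = -np.inf                        # remove syllable, repeat
    return bounds
```

On a spectrogram containing two loud bursts over a quiet background, this returns exactly those two frame ranges; real recordings need the thresholds tuned to the noise floor.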
Once the syllables have been segmented, a set of features is computed to represent each syllable. The best-known features for speech/speaker recognition are linear predictive coefficients (LPCs) (Rabiner and Juang, 1993) and Mel-frequency cepstral coefficients (MFCCs) (Picone, 1993, Rabiner and Juang, 1993, Vergin et al., 1999). In this paper, we use the MFCCs averaged over a syllable to identify animals from their sounds, because MFCCs represent the spectrum of animal sounds in a compact form. The next section describes the proposed recognition method for animal vocalizations.
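The averaged-MFCC idea can be sketched as: frame the syllable, compute MFCCs per frame, and take the mean over all frames to obtain one compact feature vector per syllable. The numpy sketch below uses common textbook choices (Hamming window, 20 mel filters, 12 coefficients, 512-sample frames); the paper's exact parameters are not given in this snippet, so these values are assumptions.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel filterbank, shape (n_filters, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def averaged_mfcc(syllable, sr, n_mfcc=12, frame_len=512, hop=256, n_filters=20):
    """Frame the syllable, take MFCCs per frame, average over frames."""
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr)
    frames = [syllable[i:i + frame_len] * window
              for i in range(0, len(syllable) - frame_len + 1, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))       # (n_frames, n_fft//2+1)
    logmel = np.log(mags @ fb.T + 1e-10)             # log mel-band energies
    # type-II DCT of log-mel energies -> cepstral coefficients (c1..c12)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_mfcc + 1), n + 0.5) / n_filters)
    mfcc = logmel @ dct.T                            # (n_frames, n_mfcc)
    return mfcc.mean(axis=0)                         # averaged MFCC vector
```

Averaging over frames is what gives each syllable a single fixed-length feature vector, regardless of syllable duration, and tends to smooth out transient background noise.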
Section snippets
The proposed recognition method for animal vocalizations
The recognition system consists of two parts: training and recognition. The training part comprises three main modules: syllable segmentation, averaged-MFCC extraction, and linear discriminant analysis (LDA). The recognition part comprises four modules: syllable segmentation, averaged-MFCC extraction, LDA transformation, and classification. Each module is described in detail below.
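As a sketch of the last two recognition modules, Fisher LDA can be fitted on the training feature vectors and a test syllable assigned to the class whose projected mean is nearest. The snippet does not state which classifier the paper uses after the LDA transform, so the nearest-class-mean rule below is an assumption for illustration.

```python
import numpy as np

def lda_fit(X, y, n_dims):
    """Fisher LDA: projection maximizing between-/within-class scatter ratio.

    X: (n_samples, n_features) training features; y: integer class labels.
    Returns a (n_features, n_dims) projection matrix.
    """
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                             # within-class scatter
    Sb = np.zeros((d, d))                             # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # eigenvectors of Sw^-1 Sb; small ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_dims]]

def nearest_mean_classify(W, X_train, y_train, x):
    """Project with W, then pick the class whose projected mean is closest."""
    z = x @ W
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) @ W for c in classes])
    return classes[np.argmin(np.linalg.norm(means - z, axis=1))]
```

Reducing the dimension with LDA before classification both lowers the cost of the distance computation and, as the paper argues, can improve accuracy by discarding directions that do not separate the classes.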
Experimental results
Two audio databases, of 30 frog calls and 19 cricket calls, derived from compact discs are used for the experiments (see Table 3, Table 4). The sampling frequency is 44,100 Hz and each sample is digitized with 16 bits. Most of the calls are field recordings with additional background sounds, and some contain multiple individuals vocalizing simultaneously. Each acoustic signal is first segmented into a set of syllables, half of which are used for training and half for testing.
Conclusions
In this paper we propose a method that automatically identifies frogs and crickets from the sounds they generate. Each syllable, corresponding to a piece of vocalization, is first segmented. The MFCCs averaged over all frames within a syllable (AMFCC) are used as vocalization features, which attenuates the effect of background noise. Linear discriminant analysis (LDA) is then used to reduce the feature dimension and increase the classification accuracy. Experimental results have shown…
Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments, which improved the presentation and quality of this paper. This research was supported in part by Chung Hua University under contract CHU-94-TR-02 and the National Science Council of ROC under contract NSC-92-2213-E-216-020.
References (14)
- Lamel et al., 1981. An improved endpoint detector for isolated word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing.
- Li et al., 2001. Classification of general audio data for content-based retrieval. Pattern Recognition Letters.
- Lu, 2001. Indexing and retrieval of audio: A survey. Multimedia Tools and Applications.
- Harma, 2003. Automatic identification of bird species based on sinusoidal modeling of syllables. Internat. Conf. on Acoust. Speech Signal Process.
- 1998. Automated recognition of bird song elements from continuous recordings using DTW and HMMs. Journal of the Acoustical Society of America.
- 2000. Pattern Classification.
- 2004. The chorus song of cooperatively breeding laughing kookaburras: characterization and comparison among groups. Ethology.