Explainable automated anuran sound classification using improved one-dimensional local binary pattern and Tunable Q Wavelet Transform techniques
Introduction
Anurans (frogs and toads) form an order of short-bodied, tailless amphibians (Womack & Bell, 2020). They are carnivores, and many species exist worldwide (Yoshioka et al., 2020). Most species are found in the tropics (Covarrubias, González, & Gutiérrez-Rodríguez, 2021), but many also live in other regions. Anuran populations have decreased severely in recent years. The main reasons for the decrease are global warming, loss of natural habitat, alien species, environmental pollution, and human activity (Carrasco, de Souza, & de Souza Santos, 2021). Anurans play a vital role in the ecological balance and therefore need protection (Ferreira et al., 2018). They are closely tied to their environment, which makes them important for climate change and ecosystem analysis (Xie et al., 2018). Monitoring anuran populations allows ecological changes to be detected early, because population change helps us understand what is happening in our environment (Colonna, Nakamura, & Rosso, 2018). Anuran sounds have different characteristics by species. Therefore, collecting information about their habitats is essential for protecting anuran populations and studying their developmental processes (Luque, Romero-Lemos, Carrasco, & Barbancho, 2018). Ecologists and naturalists have generally obtained biodiversity data by traveling to the habitats of the organisms of interest and observing them there. Collecting data with acoustic sensors instead covers a larger area and saves time. Acoustic sensors generate huge amounts of data, and automatic analysis methods built on these data are in great demand (Alabi, 2021). With sound-based methods, anuran calls can be collected without human intervention, and the species can then be identified automatically.
Different methods are used for collecting and analyzing anuran and other animal sound signals. These methods can be manual or automatic. Observation in the field at certain times of the day is a manual method (Favorskaya and Pakhirka, 2019, Myers-Smith et al., 2019, Wood et al., 2020). However, it is a complicated process, and it is not always practical for a specialist to reach the observation site and collect data. Instead, easier and lower-cost automatic data collection methods are used (Hopp et al., 2012, Measey et al., 2017, Xie et al., 2015, Yuan and Ramli, 2012). Common methods include acoustic sensors (Cai et al., 2007, Saleem and Lee, 2015), sound extracted from camera recordings (Weinstein, 2015), and microphone-based sound recording (Gibb, Browning, Glover-Kapfer, & Jones, 2019). Data analysis can likewise be manual or computer-aided. Manual analysis of recordings collected from sound sensors is ineffective: it depends on expert experience, so error rates are high. Computer-aided intelligent systems therefore increase the success rate. Artificial intelligence-based acoustic analysis studies have been prevalent in recent years. Low error rates, low data collection cost, and ease of use have drawn researchers to sound-based studies. Many intelligent methods such as machine learning (Huang, Yang, Yang, & Chen, 2009), deep learning (Li, Dai, Metze, Qu, & Das, 2017), artificial neural networks (Salamon & Bello, 2017), genetic algorithms (Qian, Zhang, Baird, & Schuller, 2017), and fuzzy logic (Pandeya & Lee, 2018) are used for sound recognition and classification.
This study proposes a machine learning-based method for classifying anuran sounds from different sources.
The primary motivation of our study is to classify different anuran sounds with high accuracy. In this work, a new anuran sound testbed is gathered, and a new lightweight and simple classification modality is presented. To this end, an anuran sound database with 26 classes collected from 4 sources has been created. Our main objective is to propose an accurate feature engineering model for sound classification, and we have tested this model on the anuran sound dataset. As stated in the literature, deep learning models have dominated machine learning research because they attain high classification performance. Therefore, our proposed model mimics the structure of deep learning models. We present a new version of the 1D-LBP, and TQWT is used to generate multilevel features. INCA, an automatic feature selector, is employed to choose the most informative features.
Many artificial intelligence-based studies have been conducted in the literature to detect different animal sounds. Machine learning, fuzzy logic, genetic algorithm, and deep learning-based approaches are widely used. These studies have often concentrated on the sounds of birds (Koh et al., 2019, Xie et al., 2019, Zhang et al., 2019), bats (Alonso et al., 2015, Henríquez et al., 2014, Oikarinen et al., 2019), insects (Ganchev and Potamitis, 2007, Hedrick, 2002, Noda et al., 2019), and anurans (Dena et al., 2019, Luque et al., 2018, Luque et al., 2019). The main purpose of the studies is to propose algorithms that achieve high accuracies. Studies on frog sounds are generally aimed at identification. A few of these are as follows. Alonso et al. (2017) proposed a model based on Mel-frequency cepstral coefficients (MFCCs) and a Gaussian mixture model (GMM) to identify anuran species from sound signals. They achieved a correct classification rate of 98.61% for 17 anuran species. Luque et al. (2018) showed that anuran sounds could be classified using nine frame-based classifiers. In their study, they compared the hidden Markov model and MFCC methods. Sounds of four anuran species were classified with an accuracy of 87.3%. Colonna et al. (2018) presented a method that uses low-level acoustic descriptors (LLDs) to segment anuran calls automatically. They reported a performance of 97% in classifying 14 anuran species. Yuan and Ramli (2012) suggested a model using the k-nearest neighbor (k-NN) classifier with MFCC and linear predictive coding (LPC) feature extractors. They attained 98.1% and 93.1% accuracy using MFCC and LPC sound descriptors, respectively. Huang et al. (2014) presented a method using six statistical features: spectral centroid, signal bandwidth, spectral roll-off, threshold-crossing rate, spectral flatness, and average energy.
Fast-learning neural networks were used as classifiers, and their model yielded 93.4% accuracy in classifying nine species. Bedoya, Isaza, Daza, and López (2014) proposed an unattended methodology for the automatic identification of anurans. A fuzzy classifier and mel-frequency cepstral coefficients were used. Their model classified 13 anuran species with an accuracy rate between 99.38% and 100%. Xie et al. (2018) used naive Bayes and k-nearest neighbor classifiers to classify anuran species. They aimed to classify four different call types of four anuran species and achieved 84.0% species classification and 83.7% call classification rates, which are comparatively low. Huang et al. (2009) proposed a method using spectral centroid, signal bandwidth, and threshold-crossing rate feature extractors with kNN and SVM classifiers. Their study showed that five different frog species from the Microhylidae family were classified with accuracies between 89.05% and 90.30%. As can be seen from the literature review, the previously presented models used a limited number of classes, and some did not achieve high classification accuracy.
To fill these gaps, we collected a new anuran sound dataset with 26 classes. We then proposed an accurate hand-modeled sound classification architecture that mimics deep learning models by generating features at multiple levels.
This study presents an automated anuran sound classification modality based on a 1D-LBP (Kaya, Uyar, Tekin, & Yıldırım, 2014) and TQWT (Selesnick, 2011) feature generation network. The main motivation of this model is to demonstrate the bioacoustic sound classification ability of handcrafted features. Furthermore, we collected a new bioacoustic sound dataset containing 26 classes. Successful feature engineering methods generally include a feature selection function. In this work, we have used an iterative feature selector, INCA, and a shallow classifier (kNN) to show the classification ability of the generated features.
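The selection step can be illustrated with a minimal sketch. This excerpt does not specify how INCA works, so the code below only shows one plausible reading of "iterative feature selection with a shallow classifier": features are assumed to be already ranked (the hypothetical `ranking` argument stands in for whatever scores the real selector produces), growing prefixes of that ranking are evaluated with a leave-one-out 1-nearest-neighbor error, and the prefix with the lowest error is kept. The function names, the 1-NN choice, and the leave-one-out scheme are assumptions made for illustration.

```python
def loo_1nn_error(X, y, features):
    """Leave-one-out error of a 1-nearest-neighbor classifier
    restricted to the given feature indices."""
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query sample out
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        errors += best_label != y[i]
    return errors / len(X)


def iterative_select(X, y, ranking, min_features=1, max_features=None):
    """Evaluate the top-k ranked features for each k in the range and
    return the subset with the lowest error (smallest k wins ties)."""
    max_features = max_features or len(ranking)
    best_error, best_subset = None, None
    for k in range(min_features, max_features + 1):
        subset = ranking[:k]
        error = loo_1nn_error(X, y, subset)
        if best_error is None or error < best_error:
            best_error, best_subset = error, subset
    return best_subset, best_error


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
y = [0, 0, 1, 1]
subset, error = iterative_select(X, y, ranking=[0, 1])
```

On this toy data the loop keeps only feature 0, because adding the noisy feature 1 makes every nearest neighbor belong to the wrong class.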
Novelties and contributions of the proposed 1D-LBP and TQWT-based feature generation network are as follows:
- A new anuran sound dataset was collected. It contains 1536 sounds from 26 anuran species, gathered from various sound sources. The dataset is publicly available at https://www.kaggle.com/datasets/erhanakbal/toads-and-frogs-datasets-anuran
- A new, improved version of the 1D-LBP is presented. As stated in the literature, 1D-LBP is a histogram-based feature generation function. Therefore, both the histogram and statistics of the histogram are employed as features to improve the feature generation capability of 1D-LBP.
- TQWT is one of the most effective decomposition methods in the literature. This work presents a novel 1D-LBP and TQWT feature generation network that extracts comprehensive features. Our developed model attained 99.35% classification accuracy on our collected anuran sound dataset.
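The idea of combining a 1D-LBP histogram with statistics of that histogram can be sketched as follows. The sketch assumes the common 8-neighbor formulation of 1D-LBP (four samples on each side of the center, each compared with the center to form one bit), which yields a 256-bin histogram. The paper reports 14 statistical features but this excerpt does not list them, so the six statistics computed here are placeholders chosen for illustration, and all function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def lbp_1d_codes(signal, half_window=4):
    """Return one 8-bit code per interior sample of the signal."""
    signal = list(signal)
    codes = []
    for c in range(half_window, len(signal) - half_window):
        center = signal[c]
        # Four samples before and four samples after the center.
        neighbors = signal[c - half_window:c] + signal[c + 1:c + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:  # bit is set when neighbor >= center
                code |= 1 << bit
        codes.append(code)
    return codes


def histogram(codes, bins=256):
    """Count how often each code occurs."""
    counts = [0] * bins
    for code in codes:
        counts[code] += 1
    return counts


def histogram_statistics(counts):
    """Illustrative statistics of the histogram (not the paper's 14)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0] if total else []
    entropy = -sum(p * math.log2(p) for p in probs)
    return [mean(counts), pstdev(counts), median(counts),
            max(counts), min(counts), entropy]


def extract_features(signal):
    """Concatenate the 256-bin histogram with its statistics."""
    counts = histogram(lbp_1d_codes(signal))
    return counts + histogram_statistics(counts)


ramp_codes = lbp_1d_codes([0, 1, 2, 3, 4, 5, 6, 7, 8])
features = extract_features([0, 1, 2, 3, 4, 5, 6, 7, 8])
```

For the nine-sample ramp there is a single interior sample; its four right-hand neighbors are larger than the center, so only the upper four bits are set and the code is 240.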
Material
Generally, the datasets used in the literature consist of a small number of species and examples. In addition, datasets in the literature typically consist of signals obtained from a single data source. Our dataset comprises 1536 signals obtained from different sources belonging to 26 classes. Thus, it contributes to obtaining more accurate results using our proposed method. In our study, a mixed dataset collected from 4 different sources was created to test the performance of the proposed
The proposed modality
In this work, we have proposed a new feature engineering model that generates handcrafted features. The TQWT decomposition method is used to create levels, and 1D-LBP is employed as the feature generation function. As stated in the literature, 1D-LBP is a histogram-based feature generation model. In this work, the calculated histogram and statistical features are used together. Our version of 1D-LBP generates 270 features (256 histogram features and 14 statistical features). TQWT is a
Results
This section presents the classification results obtained for 26 anuran species with the proposed 1D-LBP and TQWT methods. The performance of the method is reported using the accuracy, F1-score, geometric mean, recall, and precision metrics (Chicco and Jurman, 2020, Powers, 2020, Warrens, 2008).
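As a reference for how such metrics can be derived from a confusion matrix, a minimal sketch follows. This excerpt does not state the averaging scheme, so the code assumes macro-averaging over classes and takes the geometric mean to be the n-th root of the product of the per-class recalls; both choices, and the function name, are assumptions.

```python
import math


def classification_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    recalls, precisions, f1_scores = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # row sum
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "recall": sum(recalls) / n,        # macro-averaged
        "precision": sum(precisions) / n,  # macro-averaged
        "f1": sum(f1_scores) / n,          # macro-averaged
        "geometric_mean": math.prod(recalls) ** (1 / n),
    }


# Two-class toy example, unrelated to the paper's 26-class results.
metrics = classification_metrics([[8, 2], [1, 9]])
```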
The calculated results of our method are listed in Table 2.
The confusion matrix obtained using our proposed method is shown in Fig. 4.
In this work, we have used 10-fold cross-validation to generate the
Discussion and conclusions
This work presents a new anuran sound dataset and a new learning model to classify anuran sounds. Our anuran sound classification model also presents an improved feature generation function. This is an improved version of the 1D-LBP. Using this function and TQWT methods, a new feature generation network is presented to extract low-level, medium-level, and high-level features. Q and r parameters of the TQWT were selected to be 1 and 2, respectively. The dimension of the signal was halved at each
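The halving follows from the standard TQWT parameterization (Selesnick, 2011), in which the high-pass scaling factor is beta = 2 / (Q + 1) and the low-pass scaling factor is alpha = 1 - beta / r. A short sketch of the arithmetic is given below; the signal length and the number of levels are hypothetical, and real implementations round subband lengths, so the values are approximate.

```python
from fractions import Fraction


def tqwt_scaling(q, r):
    """Return (alpha, beta) for Q-factor q and redundancy r."""
    beta = Fraction(2) / (q + 1)  # high-pass scaling factor
    alpha = 1 - beta / r          # low-pass scaling factor
    return alpha, beta


def lowpass_lengths(n_samples, q, r, levels):
    """Approximate low-pass subband length after each decomposition level."""
    alpha, _ = tqwt_scaling(q, r)
    return [int(n_samples * alpha ** j) for j in range(1, levels + 1)]


alpha, beta = tqwt_scaling(q=1, r=2)
lengths = lowpass_lengths(n_samples=1024, q=1, r=2, levels=3)
```

With Q = 1 and r = 2, beta is 1 and alpha is 1/2, so a 1024-sample signal gives low-pass subbands of roughly 512, 256, and 128 samples.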
CRediT authorship contribution statement
Erhan Akbal: Conceptualization, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Prabal Datta Barua: Validation, Investigation, Writing – review & editing, Visualization. Sengul Dogan: Conceptualization, Methodology, Validation, Investigation, Resources, Writing – original draft, Writing – review & editing. Turker Tuncer: Methodology, Software, Validation, Writing – original draft, Writing – review & editing,
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (66)
- et al. Automatic anuran identification using noise removal and audio activity detection. Expert Systems with Applications (2017)
- et al. Advance in the bat acoustic identification systems based on the audible spectrum using nonlinear dynamics characterization. Expert Systems with Applications (2015)
- et al. Novel automated PD detection system using aspirin pattern with EEG signals. Computers in Biology and Medicine (2021)
- et al. An accurate valvular heart disorders detection model based on a new dual symmetric tree pattern using stethoscope sounds. Computers in Biology and Medicine (2022)
- et al. Automatic recognition of anuran species based on syllable identification. Ecological Informatics (2014)
- et al. Automatic recognition of frog calls using a multi-stage average spectrum. Computers & Mathematics with Applications (2012)
- et al. Feature evaluation for unsupervised bioacoustic signal segmentation of anuran calls. Expert Systems with Applications (2018)
- et al. Animal species recognition in the wildlife based on muzzle and shape features using joint CNN. Procedia Computer Science (2019)
- et al. An automatic acoustic bat identification system based on the audible spectrum. Expert Systems with Applications (2014)
- et al. Intelligent feature extraction and classification of anuran vocalizations. Applied Soft Computing (2014)