Explainable automated anuran sound classification using improved one-dimensional local binary pattern and Tunable Q Wavelet Transform techniques

https://doi.org/10.1016/j.eswa.2023.120089

Abstract

Classifying animal species from their sounds is a critical problem in bioacoustics. In particular, identifying anuran (frog and toad) species can serve as an indicator of climate change. However, counting and classifying anurans in their natural habitat is challenging. Therefore, computer-assisted intelligent systems are needed to identify anuran species correctly. In this work, we collected a new anuran sound dataset and proposed a hand-modeled sound classification system. The collected dataset contains 1536 anuran sounds belonging to 26 anuran species. Furthermore, a feature extraction method based on an improved one-dimensional local binary pattern (1D-LBP) and the Tunable Q Wavelet Transform (TQWT) is proposed to generate features in both the frequency and space domains. The proposed hand-modeled anuran sound classification architecture comprises a feature extractor (TQWT + improved 1D-LBP), an iterative neighborhood component analysis (INCA) selector, and a k-nearest neighbor (kNN) classifier. The proposed 1D-LBP and TQWT-based model obtained a classification accuracy of 99.35% in classifying 26 anuran species. Moreover, we discuss explainable results. In the future, we plan to validate this work by increasing the number of species in each group.

Introduction

Anurans (frogs and toads) are an order of short-bodied, tailless amphibians (Womack & Bell, 2020). They are carnivores, and many species live on earth (Yoshioka et al., 2020). Most of the species are found in the tropics (Covarrubias, González, & Gutiérrez-Rodríguez, 2021), but many anurans also live in other regions. There has been a severe decrease in the anuran population in recent years. Factors such as global warming, loss of natural environments, alien species, environmental pollution, and human activity emerge as the main reasons for the decrease (Carrasco, de Souza, & de Souza Santos, 2021). Anurans play a vital role in the ecological balance and therefore need protection (Ferreira et al., 2018). They are an important group for climate change and ecosystem analysis because they are closely tied to their environment (Xie et al., 2018). By monitoring anuran populations, ecological changes can be detected early. Population change helps us understand what is happening in our environment (Colonna, Nakamura, & Rosso, 2018). Anuran sounds have different characteristics depending on the species. Therefore, collecting information about their habitat is essential for protecting the anuran population and studying their developmental processes (Luque, Romero-Lemos, Carrasco, & Barbancho, 2018). To obtain biodiversity data, ecologists and naturalists generally have to travel to the habitats of these animals and observe them in place. Collecting data with acoustic sensors makes it possible to observe a larger area and saves time. Acoustic sensors generate a huge amount of acoustic data, and automatic analysis methods developed on these data are in great demand (Alabi, 2021). Using sound-based methods, anuran calls can be collected without human intervention. The species can then be identified automatically using various methods.

Different methods are used for collecting and analyzing anuran and other animal sound signals. These methods can be manual or automatic. Observation in the field at certain times of the day is a manual method (Favorskaya and Pakhirka, 2019, Myers-Smith et al., 2019, Wood et al., 2020). However, this is a complicated process, and it is not strictly necessary for a specialist to travel to the observation site to collect data. Instead, easier and lower-cost automatic data collection methods are used (Hopp et al., 2012, Measey et al., 2017, Xie et al., 2015, Yuan and Ramli, 2012). Common methods include using various acoustic sensors (Cai et al., 2007, Saleem and Lee, 2015), extracting sound from camera recordings (Weinstein, 2015), and collecting microphone and sound data (Gibb, Browning, Glover-Kapfer, & Jones, 2019). Both manual and computer-aided systems are used for data analysis. However, manual analysis of recordings collected from sound sensors is ineffective: because it depends on expert experience, error rates are high. Using computer-aided intelligent systems therefore increases the success rate. Artificial intelligence-based acoustic analysis studies have been prevalent in recent years. Low error rates, low data collection cost, and ease of use are the factors that draw researchers to sound-based studies. Many intelligent methods such as machine learning (Huang, Yang, Yang, & Chen, 2009), deep learning (Li, Dai, Metze, Qu, & Das, 2017), artificial neural networks (Salamon & Bello, 2017), genetic algorithms (Qian, Zhang, Baird, & Schuller, 2017), and fuzzy logic (Pandeya & Lee, 2018) are used for sound recognition and classification.

This study proposes a machine learning-based method for classifying anuran sounds from different sources.

The primary motivation of our study is to classify different anuran sounds with high accuracy. In this work, a new anuran sound testbed is gathered, and a new lightweight and simple classification modality is presented. To this end, an anuran sound database with 26 classes collected from 4 sources has been created. Our main objective is to propose an accurate feature engineering model for sound classification, and we have tested this model on an anuran sound dataset. As stated in the literature, deep learning models have dominated machine learning research because they attain high classification performance. Therefore, the deep learning structure has been mimicked in our proposed model. We present a new version of the 1D-LBP and use TQWT to generate multilevel features. The INCA feature selector is an automatic optimal feature selector employed to choose the important features.

Many artificial intelligence-based studies have been conducted in the literature to detect different animal sounds. Machine learning, fuzzy logic, genetic algorithm, and deep learning-based approaches are widely used. These studies have often concentrated on the sounds of birds (Koh et al., 2019, Xie et al., 2019, Zhang et al., 2019), bats (Alonso et al., 2015, Henríquez et al., 2014, Oikarinen et al., 2019), insects (Ganchev and Potamitis, 2007, Hedrick, 2002, Noda et al., 2019), and anurans (Dena et al., 2019, Luque et al., 2018, Luque et al., 2019). The main purpose of these studies is to propose algorithms that achieve high accuracy. Studies on frog sounds are generally aimed at identification. A few of these are as follows. Alonso et al. (2017) proposed a model based on Mel-frequency cepstral coefficients (MFCCs) and a Gaussian mixture model (GMM) to identify anuran species from sound signals. They achieved a correct classification rate of 98.61% for 17 anuran species. Luque et al. (2018) showed that anuran sounds could be classified using nine frame-based classifiers. In their study, they compared the hidden Markov model and MFCC methods. Sounds of four anuran species were classified with an accuracy of 87.3%. Colonna et al. (2018) presented a method that uses low-level acoustic descriptors (LLDs) to segment anuran calls automatically. They reported a performance of 97% in classifying 14 anuran species. Yuan and Ramli (2012) suggested a model using the k-nearest neighbor (kNN) classifier with MFCC and linear predictive coding (LPC) feature extractors. They attained 98.1% and 93.1% accuracy using MFCC and LPC sound descriptors, respectively. Huang et al. (2014) presented a method using six features: spectral centroid, signal bandwidth, spectral roll-off, threshold-crossing rate, spectral flatness, and average energy. Fast-learning neural networks were used as classifiers, and their model yielded 93.4% accuracy in classifying nine species. Bedoya, Isaza, Daza, and López (2014) proposed an unattended methodology for the automatic identification of anurans, using a fuzzy classifier and Mel-frequency cepstral coefficients. Their model classified 13 anuran species with an accuracy between 99.38% and 100%. Xie et al. (2018) used naive Bayes and k-nearest neighbor classifiers to classify anuran species. They aimed to classify four different call types of four anuran species and achieved 84.0% species classification and 83.7% call classification rates, which are comparatively low. Huang et al. (2009) proposed a method using spectral centroid, signal bandwidth, and threshold-crossing rate feature extractors with kNN and SVM classifiers. Their study showed that five different frog species from the Microhylidae family were classified correctly at rates between 89.05% and 90.30%. As can be seen from the literature review, the previously presented models used a limited number of classes, and some did not achieve high classification accuracy.

To fill the gaps identified above, we collected a new anuran sound dataset with 26 classes. Then, we proposed an accurate hand-modeled sound classification architecture that mimics deep learning models by generating features at multiple levels.

This study presents an automated anuran sound classification modality based on a 1D-LBP (Kaya, Uyar, Tekin, & Yıldırım, 2014) and TQWT (Selesnick, 2011) feature generation network. The main motivation of this model is to demonstrate the bioacoustic sound classification ability of handcrafted features. Furthermore, we collected a new bioacoustic sound dataset containing 26 classes. Successful feature engineering methods generally include a feature selection function. In this work, we have used an iterative feature selector (INCA) and a shallow classifier (kNN) to show the classification ability of the generated features.

Novelties and contributions of the proposed 1D-LBP and TQWT-based feature generation network are as follows:

  • A new anuran sound dataset was collected. The collected dataset contains 1536 sounds from 26 anuran species. This dataset was collected from a variety of sound sources and has been made publicly available. It can be downloaded from https://www.kaggle.com/datasets/erhanakbal/toads-and-frogs-datasets-anuran.

  • A new, improved version of the 1D-LBP is presented. As stated in the literature, 1D-LBP is a histogram-based feature generation function. Therefore, the histogram and statistics of the histogram are employed as features to improve the feature generation capability of 1D-LBP.

  • TQWT is one of the most effective decomposition methods in the literature. This work presents a novel 1D-LBP and TQWT feature generation network. Comprehensive features are extracted using 1D-LBP and TQWT feature generation networks. Our developed model attained 99.35% classification accuracy for our collected anuran sounds dataset.

Material

Generally, the datasets used in the literature consist of a small number of species and examples. In addition, datasets in the literature typically consist of signals obtained from a single data source. Our dataset comprises 1536 signals obtained from different sources belonging to 26 classes. Thus, it contributes to obtaining more accurate results using our proposed method. In our study, a mixed dataset collected from 4 different sources was created to test the performance of the proposed

The proposed modality

In this work, we have proposed a new feature engineering model that generates handcrafted features. The TQWT decomposition method is used to create levels, and 1D-LBP is employed as the feature generation function. As stated in the literature, 1D-LBP is a histogram-based feature generation model. In this work, the calculated histogram and statistical features are used together. The 1D-LBP version used here generates 270 features (256 histogram features and 14 statistical features). TQWT is a
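As a rough illustration of how a 256-bin 1D-LBP histogram plus 14 statistics could add up to 270 features, the following Python sketch may help. It is not the authors' implementation. The neighborhood (four samples on each side of the center), the thresholding rule (`>=`), and in particular the 14 statistics are assumptions chosen only so that the output length matches; this excerpt does not list the statistics actually used, and all function names are hypothetical.

```python
import math
import statistics


def lbp_1d_codes(signal, half_window=4):
    """Return one 8-bit 1D-LBP code per interior sample of `signal`."""
    samples = list(signal)
    codes = []
    for i in range(half_window, len(samples) - half_window):
        center = samples[i]
        # Four samples to the left and four to the right of the center.
        neighbors = samples[i - half_window:i] + samples[i + 1:i + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbors):
            if value >= center:  # thresholding rule is an assumption
                code |= 1 << bit
        codes.append(code)
    return codes


def histogram_256(codes):
    """Count how often each 8-bit code occurs (assumes half_window=4)."""
    hist = [0] * 256
    for code in codes:
        hist[code] += 1
    return hist


def histogram_statistics(hist):
    """Hypothetical set of 14 statistics; the source does not list its 14."""
    n = len(hist)
    mean = statistics.fmean(hist)
    variance = statistics.pvariance(hist, mu=mean)
    std = math.sqrt(variance)
    centered = [h - mean for h in hist]
    skewness = sum(c ** 3 for c in centered) / (n * std ** 3) if std else 0.0
    kurtosis = sum(c ** 4 for c in centered) / (n * std ** 4) if std else 0.0
    energy = sum(h * h for h in hist)
    total = sum(hist)
    entropy = -sum((h / total) * math.log2(h / total) for h in hist if h) if total else 0.0
    return [
        mean, statistics.median(hist), variance, std,
        min(hist), max(hist), max(hist) - min(hist),
        skewness, kurtosis, energy, math.sqrt(energy / n),
        sum(abs(c) for c in centered) / n, entropy, hist.index(max(hist)),
    ]


def improved_lbp_features(signal):
    """Concatenate the 256-bin histogram with the 14 histogram statistics."""
    hist = histogram_256(lbp_1d_codes(signal))
    return hist + histogram_statistics(hist)


# Synthetic test signal, used only to show the output size.
demo_signal = [math.sin(0.3 * n) + 0.1 * math.sin(2.1 * n) for n in range(200)]
demo_features = improved_lbp_features(demo_signal)
print(len(demo_features))
```

In a multilevel setting, a function like `improved_lbp_features` would presumably be applied to the raw signal and to each decomposed sub-band, with the resulting vectors concatenated before feature selection.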

Results

This section presents the classification results obtained for 26 anuran species with the proposed 1D-LBP and TQWT methods. The performance of the method is reported using the accuracy, F1-score, geometric mean, recall, and precision metrics (Chicco and Jurman, 2020, Powers, 2020, Warrens, 2008).
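For readers who want to reproduce such metrics from a confusion matrix, a minimal Python sketch follows. It assumes macro-averaging over classes and takes the geometric mean over the per-class recalls; this excerpt does not state which averaging the authors used, so both choices are assumptions, and the function name is hypothetical.

```python
import math


def multiclass_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    recalls, precisions, f1_scores = [], [], []
    for k in range(n):
        true_positive = confusion[k][k]
        actual = sum(confusion[k])                          # row sum
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
        precisions.append(precision)
    return {
        "accuracy": correct / total,
        "recall": sum(recalls) / n,          # macro-averaged (assumption)
        "precision": sum(precisions) / n,    # macro-averaged (assumption)
        "f1": sum(f1_scores) / n,            # macro-averaged (assumption)
        "geometric_mean": math.prod(recalls) ** (1.0 / n),
    }


# Toy two-class matrix, not data from the paper.
toy_metrics = multiclass_metrics([[5, 0], [1, 4]])
print(toy_metrics)
```

Note that the geometric mean drops to zero as soon as one class has zero recall, which makes it a strict indicator on datasets with many classes.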

The calculated results of our method are listed in Table 2.

The confusion matrix obtained using our proposed method is shown in Fig. 4.

In this work, we have used 10-fold cross-validation to generate the
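The general idea of 10-fold cross-validation with a kNN classifier can be sketched in plain Python as follows. This is only an illustration: the value of k, the distance metric (L1 here), the shuffling seed, and the non-stratified fold assignment are all assumptions, since this excerpt does not specify the authors' settings.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        ((sum(abs(a - b) for a, b in zip(x, query)), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(features, labels, n_folds=10, k=1, seed=0):
    """Return the overall accuracy across n_folds held-out folds."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    # Every sample lands in exactly one fold, so it is tested exactly once.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, features[i], k) == labels[i]:
                correct += 1
    return correct / len(features)


# Two well-separated synthetic clusters, not data from the paper.
toy_x = [[0.01 * i, 0.0] for i in range(20)] + [[10.0 + 0.01 * i, 10.0] for i in range(20)]
toy_y = [0] * 20 + [1] * 20
toy_accuracy = k_fold_accuracy(toy_x, toy_y)
print(toy_accuracy)
```

Because each sample is held out exactly once, the returned value is the fraction of all samples classified correctly when they were not part of the training set.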

Discussion and conclusions

This work presents a new anuran sound dataset and a new learning model to classify anuran sounds. Our anuran sound classification model also presents an improved feature generation function. This is an improved version of the 1D-LBP. Using this function and TQWT methods, a new feature generation network is presented to extract low-level, medium-level, and high-level features. Q and r parameters of the TQWT were selected to be 1 and 2, respectively. The dimension of the signal was halved at each
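The halving mentioned above is consistent with the usual TQWT parameterization (Selesnick, 2011), in which the Q-factor and redundancy r are related to the high-pass and low-pass scaling factors β and α by Q = (2 − β)/β and r = β/(1 − α). The short Python sketch below works this out for Q = 1 and r = 2. The helper names are hypothetical, and the length calculation is an approximation that ignores the rounding to even lengths used in practical TQWT implementations.

```python
def tqwt_scaling(q_factor, redundancy):
    """Return (alpha, beta) from Q = (2 - beta) / beta and r = beta / (1 - alpha)."""
    beta = 2.0 / (q_factor + 1.0)
    alpha = 1.0 - beta / redundancy
    return alpha, beta


def approximate_low_pass_lengths(n_samples, alpha, levels):
    """Approximate length of the low-pass branch after each decomposition level."""
    lengths = []
    current = n_samples
    for _ in range(levels):
        current = int(round(alpha * current))
        lengths.append(current)
    return lengths


alpha, beta = tqwt_scaling(q_factor=1, redundancy=2)
print(alpha, beta)
print(approximate_low_pass_lengths(1024, alpha, levels=3))
```

With Q = 1 and r = 2 the low-pass scaling factor is α = 0.5, so the low-pass branch is roughly half as long after each level.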

CRediT authorship contribution statement

Erhan Akbal: Conceptualization, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Prabal Datta Barua: Validation, Investigation, Writing – review & editing, Visualization. Sengul Dogan: Conceptualization, Methodology, Validation, Investigation, Resources, Writing – original draft, Writing – review & editing. Turker Tuncer: Methodology, Software, Validation, Writing – original draft, Writing – review & editing,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (66)

  • C.-J. Huang et al. Frog classification using machine learning techniques. Expert Systems with Applications (2009)
  • V. Jahmunah et al. Uncertainty quantification in DenseNet model using myocardial infarction ECG signals. Computer Methods and Programs in Biomedicine (2023)
  • Y. Kaya et al. 1D-local binary pattern based feature extraction for classification of epileptic EEG signals. Applied Mathematics and Computation (2014)
  • A. Luque et al. Non-sequential automatic classification of anuran sounds for the estimation of climate-change indicators. Expert Systems with Applications (2018)
  • S. Patidar et al. Classification of cardiac sound signals using constrained tunable-Q wavelet transform. Expert Systems with Applications (2014)
  • S. Patidar et al. Automatic diagnosis of septal defects based on tunable-Q wavelet transform of cardiac sound signals. Expert Systems with Applications (2015)
  • S. Raghu et al. Classification of focal and non-focal EEG signals using neighborhood component analysis and machine learning algorithms. Expert Systems with Applications (2018)
  • S. Tan. Neighbor-weighted k-nearest neighbor for unbalanced text corpus. Expert Systems with Applications (2005)
  • D. Tanko et al. Shoelace pattern-based speech emotion recognition of the lecturers in distance education: ShoePat23. Applied Acoustics (2022)
  • B. Tasar et al. Accurate respiratory sound classification model based on piccolo pattern. Applied Acoustics (2022)
  • J. Xie et al. Acoustic classification of frog within-species and species-specific calls. Applied Acoustics (2018)
  • J. Xie et al. Acoustic classification of Australian anurans using syllable features.
  • O. Yaman et al. DES-Pat: A novel DES pattern-based propeller recognition method using underwater acoustical sounds. Applied Acoustics (2021)
  • X. Zhang et al. Spectrogram-frame linear network and continuous frame sequence for bird sound classification. Ecological Informatics (2019)
  • Alabi, M. (2021). Classification of Anuran Frog Species Using Machine Learning. arXiv preprint...
  • AmphibiaWeb. (2020). www.amphibiaweb.com. In (Vol....
  • J. Cai et al. Sensor network for the monitoring of ecosystem: Bird species recognition.
  • G.H. Carrasco et al. Effect of multiple stressors and population decline of frogs. Environmental Science and Pollution Research (2021)
  • D. Chicco et al. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics (2020)
  • S. Covarrubias et al. Effects of natural and anthropogenic features on functional connectivity of anurans: A review of landscape genetics studies in temperate, subtropical and tropical species. Journal of Zoology (2021)
  • S. Dena et al. How much are we losing in not depositing anuran sound recordings in scientific collections? Bioacoustics (2019)
  • Ecologiyasia. (2020). www.ecologyasia.com/. In (Vol....
  • L. Ferreira et al. What do insects, anurans, birds, and mammals have to say about soundscape indices in a tropical savanna. Journal of Ecoacoustics (2018)