Abstract:
Automatic detection of bioacoustic sound events is crucial for monitoring wildlife. Given the tedious annotation process, the limited number of labeled events, and the large volume of recordings, few-shot learning (FSL) is well suited to detecting such events from only a few examples. Typical FSL frameworks for sound detection use Convolutional Neural Networks (CNNs) to extract features. However, CNNs fail to capture long-range relationships and global context in audio data. We present an approach that combines the audio spectrogram transformer (AST), a data augmentation regime, and transductive inference to detect sound events on the DCASE 2022 Task 5 dataset. Our results show that the AST model outperforms a CNN-based model on all recordings. With transductive inference on FSL tasks, our approach achieves a 6% improvement over the baseline AST feature-extraction pipeline. Our approach generalizes well across sound events from different animal species, recordings, and durations, suggesting its effectiveness for FSL tasks.
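The transductive inference mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes frame embeddings (in practice produced by the AST feature extractor) and shows one common transductive scheme, in which class prototypes computed from the few labeled support examples are iteratively refined using soft assignments of the unlabeled query examples. All function names and the refinement rule here are illustrative assumptions.

```python
import numpy as np

def class_prototypes(support, labels, n_classes):
    """Mean embedding per class over the labeled support set."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])

def transductive_predict(support, labels, query, n_classes, n_iters=5):
    """Classify query embeddings by nearest prototype, refining
    prototypes with soft query assignments (transductive step).

    support: (n_support, d) labeled embeddings
    labels:  (n_support,) integer class labels
    query:   (n_query, d) unlabeled embeddings
    """
    protos = class_prototypes(support, labels, n_classes)
    for _ in range(n_iters):
        # Squared Euclidean distance from each query to each prototype.
        dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
        # Soft assignment of queries to classes (softmax over -distance).
        weights = np.exp(-dists)
        weights /= weights.sum(axis=1, keepdims=True)
        # Refine each prototype using support points plus soft-weighted queries.
        for c in range(n_classes):
            num = support[labels == c].sum(axis=0) + (weights[:, c:c + 1] * query).sum(axis=0)
            den = (labels == c).sum() + weights[:, c].sum()
            protos[c] = num / den
    # Final hard assignment: nearest refined prototype.
    dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

In a real pipeline, `support` would hold AST embeddings of the few annotated events (and background frames) and `query` the embeddings of the remaining unlabeled frames of the recording; using the query distribution to sharpen the prototypes is what distinguishes this from purely inductive nearest-prototype classification.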
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023