Acoustic domain mismatch compensation in bird audio detection

Tang, Tiantian; Long, Yanhua; Li, Yijie; Liang, Jiaen

doi:10.1007/s10772-022-09957-w

Published: 12 January 2022

International Journal of Speech Technology, Volume 25, pages 251–260 (2022)

Tiantian Tang¹, Yanhua Long¹ (ORCID: orcid.org/0000-0003-0924-408X), Yijie Li² & Jiaen Liang²

Abstract

Detecting bird calls in audio is an important task for automatic wildlife monitoring, as well as for citizen science and audio library management. This paper presents front-end acoustic enhancement techniques to handle the acoustic domain mismatch problem in bird audio detection. First, a time-domain cross-condition data augmentation (TCDA) method is proposed to broaden the domain coverage of a fixed training dataset. Then, to suppress the distortion caused by stationary noise and to enhance transient events, we investigate per-channel energy normalization (PCEN), which automatically controls the gain of every subband in the mel-frequency spectrogram. Furthermore, harmonic-percussive source separation is investigated to extract robust percussive representations of bird calls and so alleviate the acoustic mismatch. Our experiments are performed on the Bird Audio Detection task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018. Extensive results show that the proposed TCDA yields a relative AUC improvement of 5.02% under mismatched conditions. On the cross-domain test set, the proposed robust percussive features (RPFs), and the RPFs combined with PCEN, significantly improve on the baseline with conventional log mel-spectrogram features, from 81.79% AUC to 84.46% and 88.68%, respectively. Moreover, we find that combining different front-end features can further improve system performance.
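To make the two front-end operations named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly published PCEN formulation (a first-order smoother per subband followed by adaptive gain control and root compression) and median-filtering harmonic-percussive separation with a soft mask. The function names, the parameter defaults (`s`, `alpha`, `delta`, `r`, `width`, `power`), and the list-of-frames spectrogram layout are all illustrative assumptions; the paper's actual settings are not given in the abstract.

```python
from statistics import median


def pcen(mel, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization (assumed standard formulation).

    mel: list of frames, each a list of non-negative subband energies.
    """
    smoothed = list(mel[0])  # initialise the smoother M with the first frame
    out = []
    for frame in mel:
        # First-order IIR smoother per subband: M[t] = (1 - s) * M[t-1] + s * E[t]
        smoothed = [(1.0 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Adaptive gain control (divide by M**alpha), then root compression
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


def median_filter_1d(values, width):
    """Sliding median; the window is truncated at the borders."""
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def percussive_component(spec, width=5, power=2.0, eps=1e-10):
    """Soft-masked percussive part of a magnitude spectrogram.

    spec: list of frames, each a list of non-negative bin magnitudes.
    """
    n_frames, n_bins = len(spec), len(spec[0])
    # Harmonic-enhanced estimate: median along time within each frequency bin
    per_bin = [
        median_filter_1d([spec[t][f] for t in range(n_frames)], width)
        for f in range(n_bins)
    ]
    # Percussive-enhanced estimate: median along frequency within each frame
    per_frame = [median_filter_1d(frame, width) for frame in spec]
    out = []
    for t in range(n_frames):
        row = []
        for f in range(n_bins):
            h = per_bin[f][t] ** power
            p = per_frame[t][f] ** power
            # Soft mask: share of the bin attributed to the percussive estimate
            row.append(spec[t][f] * p / (h + p + eps))
        out.append(row)
    return out
```

In this sketch, a subband that carries only stationary energy ends up with nearly the same PCEN value regardless of its absolute level, while a sudden rise in energy stands out; the percussive mask keeps short broadband events and suppresses sustained narrowband tones. A practical system would more likely rely on an array library and a tested audio toolkit than on nested lists.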

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62071302).

Author information

Authors and Affiliations

Key Innovation Group of Digital Humanities Resource and Research, Shanghai Normal University, Shanghai, 200234, China
Tiantian Tang & Yanhua Long
Unisound AI Technology Co., Ltd., Beijing, China
Yijie Li & Jiaen Liang

Corresponding author

Correspondence to Yanhua Long.

About this article

Cite this article

Tang, T., Long, Y., Li, Y. et al. Acoustic domain mismatch compensation in bird audio detection. Int J Speech Technol 25, 251–260 (2022). https://doi.org/10.1007/s10772-022-09957-w

Received: 29 October 2020
Accepted: 03 January 2022
Published: 12 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10772-022-09957-w