Abstract
Data collection and annotation are time-consuming, resource-intensive processes that often require domain expertise. Existing data collections, such as animal sound archives, are valuable data sources, but their use is often hindered by the lack of fine-grained labels. In this study, we examine how existing weakly supervised methods can extract fine-grained information from weakly annotated data accumulated over time, alleviating the need to collect and annotate fresh data. We employ TALNet, a Convolutional Recurrent Neural Network (CRNN), train it on 60-second sound recordings labeled only for the presence of 42 anuran species, and compare it to other models such as BirdNET, a model for the detection of bird vocalisations. We conduct the evaluation on 1-second segments, enabling precise sound event localization. Furthermore, we investigate the impact of varying the length of the training input and explore the effect of different pooling functions on the model's performance on AnuraSet. Finally, we integrate the model into an interactive user interface that facilitates both training and annotation. Our findings demonstrate the effectiveness of TALNet and BirdNET in harnessing weakly annotated sound collections for wildlife monitoring. Our method not only improves the extraction of fine-grained information from coarse labels but also simplifies the annotation of new data by experts.
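To make the role of the pooling functions concrete, the sketch below shows, in PyTorch, three of the multiple-instance-learning pooling functions compared by Wang et al. for TALNet-style models: frame-level probabilities produced by a CRNN are aggregated into a single clip-level probability per class, which can then be trained against the weak (clip-level) labels. This is a minimal illustration under our own assumptions; the tensor names, shapes, and training snippet are not taken from the paper's code.

import torch

# MIL pooling: aggregate frame-level probabilities (batch, time, classes)
# into clip-level probabilities (batch, classes) so a model can be trained
# with only weak, clip-level labels.

def max_pool(frame_probs: torch.Tensor) -> torch.Tensor:
    # Clip probability = most confident frame; localizes well but trains sparsely.
    return frame_probs.max(dim=1).values

def average_pool(frame_probs: torch.Tensor) -> torch.Tensor:
    # Clip probability = mean over frames; tends to dilute short calls.
    return frame_probs.mean(dim=1)

def linear_softmax_pool(frame_probs: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Each frame is weighted by its own probability:
    # y_c = sum_t p_tc^2 / sum_t p_tc  (Wang et al.'s linear softmax pooling)
    return (frame_probs ** 2).sum(dim=1) / (frame_probs.sum(dim=1) + eps)

if __name__ == "__main__":
    # Hypothetical setup: one frame per second of a 60-s clip, 42 anuran species.
    batch, time_steps, n_classes = 4, 60, 42
    frame_probs = torch.rand(batch, time_steps, n_classes)
    weak_labels = torch.randint(0, 2, (batch, n_classes)).float()

    clip_probs = linear_softmax_pool(frame_probs)  # (batch, classes)
    loss = torch.nn.functional.binary_cross_entropy(clip_probs, weak_labels)
    print(clip_probs.shape, float(loss))

At inference time the same frame-level probabilities can be read out directly, which is what enables evaluation on 1-second segments despite training on 60-second weak labels.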
References
Sugai, L.S.M., Silva, T.S.F., Ribeiro, J.W., Llusia, D.: Terrestrial passive acoustic monitoring: review and perspectives. BioScience 69(1), 15–25 (2019). https://doi.org/10.1093/biosci/biy147. Accessed 01 Mar 2023
Tuia, D., et al.: Perspectives in machine learning for wildlife conservation. Nat. Commun. 13(1), 792 (2022)
Gouvêa, T.S., et al.: Interactive machine learning solutions for acoustic monitoring of animal wildlife in biosphere reserves. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 6405–6413. International Joint Conferences on Artificial Intelligence Organization, Macau, SAR, China (2023). https://doi.org/10.24963/ijcai.2023/711, https://www.ijcai.org/proceedings/2023/711. Accessed 16 Aug 2023
Stowell, D.: Computational bioacoustics with deep learning: a review and roadmap. PeerJ 10, e13152 (2022). https://doi.org/10.7717/peerj.13152. Accessed 01 Aug 2023
Meineke, E.K., Davies, T.J., Daru, B.H., Davis, C.C.: Biological collections for understanding biodiversity in the Anthropocene. Philos. Trans. Royal Soc. B: Biol. Sci. 374(1763), 20170386 (2018). https://doi.org/10.1098/rstb.2017.0386. Accessed 01 Aug 2023
Dena, S., Rebouças, R., Augusto-Alves, G., Zornosa-Torres, C., Pontes, M.R., Toledo, L.F.: How much are we losing in not depositing anuran sound recordings in scientific collections? Bioacoustics 29(5), 590–601 (2020). https://doi.org/10.1080/09524622.2019.1633567. Accessed 01 Aug 2023
Sugai, L.S.M., Llusia, D.: Bioacoustic time capsules: using acoustic monitoring to document biodiversity. Ecol. Ind. 99, 149–152 (2019). https://doi.org/10.1016/j.ecolind.2018.12.021. Accessed 01 Aug 2023
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv (2015). https://doi.org/10.48550/arXiv.1409.1556. http://arxiv.org/abs/1409.1556. Accessed 02 Aug 2023
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Kahl, S., Wood, C.M., Eibl, M., Klinck, H.: BirdNET: a deep learning solution for avian diversity monitoring. Ecol. Inform. 61, 101236 (2021). https://doi.org/10.1016/j.ecoinf.2021.101236. Accessed 12 May 2023
Tzirakis, P., Shiarella, A., Ewers, R., Schuller, B.W.: Computer audition for continuous rainforest occupancy monitoring: the case of Bornean gibbons’ call detection (2020)
Çakır, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017). https://doi.org/10.1109/TASLP.2017.2690575
Xie, J., Hu, K., Zhu, M., Guo, Y.: Bioacoustic signal classification in continuous recordings: syllable-segmentation vs sliding-window. Expert Syst. Appl. 152, 113390 (2020)
Dufourq, E., Batist, C., Foquet, R., Durbach, I.: Passive acoustic monitoring of animal populations with transfer learning. Ecol. Inform. 70, 101688 (2022). https://doi.org/10.1016/j.ecoinf.2022.101688. Accessed 19 Sept 2023
Kath, H., Serafini, P.P., Campos, I.B., Gouvea, T., Sonntag, D.: Leveraging transfer learning and active learning for sound event detection in passive acoustic monitoring of wildlife. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE-2024). AAAI, Vancouver, BC, Canada (2024)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Wang, Y., Li, J., Metze, F.: A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31–35 (2019). https://doi.org/10.1109/ICASSP.2019.8682847
Hershey, S., et al.: CNN architectures for large-scale audio classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 131–135. IEEE (2017)
Sprengel, E., Jaggi, M., Kilcher, Y., Hofmann, T.: Audio Based Bird Species Identification using Deep Learning Techniques. LifeCLEF 2016 (2016)
Gemmeke, J.F., et al.: Audio set: an ontology and human-labeled dataset for audio events. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780 (2017). https://doi.org/10.1109/ICASSP.2017.7952261
Kumar, A., Raj, B.: Audio event detection using weakly labeled data. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 1038–1047 (2016). https://doi.org/10.1145/2964284.2964310. http://arxiv.org/abs/1605.02401. Accessed 13 Sept 2023
Xu, Y., Kong, Q., Wang, W., Plumbley, M.D.: Large-scale weakly supervised audio classification using gated convolutional neural network. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 121–125 (2018). https://doi.org/10.1109/ICASSP.2018.8461975
Miyazaki, K., Komatsu, T., Hayashi, T., Watanabe, S., Toda, T., Takeda, K.: Weakly-supervised sound event detection with self-attention. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 66–70 (2020). https://doi.org/10.1109/ICASSP40776.2020.9053609
Xin, Y., Yang, D., Zou, Y.: Audio pyramid transformer with domain adaption for weakly supervised sound event detection and audio classification. In: Proceedings of the Interspeech 2022, pp. 1546–1550 (2022)
Chen, S., et al.: BEATs: audio pre-training with acoustic tokenizers. arXiv (2022). https://doi.org/10.48550/arXiv.2212.09058. http://arxiv.org/abs/2212.09058. Accessed 03 Aug 2023
Jiang, J.-J., et al.: Whistle detection and classification for whales based on convolutional neural networks. Appl. Acoust. 150, 169–178 (2019)
Coffey, K.R., Marx, R.E., Neumaier, J.F.: Deepsqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations. Neuropsychopharmacology 44(5), 859–868 (2019)
Cohen, Y., Nicholson, D.A., Sanchioni, A., Mallaber, E.K., Skidanova, V., Gardner, T.J.: Automated annotation of birdsong with a neural network that segments spectrograms. eLife 11, e63853 (2022)
Cañas, J.S., et al.: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring. Sci. Data 10(1), 771 (2023)
Yang, Y.-Y., et al.: TorchAudio: building blocks for audio and speech processing. arXiv preprint arXiv:2110.15018 (2021)
Hershey, S., et al.: CNN architectures for large-scale audio classification. arXiv (2017). https://doi.org/10.48550/arXiv.1609.09430. http://arxiv.org/abs/1609.09430. Accessed 11 Aug 2023
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)
Troshani, I., Gouvea, T., Sonntag, D.: Leveraging sound collections for animal species classification with weakly supervised learning. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering. AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE-2024), AAAI, Vancouver, Canada (2024)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Shah, A., Kumar, A., Hauptmann, A.G., Raj, B.: A closer look at weak label learning for audio events. CoRR abs/1804.09288 (2018)
Acknowledgements
This research is part of the Computational Sustainability & Technology project area (https://cst.dfki.de/) and has been supported by the Ministry for Science and Culture of Lower Saxony (MWK), the Endowed Chair of Applied Artificial Intelligence at Oldenburg University, and DFKI.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Troshani, I., Gouvêa, T.S., Sonntag, D. (2024). Leveraging Weakly Supervised and Multiple Instance Learning for Multi-label Classification of Passive Acoustic Monitoring Data. In: Hotho, A., Rudolph, S. (eds) KI 2024: Advances in Artificial Intelligence. KI 2024. Lecture Notes in Computer Science, vol. 14992. Springer, Cham. https://doi.org/10.1007/978-3-031-70893-0_19
DOI: https://doi.org/10.1007/978-3-031-70893-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70892-3
Online ISBN: 978-3-031-70893-0