Abstract
Data collection and annotation are time-consuming, resource-intensive processes that often require domain expertise. Existing data collections, such as animal sound archives, are valuable data sources, but their use is often hindered by the lack of fine-grained labels. In this study, we examine how existing weakly supervised methods can extract fine-grained information from weakly annotated data accumulated over time, alleviating the need to collect and annotate fresh data. We employ TALNet, a Convolutional Recurrent Neural Network (CRNN), train it on 60-second sound recordings labeled only for the presence of 42 anuran species, and compare it to other models such as BirdNET, a model for the detection of bird vocalisations. We conduct the evaluation on 1-second segments, enabling precise sound event localization. Furthermore, we investigate the impact of varying the length of the training input and explore the effect of different pooling functions on the model's performance on AnuraSet. Finally, we integrate the model into an interactive user interface that facilitates both training and annotation. Our findings demonstrate the effectiveness of TALNet and BirdNET in harnessing weakly annotated sound collections for wildlife monitoring. Our method not only improves the extraction of fine-grained information from coarse labels but also simplifies the annotation of new data by experts.
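To make the role of the pooling functions concrete, the sketch below shows, in PyTorch, three of the multiple-instance-learning pooling functions compared by Wang et al. for TALNet-style models: frame-level probabilities produced by a CRNN are aggregated into a single clip-level probability per class, which can then be trained against the weak (clip-level) labels. This is a minimal illustration under our own assumptions; the tensor names, shapes, and training snippet are not taken from the paper's code.

import torch

# MIL pooling: aggregate frame-level probabilities (batch, time, classes)
# into clip-level probabilities (batch, classes) so a model can be trained
# with only weak, clip-level labels.

def max_pool(frame_probs: torch.Tensor) -> torch.Tensor:
    # Clip probability = most confident frame; localizes well but trains sparsely.
    return frame_probs.max(dim=1).values

def average_pool(frame_probs: torch.Tensor) -> torch.Tensor:
    # Clip probability = mean over frames; tends to dilute short calls.
    return frame_probs.mean(dim=1)

def linear_softmax_pool(frame_probs: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Each frame is weighted by its own probability:
    # y_c = sum_t p_tc^2 / sum_t p_tc  (Wang et al.'s linear softmax pooling)
    return (frame_probs ** 2).sum(dim=1) / (frame_probs.sum(dim=1) + eps)

if __name__ == "__main__":
    # Hypothetical setup: one frame per second of a 60-s clip, 42 anuran species.
    batch, time_steps, n_classes = 4, 60, 42
    frame_probs = torch.rand(batch, time_steps, n_classes)
    weak_labels = torch.randint(0, 2, (batch, n_classes)).float()

    clip_probs = linear_softmax_pool(frame_probs)  # (batch, classes)
    loss = torch.nn.functional.binary_cross_entropy(clip_probs, weak_labels)
    print(clip_probs.shape, float(loss))

At inference time the same frame-level probabilities can be read out directly, which is what enables evaluation on 1-second segments despite training on 60-second weak labels.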
References
Sugai, L.S.M., Silva, T.S.F., Ribeiro, J.W., Llusia, D.: Terrestrial passive acoustic monitoring: review and perspectives. BioScience 69(1), 15–25 (2019). https://doi.org/10.1093/biosci/biy147. Accessed 01 Mar 2023
Tuia, D., et al.: Perspectives in machine learning for wildlife conservation. Nat. Commun. 13(1), 792 (2022)
Gouvêa, T.S., et al.: Interactive machine learning solutions for acoustic monitoring of animal wildlife in biosphere reserves. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 6405–6413. International Joint Conferences on Artificial Intelligence Organization, Macau, SAR, China (2023). https://doi.org/10.24963/ijcai.2023/711, https://www.ijcai.org/proceedings/2023/711. Accessed 16 Aug 2023
Stowell, D.: Computational bioacoustics with deep learning: a review and roadmap. PeerJ 10, e13152 (2022). https://doi.org/10.7717/peerj.13152. Accessed 01 Aug 2023
Meineke, E.K., Davies, T.J., Daru, B.H., Davis, C.C.: Biological collections for understanding biodiversity in the Anthropocene. Philos. Trans. Royal Soc. B: Biol. Sci. 374(1763), 20170386 (2018). https://doi.org/10.1098/rstb.2017.0386. Accessed 01 Aug 2023
Dena, S., Rebouças, R., Augusto-Alves, G., Zornosa-Torres, C., Pontes, M.R., Toledo, L.F.: How much are we losing in not depositing anuran sound recordings in scientific collections? Bioacoustics 29(5), 590–601 (2020). https://doi.org/10.1080/09524622.2019.1633567. Accessed 01 Aug 2023
Sugai, L.S.M., Llusia, D.: Bioacoustic time capsules: using acoustic monitoring to document biodiversity. Ecol. Ind. 99, 149–152 (2019). https://doi.org/10.1016/j.ecolind.2018.12.021. Accessed 01 Aug 2023
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv (2015). https://doi.org/10.48550/arXiv.1409.1556. http://arxiv.org/abs/1409.1556. Accessed 02 Aug 2023
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Kahl, S., Wood, C.M., Eibl, M., Klinck, H.: BirdNET: a deep learning solution for avian diversity monitoring. Ecol. Inform. 61, 101236 (2021). https://doi.org/10.1016/j.ecoinf.2021.101236. Accessed 12 May 2023
Tzirakis, P., Shiarella, A., Ewers, R., Schuller, B.W.: Computer audition for continuous rainforest occupancy monitoring: the case of Bornean gibbons’ call detection (2020)
Çakır, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017). https://doi.org/10.1109/TASLP.2017.2690575
Xie, J., Hu, K., Zhu, M., Guo, Y.: Bioacoustic signal classification in continuous recordings: syllable-segmentation vs sliding-window. Expert Syst. Appl. 152, 113390 (2020)
Dufourq, E., Batist, C., Foquet, R., Durbach, I.: Passive acoustic monitoring of animal populations with transfer learning. Ecol. Inform. 70, 101688 (2022). https://doi.org/10.1016/j.ecoinf.2022.101688. Accessed 19 Sept 2023
Kath, H., Serafini, P.P., Campos, I.B., Gouvea, T., Sonntag, D.: Leveraging transfer learning and active learning for sound event detection in passive acoustic monitoring of wildlife. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE-2024). AAAI, Vancouver, BC, Canada (2024)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Wang, Y., Li, J., Metze, F.: A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31–35 (2019). https://doi.org/10.1109/ICASSP.2019.8682847
Hershey, S., et al.: CNN architectures for large-scale audio classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 131–135. IEEE (2017)
Sprengel, E., Jaggi, M., Kilcher, Y., Hofmann, T.: Audio Based Bird Species Identification using Deep Learning Techniques. LifeCLEF 2016 (2016)
Gemmeke, J.F., et al.: Audio set: an ontology and human-labeled dataset for audio events. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776–780 (2017). https://doi.org/10.1109/ICASSP.2017.7952261
Kumar, A., Raj, B.: Audio event detection using weakly labeled data. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 1038–1047 (2016). https://doi.org/10.1145/2964284.2964310. http://arxiv.org/abs/1605.02401. Accessed 13 Sept 2023
Xu, Y., Kong, Q., Wang, W., Plumbley, M.D.: Large-scale weakly supervised audio classification using gated convolutional neural network. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 121–125 (2018). https://doi.org/10.1109/ICASSP.2018.8461975
Miyazaki, K., Komatsu, T., Hayashi, T., Watanabe, S., Toda, T., Takeda, K.: Weakly-supervised sound event detection with self-attention. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 66–70 (2020). https://doi.org/10.1109/ICASSP40776.2020.9053609
Xin, Y., Yang, D., Zou, Y.: Audio pyramid transformer with domain adaption for weakly supervised sound event detection and audio classification. In: Proceedings of the Interspeech 2022, pp. 1546–1550 (2022)
Chen, S., et al.: BEATs: audio pre-training with acoustic tokenizers. arXiv (2022). https://doi.org/10.48550/arXiv.2212.09058. http://arxiv.org/abs/2212.09058. Accessed 03 Aug 2023
Jiang, J.-J., et al.: Whistle detection and classification for whales based on convolutional neural networks. Appl. Acoust. 150, 169–178 (2019)
Coffey, K.R., Marx, R.E., Neumaier, J.F.: Deepsqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations. Neuropsychopharmacology 44(5), 859–868 (2019)
Cohen, Y., Nicholson, D.A., Sanchioni, A., Mallaber, E.K., Skidanova, V., Gardner, T.J.: Automated annotation of birdsong with a neural network that segments spectrograms. eLife 11, e63853 (2022)
Cañas, J.S., et al.: A dataset for benchmarking neotropical anuran calls identification in passive acoustic monitoring. Sci. Data 10(1), 771 (2023)
Yang, Y.-Y., et al.: TorchAudio: building blocks for audio and speech processing. arXiv preprint arXiv:2110.15018 (2021)
Hershey, S., et al.: CNN architectures for large-scale audio classification. arXiv (2017). https://doi.org/10.48550/arXiv.1609.09430. http://arxiv.org/abs/1609.09430. Accessed 11 Aug 2023
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)
Troshani, I., Gouvea, T., Sonntag, D.: Leveraging sound collections for animal species classification with weakly supervised learning. In: 3rd Annual AAAI Workshop on AI to Accelerate Science and Engineering. AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE-2024), AAAI, Vancouver, Canada (2024)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Shah, A., Kumar, A., Hauptmann, A.G., Raj, B.: A closer look at weak label learning for audio events. CoRR abs/1804.09288 (2018)
Acknowledgements
This research is part of the Computational Sustainability & Technology project area (https://cst.dfki.de/) and has been supported by the Ministry for Science and Culture of Lower Saxony (MWK), the Endowed Chair of Applied Artificial Intelligence at Oldenburg University, and DFKI.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Troshani, I., Gouvêa, T.S., Sonntag, D. (2024). Leveraging Weakly Supervised and Multiple Instance Learning for Multi-label Classification of Passive Acoustic Monitoring Data. In: Hotho, A., Rudolph, S. (eds) KI 2024: Advances in Artificial Intelligence. KI 2024. Lecture Notes in Computer Science, vol. 14992. Springer, Cham. https://doi.org/10.1007/978-3-031-70893-0_19
DOI: https://doi.org/10.1007/978-3-031-70893-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70892-3
Online ISBN: 978-3-031-70893-0