Leveraging Weakly Supervised and Multiple Instance Learning for Multi-label Classification of Passive Acoustic Monitoring Data

  • Conference paper
  • KI 2024: Advances in Artificial Intelligence (KI 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14992)

Abstract

Data collection and annotation are time-consuming, resource-intensive processes that often require domain expertise. Existing collections, such as animal sound archives, are valuable data sources, but their use is often hindered by the lack of fine-grained labels. In this study, we examine how existing weakly supervised methods can extract fine-grained information from weakly annotated data accumulated over time, alleviating the need to collect and annotate fresh data. We train TALNet, a Convolutional Recurrent Neural Network (CRNN), on 60-second sound recordings labeled for the presence of 42 anuran species, and compare it to other models such as BirdNET, a model for detecting bird vocalisations. We conduct the evaluation on 1-second segments, enabling precise sound event localization. Furthermore, we investigate the impact of varying the length of the training input and explore the effect of different pooling functions on the model's performance on AnuraSet. Finally, we integrate the model into an interactive user interface that facilitates training and annotation. Our findings demonstrate the effectiveness of TALNet and BirdNET in harnessing weakly annotated sound collections for wildlife monitoring. Our method not only improves the extraction of information from coarse labels but also simplifies the annotation of new data for experts.
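As a hedged illustration of the weak-label setup described in the abstract (clip-level labels, frame-level predictions), the following minimal PyTorch sketch implements linear softmax pooling, one of the multiple instance learning pooling functions the paper explores. This is not the authors' released code; the tensor shapes, frame rate, and variable names are assumptions for illustration.

```python
import torch

def linear_softmax_pool(frame_probs: torch.Tensor) -> torch.Tensor:
    """Aggregate frame-level probabilities (batch, frames, classes)
    into clip-level probabilities (batch, classes).

    Linear softmax pooling weights each frame by its own probability,
    y_clip = sum(p_i^2) / sum(p_i), so confident frames dominate the
    clip-level score while near-silent frames are down-weighted.
    """
    num = (frame_probs ** 2).sum(dim=1)
    den = frame_probs.sum(dim=1).clamp(min=1e-7)  # guard against all-zero clips
    return num / den

# Hypothetical usage: 60-second clips at one frame per second, 42 anuran classes.
frame_probs = torch.rand(8, 60, 42)            # e.g. sigmoid outputs of a CRNN
clip_probs = linear_softmax_pool(frame_probs)  # (8, 42), matched against weak clip labels
# The per-frame probabilities themselves provide the 1-second localization
# used at evaluation time.
```

Training with only clip-level labels then reduces to a standard multi-label objective, e.g. binary cross-entropy between the pooled clip probabilities and each recording's 42-dimensional presence/absence vector.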

Notes

  1. https://www2.ib.unicamp.br/fnjv/.
  2. https://www.macaulaylibrary.org/.
  3. https://xeno-canto.org/.
  4. https://dash.plotly.com.

Acknowledgements

This research is part of the Computational Sustainability & Technology project area (https://cst.dfki.de/) and has been supported by the Ministry for Science and Culture of Lower Saxony (MWK), the Endowed Chair of Applied Artificial Intelligence, Oldenburg University, and DFKI.

Author information

Corresponding author

Correspondence to Ilira Troshani.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Troshani, I., Gouvêa, T.S., Sonntag, D. (2024). Leveraging Weakly Supervised and Multiple Instance Learning for Multi-label Classification of Passive Acoustic Monitoring Data. In: Hotho, A., Rudolph, S. (eds) KI 2024: Advances in Artificial Intelligence. KI 2024. Lecture Notes in Computer Science, vol. 14992. Springer, Cham. https://doi.org/10.1007/978-3-031-70893-0_19

  • DOI: https://doi.org/10.1007/978-3-031-70893-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70892-3

  • Online ISBN: 978-3-031-70893-0

  • eBook Packages: Computer Science, Computer Science (R0)
