Automatic Identification of Bird Species from Audio

Carvalho, Silvestre; Gomes, Elsa Ferreira

doi:10.1007/978-3-030-73280-6_4

Silvestre Carvalho and Elsa Ferreira Gomes

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12672)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Bird species identification is a relevant and time-consuming task for ornithologists and ecologists. With growing amounts of annotated audio data, automatic bird classification using machine learning techniques has become an important trend in the scientific community. Analyzing bird behavior and population trends helps detect other organisms in the environment and is an important problem in ecology. Bird populations react quickly to environmental changes, which makes counting and tracking them in real time both challenging and very useful. A reliable methodology that automatically identifies bird species from audio would therefore be a valuable tool for experts in different scientific and application domains.

The goal of this work is to propose a methodology able to identify bird species by their chirps. In this paper we explore deep learning techniques used in this domain, such as Convolutional Neural Networks and Recurrent Neural Networks, to classify the data. In deep learning, audio problems are commonly approached by converting the audio into images using feature extraction techniques such as Mel Spectrograms and Mel Frequency Cepstral Coefficients. We propose and test multiple combinations of deep learning models and feature extraction techniques in order to find the most suitable approach to this problem.
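
The audio-to-image conversion mentioned above can be illustrated with a small sketch. It is not the authors' pipeline: it is a minimal log-Mel spectrogram written with the Python standard library only, and the function names, the HTK-style mel formula, and the parameter values (n_fft, hop, n_mels, the synthetic 3 kHz tone) are all assumptions chosen for illustration. In practice an audio library would normally be used for this step.

```python
import cmath
import math


def hz_to_mel(hz):
    """Hz to mel using the HTK-style formula (one common convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # keep non-negative frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1))
                for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels log energies (dB)."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Synthetic stand-in for a recording: a pure 3 kHz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 3000 * t / sample_rate) for t in range(1024)]
spec = log_mel_spectrogram(tone, sample_rate)

# Index of the mel band with the most energy in the first frame.
peak_band = max(range(len(spec[0])), key=lambda m: spec[0][m])
print(len(spec), len(spec[0]), peak_band)
```

The result is a two-dimensional array (time frames by mel bands) that can be treated as a single-channel image and passed to a convolutional network. Applying a discrete cosine transform along the mel axis of each frame would give cepstral coefficients of the MFCC kind.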

Acknowledgements

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.

Author information

Authors and Affiliations

Instituto Superior de Engenharia do Porto, Rua Dr. Bernardino de Almeida, 431, 4200-072, Porto, Portugal
Silvestre Carvalho & Elsa Ferreira Gomes
INESC TEC, Campus da FEUP, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
Elsa Ferreira Gomes

Corresponding author

Correspondence to Elsa Ferreira Gomes.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Suphamit Chittayasothorn
Nanyang Technological University, Singapore, Singapore
Dusit Niyato
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Carvalho, S., Gomes, E.F. (2021). Automatic Identification of Bird Species from Audio. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_4

DOI: https://doi.org/10.1007/978-3-030-73280-6_4
Published: 05 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science, Computer Science (R0)
