Abstract
Identifying bird species is an important but time-consuming task for ornithologists and ecologists. As the amount of annotated audio data grows, automatic bird classification using machine learning techniques has become an important trend in the scientific community. Analyzing bird behavior and population trends helps detect other organisms in the environment and is an important problem in ecology. Bird populations react quickly to environmental changes, which makes counting and tracking them in real time both challenging and very useful. A reliable methodology that automatically identifies bird species from audio would therefore be a valuable tool for experts in different scientific and applied domains.
The goal of this work is to propose a methodology able to identify bird species by their chirps. In this paper we explore deep learning techniques used in this domain, such as Convolutional Neural Networks and Recurrent Neural Networks, to classify the data. In deep learning, audio problems are commonly approached by converting the audio into images using feature extraction techniques such as Mel Spectrograms and Mel Frequency Cepstral Coefficients. We propose and test multiple combinations of deep learning models and feature extraction techniques in order to find the most suitable approach to this problem.
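To illustrate the audio-to-image step mentioned above, the following is a minimal sketch of a log-mel spectrogram computed with NumPy alone. It is not the authors' pipeline: the function names, the frame parameters (`n_fft`, `hop`, `n_mels`), the mel-scale formula variant, and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers spaced evenly on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Return a (n_mels, n_frames) array in decibels, usable as an image."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T               # small offset avoids log(0)

# Synthetic stand-in for a bird chirp: a one-second sweep from 2 kHz to 5 kHz.
sr = 22050
t = np.arange(sr) / sr
chirp = np.sin(2 * np.pi * (2000 * t + 1500 * t ** 2))
spec = log_mel_spectrogram(chirp, sr)
```

The resulting two-dimensional array has mel bands on one axis and time frames on the other, which is the kind of image-like input a convolutional network can consume. Mel Frequency Cepstral Coefficients are typically obtained by applying a discrete cosine transform along the mel axis of such a log-mel representation.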
Acknowledgements
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Carvalho, S., Gomes, E.F. (2021). Automatic Identification of Bird Species from Audio. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_4
DOI: https://doi.org/10.1007/978-3-030-73280-6_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science, Computer Science (R0)