Birdsong classification based on multi feature channel fusion

Liu, Zhihua; Chen, Wenjie; Chen, Aibin; Zhou, Guoxiong; Yi, Jizheng

doi:10.1007/s11042-022-12570-3

Published: 28 February 2022

Volume 81, pages 15469–15490, (2022)

Multimedia Tools and Applications

Zhihua Liu¹,
Wenjie Chen¹,
Aibin Chen ORCID: orcid.org/0000-0003-4410-412X¹,
Guoxiong Zhou¹ &
…
Jizheng Yi¹

333 Accesses
5 Citations
1 Altmetric

Abstract

Birdsong in nature is continuous in time. To exploit this property, this paper proposes a birdsong classification model built from two feature channels, combining time-domain and time-frequency-domain features. To make better use of these features, an improved average threshold method is used to denoise the original time-domain waveform features and reduce the influence of noise. The most suitable feature extractor for each channel and the best method of fusing the two features are discussed. A 3D convolutional neural network (3DCNN) and a 2D convolutional neural network (2DCNN) are applied as feature extractors for the log-mel spectrum and the waveform images, respectively. The high-level features extracted from the two channels are fused at the middle stage, and the resulting enhanced feature is used as the input to a double gated recurrent unit (d-GRU) network. Birdsongs of four species from Xeno-Canto were selected for testing. The results show that three techniques each improve classification: fusing time-domain and time-frequency-domain features, weighted average threshold noise reduction, and extracting birdsong features with different types of feature extractors. The proposed method achieves a mean average precision (MAP) of 95.9% in the classification comparison experiments, an encouraging result.
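The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the improved threshold rule, the fusion operator, or the d-GRU layout, so the sketch assumes a plain mean-magnitude gate, concatenation along the feature axis, and two stacked GRU layers without bias terms. The function names, feature sizes, and random stand-ins for the CNN outputs are all hypothetical.

```python
import numpy as np

def mean_threshold_denoise(wave, weight=1.0):
    """Zero samples whose magnitude is below weight * mean(|wave|).

    Assumption: a plain mean-magnitude gate, standing in for the paper's
    improved method, which the abstract does not describe.
    """
    wave = np.asarray(wave, dtype=float)
    threshold = weight * np.mean(np.abs(wave))
    return np.where(np.abs(wave) >= threshold, wave, 0.0)

def fuse_channels(feat_a, feat_b):
    """Fuse two (time, features) arrays by concatenating per time step."""
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError("both channels must share the same number of time steps")
    return np.concatenate([feat_a, feat_b], axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_layer(x, hidden, rng):
    """Run one randomly initialised, bias-free GRU layer over x of shape (T, D)."""
    d = x.shape[1]
    # One weight matrix per gate, acting on [input, previous state].
    w_z, w_r, w_h = (rng.normal(0.0, 0.1, (d + hidden, hidden)) for _ in range(3))
    h = np.zeros(hidden)
    outputs = []
    for x_t in x:
        xh = np.concatenate([x_t, h])
        z = sigmoid(xh @ w_z)                                  # update gate
        r = sigmoid(xh @ w_r)                                  # reset gate
        h_cand = np.tanh(np.concatenate([x_t, r * h]) @ w_h)   # candidate state
        h = (1.0 - z) * h + z * h_cand
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)

# Toy waveform: three loud samples surrounded by low-level noise.
wave = np.array([0.01, -0.02, 0.9, -0.8, 0.03, 0.7])
clean = mean_threshold_denoise(wave)

# Random stand-ins for the outputs of the two feature extractors.
spec_feat = rng.normal(size=(20, 32))   # hypothetical 3DCNN channel (log-mel)
wave_feat = rng.normal(size=(20, 16))   # hypothetical 2DCNN channel (waveform image)

fused = fuse_channels(spec_feat, wave_feat)            # shape (20, 48)
seq = gru_layer(gru_layer(fused, 24, rng), 24, rng)    # two stacked GRU layers
print(clean, fused.shape, seq.shape)
```

Concatenation is only one possible fusion operator; element-wise addition or a learned weighting would fit the same interface, and a trained model would replace the random weights used here.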

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61703441).

Author information

Authors and Affiliations

Institute of Artificial Intelligence Application, College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
Zhihua Liu, Wenjie Chen, Aibin Chen, Guoxiong Zhou & Jizheng Yi

Corresponding author

Correspondence to Aibin Chen.

About this article

Cite this article

Liu, Z., Chen, W., Chen, A. et al. Birdsong classification based on multi feature channel fusion. Multimed Tools Appl 81, 15469–15490 (2022). https://doi.org/10.1007/s11042-022-12570-3

Received: 05 November 2020
Revised: 25 February 2021
Accepted: 31 January 2022
Published: 28 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11042-022-12570-3
